Message boards : Theory Application : Theory Task having crazy time

TankbusterGames

Joined: 26 Apr 21
Posts: 2
Credit: 784,747
RAC: 0
Message 44851 - Posted: 1 May 2021, 18:49:33 UTC

So I had this weird Theory task show up with 9 days of estimated compute time remaining. I decided to look into the task settings to see what the hell kind of task that was.


Actually, the job just finished after 2 hours and 8 minutes of runtime.
Still pretty strange. The estimated time is usually a little off for all tasks, but 9 days...
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 46185 - Posted: 7 Feb 2022, 2:55:02 UTC

Short runtime and exit code 1:
2022-02-07 03:36:34 (3824): Guest Log: 03:36:32 CET +01:00 2022-02-07: cranky: [INFO] ===> [runRivet] Mon Feb 7 02:36:31 UTC 2022 [boinc pp jets 8000 25 - herwig7 7.2.0 softTune 100000 225]
2022-02-07 03:38:44 (3824): Guest Log: job: run exitcode=1
2022-02-07 03:38:44 (3824): Guest Log: job: diskusage=5148
2022-02-07 03:38:44 (3824): Guest Log: job: logsize=12 k
2022-02-07 03:38:44 (3824): Guest Log: job: times=
2022-02-07 03:38:44 (3824): Guest Log: 0m0.007s 0m0.022s
2022-02-07 03:38:44 (3824): Guest Log: 0m11.909s 0m3.401s
2022-02-07 03:38:44 (3824): Guest Log: job: cpuusage=15
2022-02-07 03:38:44 (3824): Guest Log: 03:38:43 CET +01:00 2022-02-07: cranky: [INFO] Container 'runc' finished with status code 1.
2022-02-07 03:38:44 (3824): Guest Log: 03:38:43 CET +01:00 2022-02-07: cranky: [INFO] Preparing output.
2022-02-07 03:38:44 (3824): Guest Log: [INFO] Job Finished
2022-02-07 03:38:44 (3824): Guest Log: [INFO] Shutting Down.
2022-02-07 03:38:44 (3824): VM Completion File Detected.
2022-02-07 03:38:44 (3824): VM Completion Message: Job Finished
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 46186 - Posted: 7 Feb 2022, 12:23:23 UTC - in response to Message 46185.  

exit code 0 = success
Peter Skands

Joined: 31 Jan 11
Posts: 12
Credit: 3,557,813
RAC: 0
Message 46190 - Posted: 8 Feb 2022, 9:28:11 UTC - in response to Message 46185.  

Hi maeax and Crystal,

I just posted a message to Erich56 on a similar thread, but I'll repeat it here. I agree it's frustrating, and I don't actually understand what is happening with these runs. In the past, we had argued that, for some generators, we had to accept a small failure rate, since otherwise we could not do comparisons to those generators at all. We had then hoped that updating them to the latest versions would gradually fix the issues we were seeing, but this has not really been the case. Having to operate with a non-negligible rate of failing jobs is not nice, especially when this fraction does not seem to decrease with time.

I regret if we have been too slow to react, but at least now, for 2022, we have come up with a plan to revitalize T4T. To start with, we are going to stop sending out jobs for the problematic generators, at least until we can sit down for a proper debugging session with the authors of those codes and fully iron out their issues, so that they are ready to be sent out in T4T again.

During 2022, we plan to start by focusing our attention on getting (back) to the virtual equivalent of what the LHC machine people would call 'stable beams' for the most widely used generator, Pythia, setting a new baseline for future T4T operation. At least for that generator, our team has author-level in-house expertise, so we are confident we can do this, if we put in the hours.

At the same time, we think this can allow us to try out some new and possibly even more useful tests, which I hope we will also be able to announce down the track. So despite the issues you and others have been experiencing, I hope you will choose to stick with our project a little longer and see if things improve during 2022.

Best regards
Peter Skands
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 46228 - Posted: 11 Feb 2022, 11:05:02 UTC - in response to Message 46185.  

This is a new Herwig7 job with exitcode=0 (unpack) and exitcode=1 (run) in one task:

Theory_2390-1104641-235_0
Workunit 182509350

2022-02-11 11:50:07 (3432): Guest Log: job: unpack exitcode=0
2022-02-11 11:50:07 (3432): Guest Log: 11:50:06 CET +01:00 2022-02-11: cranky: [INFO] ===> [runRivet] Fri Feb 11 10:50:05 UTC 2022 [boinc pp jets 13000 170,-,2960 - herwig7 7.2.0 softTune 100000 235]
2022-02-11 11:52:25 (3432): Guest Log: job: run exitcode=1
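
For anyone who wants to check their own task output for this pattern, here is a minimal Python sketch that pulls the per-stage exit codes out of a Guest Log. It assumes the lines look like the `job: <stage> exitcode=<n>` entries quoted in this thread; the function names `stage_exit_codes` and `failed_stages` are made up for illustration and are not part of BOINC or of the Theory application.

```python
import re

# Assumed line format, taken only from the excerpts in this thread,
# e.g. "... Guest Log: job: run exitcode=1".
EXIT_RE = re.compile(r"job: (\w+) exitcode=(-?\d+)")


def stage_exit_codes(log_text):
    """Return (stage, exit_code) pairs in the order they appear in the log."""
    return [(stage, int(code)) for stage, code in EXIT_RE.findall(log_text)]


def failed_stages(log_text):
    """Return only the stages that reported a non-zero exit code."""
    return [(stage, code) for stage, code in stage_exit_codes(log_text) if code != 0]


# The two result lines from the task quoted above.
sample = """\
2022-02-11 11:50:07 (3432): Guest Log: job: unpack exitcode=0
2022-02-11 11:52:25 (3432): Guest Log: job: run exitcode=1
"""

print(stage_exit_codes(sample))  # [('unpack', 0), ('run', 1)]
print(failed_stages(sample))     # [('run', 1)]
```

With the lines above, the unpack stage reports 0 and only the run stage is flagged, which is why both codes can show up in a single task.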



©2024 CERN