Tasks run 4 days and finish with error
Author | Message |
---|---|
Send message Joined: 18 Nov 17 Posts: 119 Credit: 51,287,420 RAC: 20,297 |
Hello. These tasks ran for 4 days and finished with an error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=263272995 https://lhcathome.cern.ch/lhcathome/result.php?resultid=263269392 https://lhcathome.cern.ch/lhcathome/result.php?resultid=263269875 https://lhcathome.cern.ch/lhcathome/result.php?resultid=263268089 Is this normal, or should I stop running Theory for some time? |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
It's normal that a task is killed after 100 hours elapsed time to avoid endless running. The first 3 mentioned tasks belong to the list I mentioned here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4979&postid=41650 The last task sometimes succeeds, but probably needed more time :( |
Send message Joined: 17 Oct 06 Posts: 74 Credit: 51,502,772 RAC: 22,330 |
I've also got a bunch of these. Should I just let them run until they fail, or should I abort any task with an estimated time of 4 days? |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Should I just let them run until they fail, or should I abort any task with an estimated time of 4 days?" BOINC doesn't know how long the tasks will run. The 100 hours is just a placeholder to show something, but it is in fact useless. To see whether a job is making real progress, highlight the task in BOINC Manager and click Show Graphics on the left. You need the VirtualBox Extension Pack installed for that. |
Send message Joined: 17 Oct 06 Posts: 74 Credit: 51,502,772 RAC: 22,330 |
Looks like it's stuck in some sort of loop. It just keeps printing this output. |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
It looks like it will soon be killed due to the time limit of 100 hours. |
Send message Joined: 18 Nov 17 Posts: 119 Credit: 51,287,420 RAC: 20,297 |
"It looks like it will soon be killed due to the time limit of 100 hours." If tasks need more time to succeed, maybe we need a time limit of, for example, 200 hours? It is very sad to waste 4 days of computing. |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
And 4 days would be a waste if the task did not succeed in that time. It is a game of user patience: at what point do we reach our threshold for keeping them running? The limit could be extended indefinitely, but users would not accept that. When I run the native application I normally set a limit of 7 days, no matter what stage the task reports. That is an arbitrary number, chosen so I do not have to deal with whatever kind of job the task happens to run. Running a script that aborts jobs known to be doomed is one way to deal with it; adding a blacklist is another. Most users would probably abort once a specific run time is reached, and if it became too common they would uncheck the application. My view is that it would be better if these jobs went to Theory Beta, or were handled in a separate project such as LHC dev. They serve a purpose for the project but make for a bad experience with the whole Theory project. Sherpa is the generator mostly known for this, but a whole range of these types of work have shown up as endless jobs. Some users would be willing to opt in to these jobs, provided they could choose when and on which hardware to run them. Most users do not monitor each task or host daily or even weekly. |
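The blacklist idea from the post above could be sketched roughly like this. It is only an illustration: the function names are made up, the blacklist entry is a placeholder rather than a verified doomed combination, and how to get the job description string out of a running task is not covered here.

```python
from typing import Iterable


def normalize(description: str) -> str:
    """Collapse repeated whitespace and ignore case so entries compare reliably."""
    return " ".join(description.split()).lower()


def should_abort(description: str, blacklist: Iterable[str]) -> bool:
    """Return True if the job description matches a blacklisted combination."""
    blocked = {normalize(entry) for entry in blacklist}
    return normalize(description) in blocked


# Placeholder entry for illustration only (hypothetical blacklist content).
BLACKLIST = {"ee zhad 206 - - sherpa 1.3.1 default"}

print(should_abort("ee  zhad 206 - - Sherpa 1.3.1 default", BLACKLIST))
print(should_abort("pp jets 7000 - - sherpa 2.2.8 default", BLACKLIST))
```

A task that matches could then be aborted by hand in BOINC Manager, or from the command line with something like `boinccmd --task <project_url> <task_name> abort`.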
Send message Joined: 17 Oct 06 Posts: 74 Credit: 51,502,772 RAC: 22,330 |
Yeah it kinda feels like I'm wasting my time since about a 3rd of my tasks run for 4 days then fail before completion. |
Send message Joined: 24 Oct 04 Posts: 1114 Credit: 49,501,728 RAC: 4,157 |
Hello. It is almost always because of the Theory task being a *sherpa* and you can tell right at the start if you got a sherpa by checking the VM Console when it starts running and can then just abort it and try again for a *pythia* task or the different versions of *herwig* Theory tasks. |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
It's not known in advance how long a Theory job will run. If you don't have the time or don't want to babysit the jobs, and you worry about wasting 100 hours, you could consider reducing the 100-hour maximum run time. There is no guarantee that this wastes less. It's up to you; make up your mind and do the math. 88.5% of the jobs are finished within 5 hours (18000 seconds); 96.3% of the jobs are finished within 10 hours (36000 seconds). Disadvantage: you would kill some jobs that would normally succeed somewhere between 5 or 10 hours and 100 hours (extra waste of time). Advantage: the 100+ hour and endlessly running ones would be killed much earlier, so less time is wasted. In the projects directory there is a file called Theory_2019_11_13a.xml. Change the value in <job_duration>360000</job_duration> (360000 seconds = 100 hours) to suit your needs. BOINC normally checks the file size, so when you shorten the value by 1 digit you have to add a character somewhere else (e.g. a space at the start of the line). |
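The size-preserving edit described above can be sketched as a small helper. This is a rough sketch, not a tested tool: the function name is made up, the `<vbox_job>` wrapper in the sample text is only a stand-in for the real file content, and it assumes the `<job_duration>` element sits on its own line and the file is plain ASCII. Make a backup of the real file before trying anything like this.

```python
import re


def set_job_duration(xml_text: str, new_seconds: int) -> str:
    """Replace the job_duration value while keeping the text length unchanged.

    If the new number has fewer digits, spaces are added in front of the line;
    if it has more digits, existing indentation is removed to compensate.
    """
    pattern = re.compile(
        r"^([ \t]*)<job_duration>(\d+)</job_duration>", re.MULTILINE
    )
    match = pattern.search(xml_text)
    if match is None:
        raise ValueError("no <job_duration> line found")

    indent, old_value = match.group(1), match.group(2)
    new_value = str(new_seconds)
    diff = len(old_value) - len(new_value)

    if diff >= 0:
        new_indent = indent + " " * diff
    elif len(indent) >= -diff:
        new_indent = indent[: len(indent) + diff]
    else:
        raise ValueError("not enough indentation to absorb the extra digits")

    replacement = f"{new_indent}<job_duration>{new_value}</job_duration>"
    return xml_text[: match.start()] + replacement + xml_text[match.end():]


# Stand-in content; the real file has more elements than this.
sample = "<vbox_job>\n    <job_duration>360000</job_duration>\n</vbox_job>\n"
shorter = set_job_duration(sample, 36000)
print(len(sample) == len(shorter))
print(shorter)
```

Padding with spaces in front of the line is the trick from the post; the reverse case (a longer value, e.g. 200 hours = 720000 seconds, still fits in six digits) only needs indentation removed when the digit count actually grows.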
Send message Joined: 18 Nov 17 Posts: 119 Credit: 51,287,420 RAC: 20,297 |
"It is almost always because of the Theory task being a *sherpa* and you can tell right at the start if you got a sherpa by checking the VM Console" This parameter? https://yadi.sk/i/hJZadj_mOzkGXA |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"This parameter?" Yes. Almost all of the erroneous tasks that end up running endlessly use the Sherpa generator. |
Send message Joined: 18 Nov 17 Posts: 119 Credit: 51,287,420 RAC: 20,297 |
"Yes. Almost all of the erroneous tasks that end up running endlessly use the Sherpa generator." Thank you! |
Send message Joined: 17 Oct 06 Posts: 74 Credit: 51,502,772 RAC: 22,330 |
All of my long runners are sherpa as well. |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,938,886 RAC: 137,525 |
Not all sherpas are bad by default. Instead, some of them have an excellent success ratio. Run "pp jets 200 - - sherpa 2.2.8 default": 2100000 events, 25 attempts, 21 success, 1 failure, 3 lost. Run "pp jets 7000 - - sherpa 2.2.8 default": 2157000 events, 25 attempts, 22 success, 0 failure, 3 lost. It may be a good idea to check the mcplots pages before cancelling a task: http://mcplots-dev.cern.ch/production.php?view=runs&rev=2363&display=all Based on the example tasks from your post you may filter the list using "ee zhad 206 - - sherpa 1.3.1 default". Then decide whether you accept the success ratio or not to give the running task a chance. |
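To make the "accept the success ratio or not" decision concrete, here is a minimal sketch using the two rows quoted above. The class name, the `worth_running` helper, and the 0.5 threshold are my own assumptions; pick whatever threshold suits you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """One row of the run statistics: counts per parameter combination."""
    run: str
    events: int
    attempts: int
    success: int
    failure: int
    lost: int

    def success_ratio(self) -> float:
        """Fraction of attempts that ended successfully (0.0 if never attempted)."""
        if self.attempts == 0:
            return 0.0
        return self.success / self.attempts


def worth_running(stats: RunStats, min_ratio: float = 0.5) -> bool:
    """Hypothetical rule: keep the task if its history meets the threshold."""
    return stats.success_ratio() >= min_ratio


rows = [
    RunStats("pp jets 200 - - sherpa 2.2.8 default", 2100000, 25, 21, 1, 3),
    RunStats("pp jets 7000 - - sherpa 2.2.8 default", 2157000, 25, 22, 0, 3),
]

for row in rows:
    print(f"{row.run}: {row.success_ratio():.0%}, keep={worth_running(row)}")
```

For these two rows the ratios are 21/25 and 22/25, so both would pass any reasonable threshold.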
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Not all sherpas are bad by default." Most of them have a successful result. Of the 2141 known parameter combinations with sherpa as generator, 1634 have at least 1 success, so 507 have no valid result so far. Compared with the figures from last time (maybe 10 days ago), 2 sherpa combinations turned from no success to at least 1 success: "pp jets 7000 40,-,810 - sherpa 2.2.1 default" and "ppbar jets 1960 17 - sherpa 2.2.5 default". |
Send message Joined: 18 Nov 17 Posts: 119 Credit: 51,287,420 RAC: 20,297 |
"It's not known in advance how long a Theory job will run." I would like to increase the run time limit, not reduce it, so that normally running tasks can succeed. Do we know the maximum number of hours a normally running task may need? |
Send message Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"I would like to increase the run time limit, not reduce it, so that normally running tasks can succeed." No, we don't know, but if you want to monitor the chance of success during run time, you could remove the job_duration line from the XML file mentioned earlier. Also add a line to the options section of cc_config.xml: <dont_check_file_sizes>1</dont_check_file_sizes> You will then get endless tasks where you have to decide for yourself: give it a chance or abort. |
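For anyone who prefers not to edit cc_config.xml by hand, the change above could be sketched as follows. The function name is made up, and note that this approach re-serializes the file, so any XML comments in it would be dropped; keep a backup.

```python
import xml.etree.ElementTree as ET


def enable_dont_check_file_sizes(config_xml: str) -> str:
    """Return cc_config.xml content with dont_check_file_sizes set to 1.

    Creates the <options> section if it is missing.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected <cc_config> as the root element")

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"

    return ET.tostring(root, encoding="unicode")


print(enable_dont_check_file_sizes("<cc_config><options></options></cc_config>"))
```

After changing the file, the client has to re-read its config files (or be restarted) before the option takes effect.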
Send message Joined: 17 Oct 06 Posts: 74 Credit: 51,502,772 RAC: 22,330 |
I don't think that's going to help; it looks like it will take 6000+ days for these tasks to finish. All of my long-running jobs are still sherpas. |
©2024 CERN