Very long tasks in the queue
Author | Message |
---|---|
Send message Joined: 17 Sep 04 Posts: 99 Credit: 30,642,553 RAC: 1,979 |
All my Atlas tasks validate on the Linux box, even without the HITS file. All my Atlas tasks are invalidated on the Windows 10 PC, despite it having a more modern AMD CPU and three times the RAM. I find I need to run Atlas by itself. It did not do well when I was also running Einstein@home. Regards, Bob P. |
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,464,258 RAC: 5,837 |
Overnight, 8 long-runners finished and validated successfully. So far my best: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170665 6,665.59 credits *smile* Supporting BOINC, a great concept ! |
Send message Joined: 2 May 07 Posts: 2090 Credit: 158,816,631 RAC: 127,244 |
Runtime 1 day and 15 hours: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=60711144 The upload of 585.46 MByte is waiting... |
Send message Joined: 14 Jan 10 Posts: 1273 Credit: 8,480,147 RAC: 2,155 |
Longrunner returned: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170731 Run time: 1 days 9 hours 8 min 10 sec; CPU time: 5 days 7 hours 30 min 21 sec; Validate state: Valid; Credit: 632.96. Not very much for over 5 days of CPU. In the stderr output, I don't find any ATLAS-job information, except "Starting ATLAS job. (PandaID=3283615871 taskID=10959636)" |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,991,649 RAC: 239,589 |
I had a few: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126171019 https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170863 https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170832 They didn't seem to run for that long, and xx19 had some errors. It seems like some of the normal ones had ~1000 credit today?? |
Send message Joined: 14 Jan 10 Posts: 1273 Credit: 8,480,147 RAC: 2,155 |
"I had a few:" You have set up dual-core tasks without an app_config. They get 3600MB of RAM, and 4400MB is required for multi-core tasks. So use an app_config.xml with the minimum RAM, or set up at least three cores in your preferences, and a VM with 4600MB will be created. |
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,464,258 RAC: 5,837 |
Longrunner returned: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170731 Is it this machine? https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10360630 Then, yes, you are right, this is definitely low, but your normal Atlas-WUs also get very low credits. Don't know what the reason is ... Could it be that the last benchmark went wrong? Supporting BOINC, a great concept ! |
Send message Joined: 14 Jan 10 Posts: 1273 Credit: 8,480,147 RAC: 2,155 |
Then, yes, you are right, this is definitely low, but your normal Atlas-WUs also get very low credits. Don't know what the reason is ... The benchmark is at least 1 year and maybe even 2 years old. I don't have that low credit 'problem' with the Theory tasks, so it's BOINC's new credit algorithm combined with new applications. Maybe the benchmark was a bit low because I had fixed it at that level: World Community Grid sometimes has a major problem with exceeded time limits when they switch tasks of the same application from long runners to very short runners and vice versa. Meanwhile I have raised the floating-point and integer speed. |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,991,649 RAC: 239,589 |
I thought I had just picked out the single-core ones, as I changed back to singles: the combination of multi-core and multiple tasks isn't respected correctly, and app_config cannot be set correctly either. I'll see if I get some more long-running ones in single core. |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
So far, I have had 5 long-runners, both 2-core and 4-core. All went through OK with app_config.xml setting memory at 4400 MB. Here is a 2-core: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170739 Here is a 4-core: https://lhcathome.cern.ch/lhcathome/result.php?resultid=126170657 Credit allocation, as often, is difficult to understand. The 2-core has more than double the credit of the 4-core, although the CPU time for the 4-core is more than double the CPU time of the 2-core... We are the product of random evolution. |
Send message Joined: 18 Dec 15 Posts: 1687 Credit: 103,037,844 RAC: 126,619 |
It would definitely be a good idea to create a separate task category for the longrunners. On one of my PCs, I am running 3 tasks with 3 cores each on a "high-end" processor, so it is no problem if longrunners are downloaded and processed. On another two PCs, I run tasks with one core only; the processors are older, slower ones. So here it does not make any sense at all to have longrunners processed, as crunching time would be 6-8 days. Hence, it would be nice if I could determine in advance, for each of my PCs, whether or not longrunners are downloaded. I strongly suspect the same is true for other crunchers as well. |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,991,649 RAC: 239,589 |
At GPUGRID they have long and short tasks. On Rosetta there is an option for target run time. I agree with Erich. BOINC gives an ETA of 10 days on my E5-2675v3. |
Send message Joined: 18 Dec 15 Posts: 1687 Credit: 103,037,844 RAC: 126,619 |
On my PC with 3 ATLAS tasks, 3 cores each, all 3 tasks are long-runners now. It seems it will take each of them about 56 hours to finish. So I'll see what happens. |
Send message Joined: 2 May 07 Posts: 2090 Credit: 158,816,631 RAC: 127,244 |
Runtime 1 day and 15 hours: Thank you ATLAS-Team - 2,600 Cobblestones. Upload finished successfully. |
Send message Joined: 14 Jan 10 Posts: 1273 Credit: 8,480,147 RAC: 2,155 |
I got a resend of a long runner that failed due to EXIT_DISK_LIMIT_EXCEEDED. Peak disk usage reported 5,960.42 MB. https://lhcathome.cern.ch/lhcathome/result.php?resultid=126683384 My resend task started as a single core VM and I'll try to restart it with 4 cores. |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
My resend task started as a single core VM and I'll try to restart it with 4 cores. How do you do that? I mean restarting a single core task with a different number of cores? We are the product of random evolution. |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,991,649 RAC: 239,589 |
Looks like my Xeon E5-2675v3 won't make the deadline for the long runners; the ETA is 8 days, e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=127152468 |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,991,649 RAC: 239,589 |
I have 4 that failed with EXIT_DISK_LIMIT_EXCEEDED (5-7GB). It's a bit irritating, though, that the log shows the HITS file and: 2017-03-20 15:56:46 (2872): Guest Log: Successfully finished the ATLAS job! 2017-03-20 15:56:46 (2872): Guest Log: Copying the results back to the shared directory! 2017-03-20 15:56:46 (2872): Guest Log: Copied the result file back to the shared directory and created atlas_done file! 2017-03-20 15:56:46 (2872): Guest Log: Success! Shutting down the machine. 2017-03-20 15:56:46 (2872): VM Completion File Detected. I lost 900,000 sec of compute. All three computers have over 100GB of free disk space, and BOINC is set to use 150GB and is using ca. 25GB, so there should be plenty of space. |
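
As a rough illustration of the app_config.xml advice given earlier in the thread (4400MB as the minimum RAM for multi-core tasks), the Python sketch below builds the text of such a file. The app name `ATLAS`, the plan class `vbox64_mt_mcore_atlas`, and passing the memory through the vboxwrapper option `--memory_size_mb` in `<cmdline>` are assumptions that are not stated in the thread; compare them with the entries in your own client_state.xml before using the output. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(ncpus: int, memory_mb: int,
                     app_name: str = "ATLAS",
                     plan_class: str = "vbox64_mt_mcore_atlas") -> str:
    """Return the text of an app_config.xml for one multi-core app version.

    app_name and plan_class are assumed values for illustration only.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of cores BOINC should budget for each task.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Assumed way of handing the VM memory size to the wrapper.
    ET.SubElement(version, "cmdline").text = f"--memory_size_mb {memory_mb}"
    ET.indent(root, space="  ")  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# Dual-core tasks with the 4400MB minimum mentioned in the thread.
config_xml = build_app_config(ncpus=2, memory_mb=4400)
print(config_xml)
```

The printed text would be saved as app_config.xml in the project's folder inside the BOINC data directory and picked up after the client re-reads its config files.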
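
To put the credit complaint about result 126170731 into numbers, the following Python sketch converts the run time, CPU time and credit quoted in that post into credit per CPU-hour and an average number of busy cores. Only those three figures come from the thread; the variable names are illustrative, and nothing here reflects how BOINC itself computes credit.

```python
from datetime import timedelta

# Figures quoted in the thread for result 126170731.
run_time = timedelta(days=1, hours=9, minutes=8, seconds=10)
cpu_time = timedelta(days=5, hours=7, minutes=30, seconds=21)
credit = 632.96

cpu_hours = cpu_time.total_seconds() / 3600
# Dividing two timedeltas yields a plain float ratio.
avg_busy_cores = cpu_time / run_time
credit_per_cpu_hour = credit / cpu_hours

print(f"CPU hours:           {cpu_hours:.1f}")
print(f"Average busy cores:  {avg_busy_cores:.2f}")
print(f"Credit per CPU-hour: {credit_per_cpu_hour:.2f}")
```

For this task the result is roughly 3.85 busy cores on average and about 5 credits per CPU-hour, which is one way to compare it against other results mentioned in the thread.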
©2024 CERN