never-ending tasks here
Author | Message |
---|---|
Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 68,497 |
I still get some never-ending tasks. Can the task be modified to fail, ideally fast? |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
I have received 3 tasks here (https://lhcathome.cern.ch/lhcathome/results.php?userid=444608&offset=0&show_names=0&state=0&appid=14). My setting of max 2 CPUs was correctly taken into account, and the VM memory size set by the server was 3000 MB. But this task never ends (https://lhcathome.cern.ch/lhcathome/result.php?resultid=121226104). Should I abort it? We are the product of random evolution. |
Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 68,497 |
Once they go wrong they never end, so yes, you should abort. Hopefully they can continue to understand this, as other projects don't have this problem. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
Yep, I aborted the task when I saw that it had used only 6 hours of CPU time but still had not finished after more than 11 hours of running. |
Joined: 14 Nov 07 Posts: 3 Credit: 472,959 RAC: 0 |
I just started here with a Mac. My first task appears to be never-ending. Over at Atlas@home I am getting good results. I noticed that the application here at LHC@home has a different title from the one at Atlas@home, and it seemed to download a VBox even though I already have one. |
Joined: 28 Feb 15 Posts: 6 Credit: 1,261,955 RAC: 0 |
I got my first 2 units here on LHC. Unit 1 crashed after 34 minutes; unit 2 ran for more than 13 hours up to 100% and now never ends. Its CPU usage is now 1%. Both units ran with 8 CPU cores, but CPU usage was below 25% most of the time, so it's not very efficient to give 8 cores to ATLAS :) 8 cores: Run time 13 hours 56 min 18 sec, CPU time 23 hours 38 min 20 sec |
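The run and CPU times quoted above are enough to check the "below 25%" observation. A minimal Python sketch, assuming efficiency is defined as CPU time divided by (run time × allotted cores); the helper names are illustrative and not part of BOINC or the ATLAS app.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/min/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_s: int, run_s: int, cores: int) -> float:
    """Fraction of the allotted core-time that was actually spent on the CPU."""
    return cpu_s / (run_s * cores)


run_s = to_seconds(13, 56, 18)   # run time reported for the 8-core task
cpu_s = to_seconds(23, 38, 20)   # CPU time reported for the same task
eff = cpu_efficiency(cpu_s, run_s, cores=8)
print(f"{eff:.1%}")              # prints 21.2%
```

The result, roughly 21%, is in line with the CPU usage seen in the task manager.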
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0 |
Task 121561738 / 58403859: running for over 2 days (100.00 %) |
Joined: 2 Sep 04 Posts: 455 Credit: 201,268,029 RAC: 6,930 |
"Task : 121561738 / 58403859 over 2 days (100.00 %)" This task may be dead. You could work through this checklist, but note that it was written for single-core WUs, so some multi-core-specific details may not be in it. In short, you could post CPU time versus run time (select the WU in your BOINC client and then click Properties). Supporting BOINC, a great concept ! |
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0 |
Perhaps I canceled task 121561738 (58403859) too early?! Another task, 121561706 (58403831): run time 2d 19h 15min; CPU time 5d 16h 8min -> OK. Checklist: ... (VBox has been running several projects for a long time) ... |
Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
"Perhaps I canceled Task 121561738 (58403859) too early ??!!" My last single-core ATLAS task had a run time of 12.5 hours. |
Joined: 2 Sep 04 Posts: 455 Credit: 201,268,029 RAC: 6,930 |
"Perhaps I canceled Task 121561738 (58403859) too early ??!!" As long as run time and CPU time stay this proportional, I would let it run. Are you running 2-core WUs? Supporting BOINC, a great concept ! |
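The proportionality argument can be made concrete with the figures posted for task 121561706 (run time 2d19h15min, CPU time 5d16h8min). A minimal Python sketch, assuming that CPU time divided by run time approximates the average number of busy cores; the helper names are illustrative, not a BOINC API.

```python
def minutes(days: int = 0, hours: int = 0, mins: int = 0) -> int:
    """Convert a d/h/min duration to minutes."""
    return days * 1440 + hours * 60 + mins


def busy_cores(cpu_min: int, run_min: int) -> float:
    """Average number of cores kept busy: CPU time divided by run time."""
    return cpu_min / run_min


run = minutes(2, 19, 15)   # 2d19h15min run time (task 121561706)
cpu = minutes(5, 16, 8)    # 5d16h8min CPU time (same task)
ratio = busy_cores(cpu, run)
print(f"{ratio:.2f} cores busy on average")   # prints 2.02 cores busy on average
```

A ratio of about 2 is what a 2-core WU doing real work would show; a ratio that keeps falling toward zero while the run time grows is the stuck-VM case described elsewhere in this thread.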
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0 |
I try to run 8-core tasks, but most of the time only 2 cores are used per task, and sometimes 4-5 (?). My other problem is that I can run only 2 tasks at the same time; I would like to run 4 to 5 4-core tasks (as under the old ATLAS project). |
Joined: 1 Nov 05 Posts: 8 Credit: 597,196 RAC: 0 |
I also had some tasks running longer than expected (while I was using the computer). It happens at around 80% completion, with processor load at nil. When I either suspend the job or open the VM in VirtualBox, the job status in the 'Task' pane changes from 'running' to 'uploading', and the task reports success (see task 123141235). Is the process unable to signal 'file completed' to the VM while the computer is in use, e.g. because of too low a process priority or a similar I/O conflict? |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
I finally got one of the long-runners myself, so I was able to debug it. It had been stuck for the last 2 days using zero CPU. The log showed that at some point it failed to allocate memory; after this the process exited, but without shutting down the machine properly. I have increased the memory a little, so the formula is now 1.4 GB + 1 GB * ncores. |
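The memory formula above can be written as a small helper to see what a given core count asks for. A minimal Python sketch; the function name is hypothetical, and since the post does not say whether the server counts 1 GB as 1000 or 1024 MB, the result is kept in GB.

```python
def atlas_vm_memory_gb(ncores: int) -> float:
    """VM memory per the formula in this post: 1.4 GB + 1 GB per core (hypothetical helper)."""
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    # round() keeps the result at one decimal, avoiding float noise in the printout
    return round(1.4 + 1.0 * ncores, 1)


for n in (1, 2, 4, 8):
    print(f"{n} core(s): {atlas_vm_memory_gb(n):.1f} GB")
```

This prints 2.4, 3.4, 5.4 and 9.4 GB for 1, 2, 4 and 8 cores respectively.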
Joined: 2 Sep 04 Posts: 455 Credit: 201,268,029 RAC: 6,930 |
I have a new long-runner, a single-core WU. It has now run for nearly 5 days, whereas normal would be up to about 1 day. Unlike your WU, mine is still consuming CPU power (one full core). Is there any log file I could extract during runtime to see what is really going on? Supporting BOINC, a great concept ! |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
On the Linux box I have one dual-core task that has been running for 30 hours and is now at 96%. It uses 3000 MB of RAM according to VirtualBox Manager, and CPU usage is around 170%. On the Windows 10 PC, dual-core ATLAS tasks outside LHC used 4100 MB of RAM according to VBox Manager. This CPU is an AMD A10-6700, which should have 4 cores, but the Windows Task Manager sees only 2 cores and 4 logical processors, so multicore ATLAS tasks run on two cores. Tullio |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
If you power down and restart a bit later (not a reboot), the host may close the work unit normally and provide the expected credits. It's a way to check whether a very long WU is worth continuing. If errors inside the VM are not detected, the restart may decide during initialization whether to go on or not. If the WU goes on, give it a further chance to finish by itself. If the WU ends, you don't waste your host's time. It works for me, since I run only one task at a time. |
Joined: 1 Nov 05 Posts: 8 Credit: 597,196 RAC: 0 |
Yesterday never-ending (looping?) multicore tasks appeared, which just keep running. I aborted 1 task, stopped 1 through VBox and updated VBox, but it is still an endless loop; see the VBox log. Does anybody have an idea? VBox log excerpt: 00:00:50.246422 VMMDev: Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff880310b2d610), OR(0x0), NOT(0xffffffff), flags(0x0) 00:02:16.236076 VMMDev: Guest Log: Copying input files into RunAtlas. 00:02:18.692928 VMMDev: Guest Log: Copied input files into RunAtlas. 00:02:20.860967 VMMDev: Guest Log: copied the webapp to /var/www 00:02:20.950547 VMMDev: Guest Log: This vm does not need to setup http proxy 00:02:21.031455 VMMDev: Guest Log: ATHENA_PROC_NUMBER=11 00:02:21.101961 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220) 00:54:04.894395 VMMDev: Guest Log: Copying input files into RunAtlas. 00:54:06.537649 VMMDev: Guest Log: Copied input files into RunAtlas. 00:54:06.974127 VMMDev: Guest Log: copied the webapp to /var/www 00:54:07.030660 VMMDev: Guest Log: This vm does not need to setup http proxy 00:54:07.079244 VMMDev: Guest Log: ATHENA_PROC_NUMBER=11 00:54:07.166297 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220) |
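The log excerpt above shows the same PandaID being started twice, about 52 minutes apart, which is what the loop looks like from outside the VM. A minimal Python sketch that scans VBox log text for that pattern; it assumes the guest-log line format shown in this post, and the function name is hypothetical. It only flags the symptom; it does not explain why the job restarts.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches the guest-log line that marks the start of an ATLAS job, e.g.
# "VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)"
START_RE = re.compile(r"Starting ATLAS job\. \(PandaID=(\d+)\)")


def repeated_job_starts(log_text: str) -> dict[str, int]:
    """Return each PandaID that was started more than once, with its start count."""
    counts = Counter(START_RE.findall(log_text))
    return {pid: n for pid, n in counts.items() if n > 1}


sample = """\
00:02:21.101961 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)
00:54:04.894395 VMMDev: Guest Log: Copying input files into RunAtlas.
00:54:07.166297 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)
"""
print(repeated_job_starts(sample))   # prints {'3260989220': 2}
```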
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
The ATLAS team is looking into this problem. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
See the thread here. Short summary: the problem has been fixed, but the fix will take a few hours to propagate. If you keep the jobs running, they will exit and you will get the credit. |
©2024 CERN