Thread 'Running (0.517 CPUs)'

Author	Message
captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43373 - Posted: 19 Sep 2020, 16:49:30 UTC Got an Atlas task that shows that it is running using 0.517 CPUs. Is this normal? If you need more info, please let me know. ID: 43373 · Reply Quote

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43383 - Posted: 21 Sep 2020, 13:18:36 UTC Additional information: It appears to me that since BOINC Manager on the desktop thinks that the ATLAS task only needs a fraction of a CPU, BOINC Manager will start more tasks than it has CPUs available. When all started tasks get going and ask for the resources they need, a task will get suspended and then restarted frequently. When an ATLAS task has been running longer than any of the other tasks, it gets suspended and restarted more than any other task. When this happens, ATLAS tasks run much longer than they would if left uninterrupted. ID: 43383 · Reply Quote
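
The overcommit described in the post above can be illustrated with a toy model. This is only a sketch under a stated assumption: that the client keeps starting queued tasks as long as the sum of their declared CPU usage is still below the number of CPUs. The function name `tasks_started`, the 4-thread host, and the queue contents are hypothetical and are not taken from BOINC itself.

```python
def tasks_started(declared_cpus, ncpus):
    """Toy model (assumption, not BOINC's real scheduler): start queued tasks
    in order while the declared CPU total is still below ncpus."""
    running = []
    declared_total = 0.0
    for declared in declared_cpus:
        if declared_total >= ncpus:
            break  # the declared budget is used up, stop starting tasks
        running.append(declared)
        declared_total += declared
    return running


# Hypothetical 4-thread host: one ATLAS task declared at 0.517 CPUs,
# followed by ordinary tasks that each declare one full CPU.
with_fraction = tasks_started([0.517, 1.0, 1.0, 1.0, 1.0, 1.0], ncpus=4)
with_full_cpu = tasks_started([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], ncpus=4)

print(len(with_fraction), "tasks started when ATLAS declares 0.517 CPUs")
print(len(with_full_cpu), "tasks started when ATLAS declares 1 CPU")
```

In this model the fractional declaration lets a fifth task start on four threads. If every task really needs a full thread, one of them has to wait or be preempted, which would match the suspend/restart pattern reported above.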

djoser Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0	Message 43384 - Posted: 21 Sep 2020, 13:40:03 UTC - in response to Message 43383. Hello captainjack, ATLAS using a fraction of a CPU thread is not normal. It only uses a fraction at the start (for about 6 to 10 minutes) and at the very end of a task. I don't know exactly about the VirtualBox version of ATLAS, but I seem to remember that an ATLAS task needs to run uninterrupted (200 events), because if it is interrupted the complete task starts from the very beginning each time. You should visit Yeti's checklist for ATLAS and check it step by step: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359 Regards, djoser. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us ID: 43384 · Reply Quote

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43385 - Posted: 21 Sep 2020, 14:45:59 UTC djoser, Thanks for the reply. Yes, I am running the VirtualBox version of ATLAS and yes, I have been through Yeti's checklist. I have a screen capture of BOINC Manager with one of the ATLAS tasks in question that shows it using 0.517 CPUs. I would be glad to send it to a project admin if someone will tell me where to send it. ID: 43385 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2746 Credit: 302,599,481 RAC: 69,229	Message 43386 - Posted: 21 Sep 2020, 15:16:40 UTC - in response to Message 43385. Since last year this has been reported a couple of times, and as far as I remember nobody was able to find out what really causes it. The only thing you can do is to set fixed CPU values via an app_config.xml, which would at least minimize the bad impact on your own client. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5231&postid=40762 It is not the first thread that mentions the issue, but if you follow the discussion it explains what to do. ID: 43386 · Reply Quote
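
As a rough illustration of the app_config.xml approach mentioned above, the following sketch generates such a file with a fixed `avg_ncpus` value. The element names (`app_config`, `app`, `app_version`, `avg_ncpus`, `max_concurrent`) follow the BOINC client configuration format. The helper `build_app_config` is hypothetical, and the app name `ATLAS` and plan class `vbox64_mt_mcore_atlas` are assumptions that should be checked against the entries in your own client_state.xml before use.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, avg_ncpus, max_concurrent=None):
    """Return the text of an app_config.xml that pins avg_ncpus for one app version."""
    root = ET.Element("app_config")
    if max_concurrent is not None:
        # Optional cap on how many tasks of this app run at the same time.
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed names, not confirmed in this thread: verify them in client_state.xml.
config_text = build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 1, max_concurrent=3)
print(config_text)
```

The printed text would be saved as app_config.xml in the LHC@home project folder inside the BOINC data directory, and the client then has to re-read its config files (or be restarted) for the values to apply.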

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43387 - Posted: 21 Sep 2020, 16:16:16 UTC If anybody wants to look into this further, the Task number is 283389454 and the work unit number is 145065835. I just aborted the task after it ran for 4 days 15 hours 56 min 18 sec without a successful completion. I would be glad to help test a possible solution, just let me know when and where. ID: 43387 · Reply Quote

djoser Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0	Message 43388 - Posted: 21 Sep 2020, 16:33:11 UTC - in response to Message 43387. Last modified: 21 Sep 2020, 16:33:34 UTC Furthermore, I can see that your computers are working on other projects concurrently. You could try to set those on hold (or even remove them from BOINC) and see how ATLAS behaves with no other projects interfering. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us ID: 43388 · Reply Quote

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43389 - Posted: 21 Sep 2020, 20:48:41 UTC djoser suggested: You could try to set those on hold (or even remove them from BOINC) and see how ATLAS behaves with no other projects interfering. Good question. I will let the queue drain down to empty on one of my machines then let it download as many single CPU ATLAS tasks as it wants and see what happens. I already know it will be constrained by memory, but it will be interesting to see what happens. ID: 43389 · Reply Quote

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43390 - Posted: 22 Sep 2020, 3:17:00 UTC The machine I used for this test has 12 threads and 15.9 GB of memory. I let LHC download 6 single core ATLAS tasks and started them one at a time. After each task had time to download additional data and go through all the initiation steps, I checked memory usage and started the next task. With one task running, Windows plus the task was using 6.6 GB of memory. With two tasks running, Windows plus 2 tasks were using 10.8 GB of memory. With three tasks running, Windows plus 3 tasks were using 14.8 GB of memory. With four tasks running, memory usage got up to 15.8 GB, it started banging away on the swap file, and the system locked up and rebooted itself. No tasks initiated with use of a partial CPU. Once the system came back up, I limited LHC to 3 concurrent tasks and will let them run to completion with no other tasks running. ID: 43390 · Reply Quote
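
The measurements in the post above allow a back-of-the-envelope estimate of how many tasks fit in memory. This is only a sketch: it assumes memory grows roughly linearly with the number of tasks, and the function name `estimate_max_tasks` is made up for illustration. The figures are the ones reported in the post.

```python
def estimate_max_tasks(used_gb, total_gb):
    """Estimate per-task memory, the OS baseline, and how many tasks fit.

    used_gb maps the number of running tasks to total memory in use (GB).
    Assumes roughly linear growth per task, which is a simplification.
    """
    counts = sorted(used_gb)
    first, last = counts[0], counts[-1]
    # Average extra memory per additional task between first and last sample.
    per_task = (used_gb[last] - used_gb[first]) / (last - first)
    # What remains once the tasks are subtracted: the operating system and the rest.
    baseline = used_gb[first] - first * per_task
    max_tasks = int((total_gb - baseline) // per_task)
    return per_task, baseline, max_tasks


# Figures reported above: Windows plus 1, 2 and 3 ATLAS tasks on a 15.9 GB machine.
per_task, baseline, max_tasks = estimate_max_tasks({1: 6.6, 2: 10.8, 3: 14.8}, 15.9)
print(f"about {per_task:.1f} GB per task, {baseline:.1f} GB baseline, {max_tasks} tasks fit")
```

Under these assumptions a fourth task would push the total to about 18.9 GB, well above the 15.9 GB installed, which is consistent with the swapping and lockup described above.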

maeax Send message Joined: 2 May 07 Posts: 2298 Credit: 179,555,966 RAC: 30,414	Message 43391 - Posted: 22 Sep 2020, 5:54:39 UTC - in response to Message 43390. ATLAS is not able to tell you that you are running out of memory. With 16 GB you can run 3 ATLAS tasks (4.8 GB for every task). Your three tasks will run well. ID: 43391 · Reply Quote

captainjack Send message Joined: 21 Jun 10 Posts: 44 Credit: 15,135,646 RAC: 4,572	Message 43524 - Posted: 23 Oct 2020, 20:53:43 UTC Just got another one of these. The task says it is "Running(0.849 CPUs)". More tasks get started than the system can support and the ATLAS task gets suspended. I had to put in an app_config to restrict some of the other work to get the ATLAS task to restart. Does anybody besides me think this is a problem? ID: 43524 · Reply Quote

Harri Liljeroos Send message Joined: 28 Sep 04 Posts: 804 Credit: 65,687,374 RAC: 24,499	Message 43526 - Posted: 23 Oct 2020, 21:27:33 UTC I have never seen a situation like that. But I always have an app_config.xml file present where I set the avg_ncpus to what I want. ID: 43526 · Reply Quote