Message boards : ATLAS application : Running (0.517 CPUs)

captainjack

Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43373 - Posted: 19 Sep 2020, 16:49:30 UTC

I got an ATLAS task that shows it is running with 0.517 CPUs. Is this normal?

If you need more info, please let me know.
captainjack

Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43383 - Posted: 21 Sep 2020, 13:18:36 UTC

Additional information: It appears to me that because BOINC Manager thinks the ATLAS task needs only a fraction of a CPU, it starts more tasks than there are CPUs available. Once all the started tasks get going and request the resources they need, tasks get suspended and restarted frequently. An ATLAS task that has been running longer than the other tasks gets suspended and restarted more often than any of them. When this happens, ATLAS tasks take much longer than they would if left uninterrupted.
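A toy model can make this over-commitment concrete. It assumes, as a simplification that is not the BOINC client's actual scheduler, that the client keeps starting tasks as long as the sum of their declared CPU usage is still below the number of logical CPUs. The task names and the 12-CPU host are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    name: str
    declared_cpus: float  # CPU share the client believes the task needs
    actual_cpus: float    # CPU share the task really uses once it is busy


def start_tasks(queue: list[Task], ncpus: int) -> list[Task]:
    """Start queued tasks while the declared total is still below ncpus.

    Hypothetical rule for illustration only; the real BOINC client also
    weighs deadlines, memory limits and project resource shares.
    """
    running: list[Task] = []
    declared_total = 0.0
    for task in queue:
        if declared_total >= ncpus:
            break
        running.append(task)
        declared_total += task.declared_cpus
    return running


def overcommit(running: list[Task], ncpus: int) -> float:
    """Actual CPU demand minus available CPUs; positive means over-committed."""
    return sum(t.actual_cpus for t in running) - ncpus


if __name__ == "__main__":
    others = [Task(f"other{i}", 1.0, 1.0) for i in range(12)]
    for atlas_declared in (0.517, 1.0):
        queue = [Task("ATLAS", atlas_declared, 1.0)] + others
        running = start_tasks(queue, ncpus=12)
        print(atlas_declared, len(running), overcommit(running, ncpus=12))
```

In this model, declaring the ATLAS task at 0.517 CPUs lets 13 single-threaded tasks start on 12 CPUs (one CPU over-committed), while declaring it at 1.0 keeps the count at 12.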
djoser

Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43384 - Posted: 21 Sep 2020, 13:40:03 UTC - in response to Message 43383.  

Hello captainjack,

ATLAS using a fraction of a CPU thread is not normal. It uses only a fraction at the start (for about 6 to 10 minutes) and at the very end of a task.

I'm not sure about the VirtualBox version of ATLAS, but I seem to remember that an ATLAS task needs to run uninterrupted (200 events), because if it is interrupted, the whole task starts again from the very beginning each time.

You should go through Yeti's checklist for ATLAS step by step:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359

Regards, djoser.
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43385 - Posted: 21 Sep 2020, 14:45:59 UTC

djoser,

Thanks for the reply.

Yes, I am running the VirtualBox version of ATLAS, and yes, I have been through Yeti's checklist.

I have a screen capture of BOINC Manager showing one of the ATLAS tasks in question using 0.517 CPUs, which I would be glad to send to a project admin if someone tells me where to send it.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,937,431
RAC: 137,496
Message 43386 - Posted: 21 Sep 2020, 15:16:40 UTC - in response to Message 43385.  

This has been reported a couple of times since last year, and as far as I remember nobody has been able to find out what really causes it.
The only thing you can do is set fixed CPU values via an app_config.xml, which will at least minimize the bad impact on your own client.


https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5231&postid=40762
This is not the first thread that mentions the issue, but if you follow the discussion it explains what to do.
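A minimal sketch of the fixed-CPU approach, written in Python so the file can be generated and checked. `avg_ncpus` inside `app_version` is the documented app_config.xml element that sets how many CPUs the client budgets per task. The app name `ATLAS` and the plan class `vbox64_mt_mcore_atlas` are assumptions here; check the values your own client uses (for example in client_state.xml) before relying on them.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return app_config.xml text that fixes the CPU budget for one app version.

    app_name and plan_class must match what the client actually runs;
    the values used below are illustrative assumptions.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus is the CPU count the client budgets for each task of this
    # app version, overriding the (possibly fractional) value it would use.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 1))
```

The output would be saved as app_config.xml in the LHC@home project folder, followed by Options > Read config files in BOINC Manager (or a client restart) so the client picks it up.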
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43387 - Posted: 21 Sep 2020, 16:16:16 UTC

If anybody wants to look into this further, the Task number is 283389454 and the work unit number is 145065835.

I just aborted the task after it ran for 4 days 15 hours 56 min 18 sec without a successful completion.

I would be glad to help test a possible solution, just let me know when and where.
djoser

Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 43388 - Posted: 21 Sep 2020, 16:33:11 UTC - in response to Message 43387.  
Last modified: 21 Sep 2020, 16:33:34 UTC

Furthermore, I can see that your computers are working on other projects concurrently. You could try putting those on hold (or even removing them from BOINC) and see how ATLAS behaves with no other projects interfering.
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43389 - Posted: 21 Sep 2020, 20:48:41 UTC

djoser suggested:

You could try to set those on hold (or even remove them from BOINC) and see how ATLAS behaves with no other projects interfering.

Good question. I will let the queue drain down to empty on one of my machines then let it download as many single CPU ATLAS tasks as it wants and see what happens. I already know it will be constrained by memory, but it will be interesting to see what happens.
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43390 - Posted: 22 Sep 2020, 3:17:00 UTC

The machine I used for this test has 12 threads and 15.9 GB of memory.

I let LHC download 6 single core ATLAS tasks and started them one at a time. After each task had time to download additional data and go through all the initiation steps, I checked memory usage and started the next task.

With one task running, Windows plus the task was using 6.6 GB of memory.
With two tasks running, Windows plus 2 tasks were using 10.8 GB of memory.
With three tasks running, Windows plus 3 tasks were using 14.8 GB of memory.
With four tasks running, memory usage got up to 15.8 GB; the system started banging away on the swap file, then locked up and rebooted itself.

No task started with a partial CPU allocation.

Once the system came back up, I limited LHC to 3 concurrent tasks and will let them run to completion with no other tasks running.
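The readings above also show roughly how much each extra single-core ATLAS task added. A small calculation using only the reported figures; extending the trend linearly to a fourth task is an assumption, not a measurement.

```python
# Figures reported in the post: a 15.9 GB machine running Windows plus
# N single-core ATLAS tasks. The linear extrapolation is an assumption.
TOTAL_GB = 15.9
measured_gb = {1: 6.6, 2: 10.8, 3: 14.8}  # tasks running -> memory in use (GB)

counts = sorted(measured_gb)
# Extra memory taken by each additional task, rounded to the 0.1 GB precision of the readings
increments = [round(measured_gb[b] - measured_gb[a], 1) for a, b in zip(counts, counts[1:])]
avg_increment = sum(increments) / len(increments)
predicted_four_gb = measured_gb[3] + avg_increment

print(increments)                    # per-task increments in GB
print(round(predicted_four_gb, 1))   # estimated usage with a fourth task
print(predicted_four_gb > TOTAL_GB)  # would it exceed physical RAM?
```

The increments come out at 4.2 GB and 4.0 GB, so a fourth task would need roughly 18.9 GB, well beyond the 15.9 GB installed, which is consistent with the swapping and lock-up described.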
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,126,074
RAC: 105,437
Message 43391 - Posted: 22 Sep 2020, 5:54:39 UTC - in response to Message 43390.  

ATLAS cannot tell you when you are running out of memory.
With 16 GB you can run 3 ATLAS tasks at once (about 4.8 GB per task).
Your three tasks will run fine.
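The 4.8 GB-per-task rule of thumb reduces to a simple division. A sketch, where `reserve_gb` is a hypothetical extra parameter for memory you want to keep free for the operating system and other work:

```python
import math


def max_atlas_tasks(ram_gb: float, per_task_gb: float = 4.8, reserve_gb: float = 0.0) -> int:
    """Number of ATLAS tasks that fit in ram_gb at per_task_gb each.

    reserve_gb is a hypothetical allowance for the OS and other programs.
    """
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    return max(0, math.floor((ram_gb - reserve_gb) / per_task_gb))


if __name__ == "__main__":
    print(max_atlas_tasks(16))    # 16 GB machine
    print(max_atlas_tasks(15.9))  # the 15.9 GB test machine
```

For 16 GB, 16 / 4.8 is about 3.3, so three tasks fit; that count could then be used as a `max_concurrent` limit in app_config.xml.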
captainjack

Send message
Joined: 21 Jun 10
Posts: 40
Credit: 10,588,317
RAC: 8,994
Message 43524 - Posted: 23 Oct 2020, 20:53:43 UTC

Just got another one of these. Task says it is "Running(0.849 CPUs)"

More tasks get started than the system can support, and the ATLAS task gets suspended. I had to put in an app_config.xml restricting some of the other work to get the ATLAS task to restart.

Does anybody besides me think this is a problem?
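For restricting other work, app_config.xml also accepts a `project_max_concurrent` element that caps how many tasks of a whole project run at once. A minimal sketch that generates such a file; the cap of 4 is a hypothetical value, and the file would go in the other project's folder, not LHC@home's.

```python
import xml.etree.ElementTree as ET


def limit_project(max_tasks: int) -> str:
    """Return app_config.xml text capping a project's concurrently running tasks."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical cap for another project sharing the machine
    print(limit_project(4))
```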
Harri Liljeroos

Joined: 28 Sep 04
Posts: 674
Credit: 43,152,273
RAC: 15,739
Message 43526 - Posted: 23 Oct 2020, 21:27:33 UTC

I have never seen a situation like that, but I always have an app_config.xml file in place where I set avg_ncpus to what I want.



©2024 CERN