Message boards : ATLAS application : What is better: 2 tasks 4 cores ea., or 4 tasks 2 cores ea. ?

Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29415 - Posted: 19 Mar 2017, 12:36:14 UTC

Previously on the ATLAS project, and now at LHC@home, I have been running 3 tasks with 3 cores each on one of my PCs, i.e. 9 cores in total for ATLAS plus 2 cores for GPUGRID (2 GPUs), on a 12-thread processor (6 cores + 6 HT - Intel i7-4930K). So 11 out of 12 threads are busy, yielding a total CPU usage of 94-96% (according to the Windows Task Manager). RAM is 32 GB.

What I notice is that after some time, the PC reacts rather slowly when opening or closing windows; when I restart the PC, it is back to normal for some 1-2 days before it gets slow again.
I am not satisfied with this situation, as I think that a permanent CPU usage of 94-96% might be a little too much. What I do not know (as I cannot measure it) is by how much the processing of all the tasks is slowed down - but this might well be the case.

Hence, I am thinking about changing the ATLAS configuration from a total of 9 cores to 8 cores. However, I have no idea what is better: 2 tasks with 4 cores each, or 4 tasks with 2 cores each.
What are the pros and cons?
ID: 29415
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29416 - Posted: 19 Mar 2017, 13:30:49 UTC - in response to Message 29415.  
Last modified: 19 Mar 2017, 13:32:13 UTC

Look at this thread.
HerveUAE explains why tasks with fewer cores have a higher CPU efficiency on hosts that have enough RAM.
Hosts with less RAM than (2600 MB * number of CPUs usable by BOINC) have a choice to make, depending on their amount of RAM.
Your host doesn't have this problem: 2600 MB * 8 (CPUs you want to use) = 20800 MB of RAM, which is less than the 32000 MB you have.
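
As a quick sketch in code (only the 2600 MB per CPU figure comes from this thread; the rest is my illustration):

# Rule of thumb: reserve ~2600 MB of RAM per CPU that BOINC may use for ATLAS.
MB_PER_CPU = 2600

def ram_needed_mb(cpus_for_boinc: int) -> int:
    # RAM (MB) the rule of thumb asks for
    return MB_PER_CPU * cpus_for_boinc

# Erich56's case: 8 CPUs for ATLAS on a 32 GB host.
print(ram_needed_mb(8))          # 20800
print(ram_needed_mb(8) < 32000)  # True -> RAM is not the limiting factor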

Maybe running 8 * 1-core WUs would be the most interesting choice.
Second choice: 4 * 2-core WUs.
Third choice: 2 * 4-core WUs.

An important factor is your bandwidth.

I think the more WUs you run simultaneously, the more you should start them with an offset from each other (15 to 20 min), so that the downloads from the servers do not all happen at the same time, which could reduce the overall efficiency.
The downside is that the running time is longer for WUs with fewer cores:
a 1-core WU takes about 8 times as long as an 8-core WU.
For a host running 24/7 this doesn't matter. For a host that runs only a few hours on some days, problems can occur (missed deadlines, restarting a WU after a premature end, longrunner WUs...).
ID: 29416
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29417 - Posted: 19 Mar 2017, 13:48:45 UTC - in response to Message 29416.  
Last modified: 19 Mar 2017, 13:53:28 UTC


Maybe running 8 * 1-core WUs would be the most interesting choice.
Second choice: 4 * 2-core WUs.
Third choice: 2 * 4-core WUs.

An important factor is your bandwidth.

Thanks for the information.
8 * 1-core may indeed sound interesting. On the other hand, the crunching time of one of these "longrunners" might then be some 170 hours (given that it now takes about 55 hours with 3 cores).

BTW, bandwidth should be no problem here.
ID: 29417
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29418 - Posted: 19 Mar 2017, 18:34:47 UTC
Last modified: 19 Mar 2017, 18:36:08 UTC

This is a subject in which I am particularly interested indeed :).

It is generally accepted in the IT industry that multi-threaded processes do not use the available CPU at 100% of its capacity, for many reasons, including locks on critical resources and the limited bandwidth between your CPU and the RAM, which is shared among all the threads running on your CPU. So the only way to really use 100% of your CPU is to use only 1 of your available cores.
The architecture of the software may, however, be such that it has minimal critical sections of code and uses a small amount of working memory, making best use of your CPU cache. In that case the software uses all allocated cores efficiently. I would assume ATLAS is like that.

Now, if you divide the task's "CPU time" by the number of allocated cores and subtract this from the "Run time", you get a gap during which your task is actually not using your CPU. On SixTrack this gap ("idle time") is rather small: 100 to 400 seconds on my machines. On ATLAS, however, this gap is rather big: I see 20 to 30 minutes, and often more.

Assuming you have a gap of 30 minutes, and the task requires 4 hours of CPU time, then:
- An 8-core task would run in 1 hour, but using your CPU only 50% of the time, i.e. 50% efficiency.
- A 4-core task would run in 1.5 hours, 67% efficiency.
- A 2-core task would run in 2.5 hours, 80% efficiency.
- A 1-core task would run in 4.5 hours, 89% efficiency.
So 1-core tasks are the most efficient.
From my own observations, the idle time increases with the number of cores, and this is because part of the ATLAS processing runs on 1 core only (single-threaded), whatever the number of cores you have assigned to the task.
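
A minimal sketch of this arithmetic (my own illustration, assuming a fixed single-threaded gap of 0.5 hours and a 4-hour CPU payload that parallelises perfectly):

CPU_HOURS = 4.0   # total CPU time the task needs
IDLE_HOURS = 0.5  # single-threaded gap: run time minus (CPU time / cores)

for cores in (8, 4, 2, 1):
    run_time = CPU_HOURS / cores + IDLE_HOURS    # wall-clock hours
    efficiency = CPU_HOURS / (cores * run_time)  # share of reserved core-hours actually used
    print(f"{cores}-core: {run_time:.1f} h run time, {efficiency:.0%} efficiency")

This prints 50%, 67%, 80% and 89% for 8, 4, 2 and 1 cores, matching the list above.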

However, each task (for the new version 1.01) needs 4400 MB RAM whatever the number of cores. This is my current assumption, but I may be wrong. When I tried 2600 MB for 1-core, it failed during startup. So your total RAM may limit the number of tasks you can run concurrently. On my 32 GB, 8-core machine, I have configured it to run 4 2-core tasks. I also have 1 Einstein GPU task running.

So I would suggest 4 times 2-core, since you have 32 GB of RAM.

With "longrunners" or if your idle time is very short, the efficiency of higher N-core tasks is much better and then running 4 times 2-cores or 2 times 4-cores would not make much difference. I would then suggest to run 2 times 4-cores so that you keep more RAM for everything else.
We are the product of random evolution.
ID: 29418
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29419 - Posted: 19 Mar 2017, 19:07:27 UTC - in response to Message 29418.  

However, each task (for the new version 1.01) needs 4400 MB RAM whatever the number of cores. This is my current assumption, but I may be wrong.

In case you are right, then what Philippe is suggesting here:

Maybe running 8 * 1-core WUs would be the most interesting choice.

would not work with my 32 GB. So the next choice would be 4 * 2-core.

What I might do is try the 4 * 2-core setting and see how much RAM is being used.
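
For reference, here is a sketch of an app_config.xml for that 4 * 2-core setting. The app name, the plan class and the --memory_size_mb flag are my assumptions (check client_state.xml for the exact values on your host); the 4400 MB follows HerveUAE's figure above:

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>4</max_concurrent>   <!-- at most 4 ATLAS tasks at once -->
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>                  <!-- 2 cores per task -->
    <cmdline>--memory_size_mb 4400</cmdline>  <!-- per-task VM RAM -->
  </app_version>
</app_config>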
ID: 29419
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,857,770
RAC: 226,089
Message 29420 - Posted: 19 Mar 2017, 19:26:57 UTC

There is some discussion here too; more than 8 cores seems like a bad idea.

http://atlasathome.cern.ch/forum_thread.php?id=568

I did some dual-core tasks, but without the app config they didn't work, so I switched back to single-core, as I have plenty of RAM.
ID: 29420
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29421 - Posted: 19 Mar 2017, 19:42:25 UTC

http://atlasathome.cern.ch/forum_thread.php?id=568

This is an ATLAS@Home discussion thread. At LHC@home there is a new, more efficient version of the simulator (1.01), which may not behave the same way in terms of memory consumption and requirements.
The new version was introduced in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4146
That thread also discussed that at least 4300 MB is needed for a 2-core task, and that 4400 MB is sufficient for 4-core and 8-core. There was no discussion of 1-core though, so I think some tests are yet to be done.
Volunteering? :)
We are the product of random evolution.
ID: 29421
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29422 - Posted: 19 Mar 2017, 19:44:48 UTC

I switched back to single-core, as I have plenty of RAM

Do you mean that 2600 MB for 1-core is working for you, Toby?
We are the product of random evolution.
ID: 29422
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,857,770
RAC: 226,089
Message 29423 - Posted: 19 Mar 2017, 20:47:45 UTC - in response to Message 29422.  
Last modified: 19 Mar 2017, 20:50:00 UTC

Yes, I haven't had an invalid result since I disabled the 2-core option in the site preferences.

I've done 42 tasks with valid results. They have HITS files.

The longrunners are still going, the longest at 3 days on Xeon 2657 v3 & 2683 v3 machines, with some at 2 days and 1 day on other systems.

I see 53 GB of RAM usage with 15 ATLAS, 10 Rosetta, 2 CMS and 2 Theory tasks running.
ID: 29423
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 29424 - Posted: 19 Mar 2017, 21:40:30 UTC - in response to Message 29421.  

...
That thread also discussed that at least 4300 MB is needed for a 2-core task, and that 4400 MB is sufficient for 4-core and 8-core. There was no discussion of 1-core though, so I think some tests are yet to be done.
Volunteering? :)

In another thread (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4146&postid=29274) I wrote
that a dual-core task will run with 4300 MB, but not efficiently, because the dual-core is then only using 1 thread.
So for every multi-core task, reserve at least 4400 MB of RAM.
ID: 29424
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29430 - Posted: 20 Mar 2017, 9:11:32 UTC

Sorry guys, but I'm not with you.

Trying to bring the memory for multi-core WUs down to the lower limit always carries the danger that some WU might not work, or might start swapping and swapping and swapping (inside the VM), which will produce heavy disk usage and may lengthen the runtime of a WU.

Whoever can afford it should spend more memory. I for my part still use the "old" formula 2.5 GB + 0.8 GB * NumberOfCores for calculating the needed memory.
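
In code form (assuming the 0.8 is GB per core), the formula gives:

def yeti_ram_gb(cores: int) -> float:
    # Rule of thumb: 2.5 GB base + 0.8 GB per core
    return 2.5 + 0.8 * cores

for n in (1, 2, 4, 8):
    print(f"{n}-core WU: {yeti_ram_gb(n):.1f} GB")
# 1-core: 3.3 GB, 2-core: 4.1 GB, 4-core: 5.7 GB, 8-core: 8.9 GB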


Supporting BOINC, a great concept!
ID: 29430
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 29433 - Posted: 20 Mar 2017, 11:10:04 UTC - in response to Message 29430.  
Last modified: 20 Mar 2017, 11:12:11 UTC

Trying to bring the memory for multi-core WUs down to the lower limit always carries the danger that some WU might not work, or might start swapping and swapping and swapping (inside the VM), which will produce heavy disk usage and may lengthen the runtime of a WU.

It's not the purpose to bring the limit down, but to determine the minimum required RAM. I wrote at least 4400 MB for multi-cores.
About the possible swapping: we really need the ALT-F3 "top" command in the VM console to monitor swapping, CPU usage and the running processes.
ID: 29433
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29442 - Posted: 20 Mar 2017, 16:08:08 UTC - in response to Message 29433.  

About the possible swapping - we really need the ALT-F3 "top" command in the VM-Console to monitor the swapping, CPU-usage and running processes.


There is no swap space inside the ATLAS VM. I'm still working on getting top to work...

Those of you interested in the technical details of how ATLAS multi-process software works might like to look at these slides.

The slides also explain the long initialisation time: a single process runs for as long as possible in order to share as much memory as possible between the processes.
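
As a toy, Linux-only illustration of the trick (my own sketch, not ATLAS code): memory allocated before fork() is shared copy-on-write, so the expensive initialisation is stored only once however many workers read it.

import array
import os

# "Initialisation": one big read-only table, built in the parent only.
table = array.array("d", range(10_000_000))

children = []
for _ in range(3):             # fork 3 workers after initialisation
    pid = os.fork()
    if pid == 0:               # child: reads the shared buffer, pages stay shared
        print(sum(table))
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)         # parent reaps the workers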

As for the optimal number of cores, I think others have already said it all in this thread. If you can afford the memory and disk, single-core gives you the best CPU efficiency, but it can also increase the chance of failures. On my PC (4 + 4 HT cores) I prefer to run 1 * 4-core. Using the hyperthreaded cores gives me worse efficiency.
ID: 29442
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 29455 - Posted: 20 Mar 2017, 20:28:35 UTC - in response to Message 29442.  

According to this thread, the best CPU performance is reached with 4-core multicore tasks:
http://atlasathome.cern.ch/forum_thread.php?id=568

But here you were talking about CPU efficiency (CPU time / run time?), whereas in the thread above they are talking about CPU performance (CPU time per event). So in order to crunch an event in the fastest possible way, you should choose 4-core tasks.
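
To make the distinction concrete (made-up numbers, reusing HerveUAE's 4-core example from above and a hypothetical 50 events per task):

CPU_TIME_H, RUN_TIME_H, CORES, EVENTS = 4.0, 1.5, 4, 50

cpu_efficiency = CPU_TIME_H / (CORES * RUN_TIME_H)  # share of reserved core-hours used
cpu_per_event = CPU_TIME_H * 3600 / EVENTS          # CPU seconds per event (lower = faster)
print(f"{cpu_efficiency:.0%} efficiency, {cpu_per_event:.0f} CPU s per event")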

The post above is about half a year old, so I don't know if these numbers are still valid today, or for LHC@home.
ID: 29455
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29458 - Posted: 20 Mar 2017, 20:55:05 UTC

According to this thread, the best CPU performance is reached with 4-core multicore tasks:
http://atlasathome.cern.ch/forum_thread.php?id=568

When I joined ATLAS@Home at the end of last year, I took the graph in that same thread as the basis for setting my app_config.xml to 4 cores. But after some tests, I realised that I could crunch more tasks per day on my machine with a lower number of cores (3 or 2). And this is because of the long "idle time" that I mentioned earlier, which is explained by the presentation that David has shared with us.

But this "idle time" is proportionally far less significant with the "longrunners", which is why I advocate the suggestion to have a separate "longrunners" application for those of us who can participate. Furthermore, as David said here (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4157&postid=29276#29276), it would make the work at LHC easier.
We are the product of random evolution.
ID: 29458
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29462 - Posted: 20 Mar 2017, 21:19:34 UTC

Those of you interested in the technical details of how ATLAS multi-process software works might like to look at these slides.

Thanks for sharing this insight, David!
We are the product of random evolution.
ID: 29462
