Message boards : ATLAS application : What is better: 2 tasks 4 cores ea., or 4 tasks 2 cores ea. ?

Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29415 - Posted: 19 Mar 2017, 12:36:14 UTC

Previously on the ATLAS project, and now at LHC@home, I have been running 3 tasks with 3 cores each on one of my PCs, i.e. 9 cores in total for ATLAS plus 2 cores for GPUGRID (2 GPUs), on a 12-thread processor (6 cores + 6 HT - Intel i7-4930K). So 11 out of 12 threads are busy, yielding a total CPU usage of 94-96% (according to the Windows Task Manager). RAM is 32 GB.

What I notice is that after some time, the PC reacts rather slowly when opening or closing windows; when I restart the PC, it is back to normal for some 1-2 days before it gets slow again.
I am not satisfied with this situation, as I think that a permanent CPU usage of 94-96% might be a little too much. What I do not know (as I cannot measure it) is by how much the processing of all the tasks is slowed down - but this might well be the case.

Hence, I am thinking about changing the ATLAS configuration from a total of 9 cores to 8 cores. However, I have no idea what is better: 2 tasks with 4 cores each, or 4 tasks with 2 cores each.
What are the pros and cons?
ID: 29415
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29416 - Posted: 19 Mar 2017, 13:30:49 UTC - in response to Message 29415.  
Last modified: 19 Mar 2017, 13:32:13 UTC

Look at this thread.
HerveUAE explains why tasks with fewer cores have a higher CPU efficiency on hosts that have enough RAM.
Hosts with less RAM than (2600 MB * number of CPUs usable by BOINC) have a choice to make, depending on their amount of RAM.
Your host doesn't have this problem: 2600 MB * 8 (CPUs you want to use) = 20800 MB of RAM, which is less than the 32000 MB you have.
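
As a quick sketch in code (only the 2600 MB per CPU figure comes from this thread; the rest is my illustration):

# Rule of thumb: reserve ~2600 MB of RAM per CPU that BOINC may use for ATLAS.
MB_PER_CPU = 2600

def ram_needed_mb(cpus_for_boinc: int) -> int:
    # RAM (MB) the rule of thumb asks for
    return MB_PER_CPU * cpus_for_boinc

# Erich56's case: 8 CPUs for ATLAS on a 32 GB host.
print(ram_needed_mb(8))          # 20800
print(ram_needed_mb(8) < 32000)  # True -> RAM is not the limiting factor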

Maybe running 8 * 1-core WUs would be the most interesting choice.
Second choice: 4 * 2-core WUs.
Third choice: 2 * 4-core WUs.

An important factor is your bandwidth.

I think the more WUs you run simultaneously, the more you should start them with an offset from each other (15 to 20 min), so that the downloads from the servers do not all happen at the same time, which could reduce the overall efficiency.
The downside is that the running time is longer for WUs with fewer cores:
a 1-core WU takes about 8 times as long as an 8-core WU.
For a host running 24/7 this doesn't matter. For a host that runs only a few hours on some days, problems can occur (missed deadlines, restarting a WU after a premature end, longrunner WUs...).
ID: 29416
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29417 - Posted: 19 Mar 2017, 13:48:45 UTC - in response to Message 29416.  
Last modified: 19 Mar 2017, 13:53:28 UTC


Maybe running 8 * 1-core WUs would be the most interesting choice.
Second choice: 4 * 2-core WUs.
Third choice: 2 * 4-core WUs.

An important factor is your bandwidth.

Thanks for the information.
8 * 1-core may indeed sound interesting. On the other hand, the crunching time of one of these "longrunners" might then be some 170 hours (given that it now takes about 55 hours with 3 cores).

BTW, bandwidth should be no problem here.
ID: 29417
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29418 - Posted: 19 Mar 2017, 18:34:47 UTC
Last modified: 19 Mar 2017, 18:36:08 UTC

This is a subject in which I am particularly interested indeed :).

It is generally accepted in the IT industry that multi-threaded processes do not use the available CPU at 100% of its capacity, for many reasons, including locks on critical resources and the limited bandwidth between your CPU and the RAM, which is shared among all the threads running on your CPU. So the only way to really use 100% of your CPU is to use only 1 of your available cores.
The architecture of the software may, however, be such that it has minimal critical sections of code and uses a small amount of working memory, making best use of your CPU cache. In that case the software uses all allocated cores efficiently. I would assume ATLAS is like that.

Now, if you divide the task's "CPU time" by the number of allocated cores and subtract this from the "Run time", you get a gap during which your task is actually not using your CPU. On SixTrack this gap ("idle time") is rather small: 100 to 400 seconds on my machines. On ATLAS, however, this gap is rather big: I see 20 to 30 minutes, and often more.

Assuming you have a gap of 30 minutes, and the task requires 4 hours of CPU time, then:
- An 8-core task would run in 1 hour, but using your CPU only 50% of the time, i.e. 50% efficiency.
- A 4-core task would run in 1.5 hours, 67% efficiency.
- A 2-core task would run in 2.5 hours, 80% efficiency.
- A 1-core task would run in 4.5 hours, 89% efficiency.
So 1-core tasks are the most efficient.
From my own observations, the idle time increases with the number of cores, and this is because part of the ATLAS processing runs on 1 core only (single-threaded), whatever the number of cores you have assigned to the task.
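
A minimal sketch of this arithmetic (my own illustration, assuming a fixed single-threaded gap of 0.5 hours and a 4-hour CPU payload that parallelises perfectly):

CPU_HOURS = 4.0   # total CPU time the task needs
IDLE_HOURS = 0.5  # single-threaded gap: run time minus (CPU time / cores)

for cores in (8, 4, 2, 1):
    run_time = CPU_HOURS / cores + IDLE_HOURS    # wall-clock hours
    efficiency = CPU_HOURS / (cores * run_time)  # share of reserved core-hours actually used
    print(f"{cores}-core: {run_time:.1f} h run time, {efficiency:.0%} efficiency")

This prints 50%, 67%, 80% and 89% for 8, 4, 2 and 1 cores, matching the list above.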

However, each task (for the new version 1.01) needs 4400 MB RAM whatever the number of cores. This is my current assumption, but I may be wrong. When I tried 2600 MB for 1-core, it failed during startup. So your total RAM may limit the number of tasks you can run concurrently. On my 32 GB, 8-core machine, I have configured it to run 4 2-core tasks. I also have 1 Einstein GPU task running.

So I would suggest 4 times 2-core, since you have 32 GB of RAM.

With "longrunners" or if your idle time is very short, the efficiency of higher N-core tasks is much better and then running 4 times 2-cores or 2 times 4-cores would not make much difference. I would then suggest to run 2 times 4-cores so that you keep more RAM for everything else.
We are the product of random evolution.
ID: 29418
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,484,180
RAC: 104,378
Message 29419 - Posted: 19 Mar 2017, 19:07:27 UTC - in response to Message 29418.  

However, each task (for the new version 1.01) needs 4400 MB RAM whatever the number of cores. This is my current assumption, but I may be wrong.

In case you are right, then what Philippe is suggesting here:

Maybe running 8 * 1-core WUs would be the most interesting choice.

would not work with my 32 GB. So the next choice would be 4 * 2-core.

What I might do is try the 4 * 2-core setting and see how much RAM is being used.
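
For reference, here is a sketch of an app_config.xml for that 4 * 2-core setting. The app name, the plan class and the --memory_size_mb flag are my assumptions (check client_state.xml for the exact values on your host); the 4400 MB follows HerveUAE's figure above:

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>4</max_concurrent>   <!-- at most 4 ATLAS tasks at once -->
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>                  <!-- 2 cores per task -->
    <cmdline>--memory_size_mb 4400</cmdline>  <!-- per-task VM RAM -->
  </app_version>
</app_config>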
ID: 29419
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,857,770
RAC: 226,089
Message 29420 - Posted: 19 Mar 2017, 19:26:57 UTC

There is some discussion here too; more than 8 cores seems like a bad idea.

http://atlasathome.cern.ch/forum_thread.php?id=568

I did some dual-core tasks, but without the app config they didn't work, so I switched back to single-core, as I have plenty of RAM.
ID: 29420
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29421 - Posted: 19 Mar 2017, 19:42:25 UTC

http://atlasathome.cern.ch/forum_thread.php?id=568

This is an ATLAS@Home discussion thread. At LHC@home there is a new, more efficient version of the simulator (1.01), which may not behave the same way in terms of memory consumption and requirements.
The new version was introduced in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4146
That thread also discussed that at least 4300 MB is needed for a 2-core task, and that 4400 MB is sufficient for 4-core and 8-core. There was no discussion of 1-core though, so I think some tests are yet to be done.
Volunteering? :)
We are the product of random evolution.
ID: 29421
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29422 - Posted: 19 Mar 2017, 19:44:48 UTC

I switched back to single-core, as I have plenty of RAM

Do you mean that 2600 MB for 1-core is working for you, Toby?
We are the product of random evolution.
ID: 29422
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,857,770
RAC: 226,089
Message 29423 - Posted: 19 Mar 2017, 20:47:45 UTC - in response to Message 29422.  
Last modified: 19 Mar 2017, 20:50:00 UTC

Yes, I haven't had an invalid result since I disabled the 2-core option in the site preferences.

I've done 42 tasks with valid results. They have HITS files.

The longrunners are still going, the longest at 3 days on Xeon 2657 v3 & 2683 v3 machines, with some at 2 days and 1 day on other systems.

I see 53 GB of RAM usage with 15 ATLAS, 10 Rosetta, 2 CMS and 2 Theory tasks running.
ID: 29423
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 29424 - Posted: 19 Mar 2017, 21:40:30 UTC - in response to Message 29421.  

...
That thread also discussed that at least 4300 MB is needed for a 2-core task, and that 4400 MB is sufficient for 4-core and 8-core. There was no discussion of 1-core though, so I think some tests are yet to be done.
Volunteering? :)

In another thread (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4146&postid=29274) I wrote
that a dual-core task will run with 4300 MB, but not efficiently, because the dual-core is then only using 1 thread.
So for every multi-core task, reserve at least 4400 MB of RAM.
ID: 29424
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29430 - Posted: 20 Mar 2017, 9:11:32 UTC

Sorry guys, but I'm not with you.

Trying to bring the memory for multi-core WUs down to the lower limit always carries the danger that some WU might not work, or might start swapping and swapping and swapping (inside the VM), which will produce heavy disk usage and may lengthen the runtime of a WU.

Whoever can afford it should spend more memory. I for my part still use the "old" formula 2.5 GB + 0.8 GB * NumberOfCores for calculating the needed memory.
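
In code form (assuming the 0.8 is GB per core), the formula gives:

def yeti_ram_gb(cores: int) -> float:
    # Rule of thumb: 2.5 GB base + 0.8 GB per core
    return 2.5 + 0.8 * cores

for n in (1, 2, 4, 8):
    print(f"{n}-core WU: {yeti_ram_gb(n):.1f} GB")
# 1-core: 3.3 GB, 2-core: 4.1 GB, 4-core: 5.7 GB, 8-core: 8.9 GB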


Supporting BOINC, a great concept!
ID: 29430
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 29433 - Posted: 20 Mar 2017, 11:10:04 UTC - in response to Message 29430.  
Last modified: 20 Mar 2017, 11:12:11 UTC

Trying to bring the memory for multi-core WUs down to the lower limit always carries the danger that some WU might not work, or might start swapping and swapping and swapping (inside the VM), which will produce heavy disk usage and may lengthen the runtime of a WU.

It's not the purpose to bring the limit down, but to determine the minimum required RAM. I wrote at least 4400 MB for multi-cores.
About the possible swapping: we really need the ALT-F3 "top" command in the VM console to monitor swapping, CPU usage and the running processes.
ID: 29433
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 29442 - Posted: 20 Mar 2017, 16:08:08 UTC - in response to Message 29433.  

About the possible swapping - we really need the ALT-F3 "top" command in the VM-Console to monitor the swapping, CPU-usage and running processes.


There is no swap space inside the ATLAS VM. I'm still working on getting top to work...

Those of you interested in the technical details of how ATLAS multi-process software works might like to look at these slides.

The slides also explain the long initialisation time: a single process runs for as long as possible in order to share as much memory as possible between the processes.
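
As a toy, Linux-only illustration of the trick (my own sketch, not ATLAS code): memory allocated before fork() is shared copy-on-write, so the expensive initialisation is stored only once however many workers read it.

import array
import os

# "Initialisation": one big read-only table, built in the parent only.
table = array.array("d", range(10_000_000))

children = []
for _ in range(3):             # fork 3 workers after initialisation
    pid = os.fork()
    if pid == 0:               # child: reads the shared buffer, pages stay shared
        print(sum(table))
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)         # parent reaps the workers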

As for the optimal number of cores, I think others have already said it all in this thread. If you can afford the memory and disk, single-core gives you the best CPU efficiency, but it can also increase the chance of failures. On my PC (4 + 4 HT cores) I prefer to run 1 * 4-core. Using the hyperthreaded cores gives me worse efficiency.
ID: 29442
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 29455 - Posted: 20 Mar 2017, 20:28:35 UTC - in response to Message 29442.  

According to this thread, the best CPU performance is reached with 4-core multicore tasks:
http://atlasathome.cern.ch/forum_thread.php?id=568

But here you were talking about CPU efficiency (CPU time / run time?), whereas in the thread above they are talking about CPU performance (CPU time per event). So in order to crunch an event in the fastest possible way, you should choose 4-core tasks.
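
To make the distinction concrete (made-up numbers, reusing HerveUAE's 4-core example from above and a hypothetical 50 events per task):

CPU_TIME_H, RUN_TIME_H, CORES, EVENTS = 4.0, 1.5, 4, 50

cpu_efficiency = CPU_TIME_H / (CORES * RUN_TIME_H)  # share of reserved core-hours used
cpu_per_event = CPU_TIME_H * 3600 / EVENTS          # CPU seconds per event (lower = faster)
print(f"{cpu_efficiency:.0%} efficiency, {cpu_per_event:.0f} CPU s per event")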

The post above is about half a year old, so I don't know if these numbers are still valid today, or for LHC@home.
ID: 29455
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29458 - Posted: 20 Mar 2017, 20:55:05 UTC

According to this thread, the best CPU performance is reached with 4-core multicore tasks:
http://atlasathome.cern.ch/forum_thread.php?id=568

When I joined ATLAS@Home at the end of last year, I took the graph in that same thread as the basis for setting my app_config.xml to 4 cores. But after some tests, I realised that I could crunch more tasks per day on my machine with a lower number of cores (3 or 2). And this is because of the long "idle time" that I mentioned earlier, which is explained by the presentation that David has shared with us.

But this "idle time" is proportionally far less significant with the "longrunners", which is why I advocate the suggestion to have a separate "longrunners" application for those of us who can participate. Furthermore, as David said here (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4157&postid=29276#29276), it would make the work at LHC easier.
We are the product of random evolution.
ID: 29458
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29462 - Posted: 20 Mar 2017, 21:19:34 UTC

Those of you interested in the technical details of how ATLAS multi-process software works might like to look at these slides.

Thanks for sharing this insight, David!
We are the product of random evolution.
ID: 29462
