Performance 4-core VM versus 3-core VM
Author | Message |
---|---|
Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,495,513 RAC: 2,276 |
I had 2 ATLAS tasks running: a 4-core VM with 6200MB RAM (https://lhcathome.cern.ch/lhcathome/result.php?resultid=158808156) and a 3-core VM with 5300MB RAM (https://lhcathome.cern.ch/lhcathome/result.php?resultid=158810663). Following the tasks in the consoles, I noted that in the 3-core VM all three athena.py processes were constantly present at almost 100% CPU. In the 4-core VM, very often only 1, 2 or 3 athena.py processes were present with high CPU usage (sometimes all 4, of course). The performance of the 4-core VM after 246 minutes of uptime was 76%, and that of the 3-core VM after 114 minutes of uptime was 93%. The 4-core VM had a task from batch Task mc16_13TeV 1000_1500.simul (12236595); the 3-core VM had a task from batch Task mc16_13TeV ZZvvqq_mqq20.simul (12236561). How could the performance of the 4-core VM be improved, or was the task simply too heavy to use 4 cores all the time? |
Send message Joined: 25 Sep 17 Posts: 99 Credit: 3,261,384 RAC: 4,382 |
You would need to compare results from a larger set of data. I would expect 'real' cores versus hyper-threaded or SMT threads to make a difference too. Is there a way to look at the raw results data over a larger data set? I looked over seven tasks in my computer's results. I took CPU time / run time, which gave me approximately 3.503. Dividing that by four processors, I come up with about 87.6% efficiency. |
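The calculation in the post above can be sketched as a small helper. This is only an illustration: the function name `cpu_efficiency` and its arguments are made up here, and the inputs are the CPU time and run time as shown on a task's result page.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, n_cores: int) -> float:
    """Return the fraction of the allocated core time that was spent computing.

    cpu_time_s : total CPU time reported for the task (summed over all cores)
    run_time_s : wall-clock run time reported for the task
    n_cores    : number of cores allocated to the VM
    """
    if run_time_s <= 0 or n_cores <= 0:
        raise ValueError("run time and core count must be positive")
    return cpu_time_s / (run_time_s * n_cores)


# Example with the ratio from the post: CPU time / run time of about 3.503
# on a 4-core VM (run time normalised to 1 for the illustration).
efficiency = cpu_efficiency(cpu_time_s=3.503, run_time_s=1.0, n_cores=4)
print(f"efficiency: {efficiency:.4f}")
```

A ratio of 3.503 on four cores works out to the roughly 87.6% quoted in the post.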
Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,495,513 RAC: 2,276 |
I repeated the test. This time the 3-core and 4-core VMs both had an ATLAS job from the same batch with taskID=12236561. The difference is much smaller now, but the 3-core VM still performs a little better. Performance of the athena processes versus uptime: 3 cores 93.30%, 4 cores 89.74%. |
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,569,815 RAC: 9,173 |
|
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
If you assume a constant initialisation time then running fewer cores will be more efficient, because the time using 100% CPU is longer. There was a long thread on the old ATLAS@Home forum related to this but it was more about credit: http://atlasathome.cern.ch/forum_thread.php?id=640 Also when we introduced the multicore app we did a study of the different core numbers. The plot below shows the CPU time per event for each core number. In general 1 to 4 are better than 5 to 8, probably due to the effects of hyperthreading. Above 8 it gets even worse so this is why we set 8 to be the maximum cores. |
Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,495,513 RAC: 2,276 |
"If you assume a constant initialisation time then running fewer cores will be more efficient, because the time using 100% CPU is longer." Thanks, David, for that info. Your linked thread is blocked because the forums on the 'old' ATLAS site are disabled. No need to enable it for me. From your picture, 4-core would be the best option. I think that's because 4-core was probably, in more cases, the same as 4 threads with Hyper-Threading off. 3 or 5 threads are probably more often used on machines with HT on, with the other threads used for other duties. |
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,569,815 RAC: 9,173 |
Perhaps this graphic could be copied to the "Information on Atlas" thread. Supporting BOINC, a great concept! |
Send message Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
ATLAS native app running two 2-core ATLAS tasks on an AMD-A10 VM. After 5 days of runtime, there are 466 hours of CPU use for ATLAS and 24 hours of waste time. That is 5% waste time, which I consider good performance. https://lhcathome.cern.ch/lhcathome/results.php?hostid=10495075 |
Send message Joined: 18 Dec 15 Posts: 1688 Credit: 103,878,564 RAC: 121,573 |
Maeax, for your waste time % calculation, how many finished tasks did you investigate? I did this now, out of curiosity, for my machine, on which I run 2-core ATLAS tasks on an Intel i7-4930k @ 3.9GHz. What I found was that the waste time percentages varied between 6 and 9%. |
Send message Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
In the details of the computer, a duration of 0.35 day is shown for a task. This means 8 tasks per day and, with two running in parallel, 16 or 17 per day together. ATLAS tasks are very difficult; every task has a different duration. |
Send message Joined: 22 Mar 17 Posts: 55 Credit: 10,223,976 RAC: 140 |
On my 1950x I run 4 threads with 2 tasks only, since 9.8 GB of memory is used. I noticed it took a while for the CPU time to reach run time and show as 100% in BoincTasks, so I started running some extra tasks on another client to keep the CPU busy. While the CPU is pushed with tasks > threads, ATLAS runs at about 2.5 CPUs utilized. There was a period where I ran out of other work, and a couple of tasks were around 3.4 - 3.5 CPUs utilized out of 4. I could run fewer extra tasks, but I'm OK with it. |
Send message Joined: 30 May 08 Posts: 93 Credit: 5,160,246 RAC: 0 |
For several days now, I've been running the vbox64_mt_mcore app set to use four cores of my six-core, 16GB RAM AMD/Linux host. Of 39 valid results, I'm seeing CPU-time/Run-time efficiencies of 63%-83%, depending on the length of the task. Is that about what I can expect or should they be doing better than that? Right now, I'm running CPDN tasks on the other two cores and I haven't checked to see if leaving those cores idle makes a difference. Maybe I'm naive, but I wouldn't expect that to make a difference. |
Send message Joined: 14 Jan 10 Posts: 1280 Credit: 8,495,513 RAC: 2,276 |
"...I'm seeing CPU-time/Run-time efficiencies of 63%-83%, depending on the length of the task." My last 6 tasks, all dual core, have an efficiency of 95.6%: the long runners with 200 events 98%, and the ones with 50 events 93.2%. |
Send message Joined: 15 Jun 08 Posts: 2413 Credit: 226,471,735 RAC: 131,946 |
An answer to a discussion here as it may be better located in this thread.
mmonnin wrote: I was referencing David Cameron's post here:
To make it (hopefully) clearer let's have a look at the following generic example.

s: setup phase, shutdown phase
i: idle, but allocated by our VM and therefore not available for other work
A-D: event calculation
Each character/number represents the same amount of time.

1core setup
corex: doing other work
core4: doing other work
core3: doing other work
core2: doing other work
core1: sssssAAAAAAAAAAABBBBBBBBBBBCCCCCCCCCCCDDDDDDDDDDDss

2core setup
corex: doing other work
core4: doing other work
core3: doing other work
core2: iiiiiBBBBBBBBBBBDDDDDDDDDDDii
core1: sssssAAAAAAAAAAACCCCCCCCCCCss

4core setup
corex: doing other work
core4: iiiiiDDDDDDDDDDDii
core3: iiiiiCCCCCCCCCCCii
core2: iiiiiBBBBBBBBBBBii
core1: sssssAAAAAAAAAAAss

This results in a CPU efficiency (A-D vs. total) of:
1core: 86%
2core: 76%
4core: 61%

In reality those numbers are higher and closer together as the calculation phase A-D usually dominates the total runtime. As long as the event calculation A-D scales proportional to the # cores, a 1core setup is most efficient. Otherwise a general advice makes not much sense and the breakeven has to be determined individually for every host. This is how I understand David's post. |
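The percentages in the generic example above can be reproduced with a few lines of Python. This is a minimal sketch of that toy timeline only, not of how ATLAS really schedules events: the function name and parameters are made up for illustration, and the defaults (4 events of 11 time units, 5 units of setup, 2 units of shutdown) are simply read off the diagram.

```python
def model_efficiency(n_cores: int,
                     n_events: int = 4,
                     event_len: int = 11,
                     setup_len: int = 5,
                     shutdown_len: int = 2) -> float:
    """CPU efficiency of the toy timeline: event time / total allocated core time.

    Setup and shutdown run on one core while the other allocated cores idle,
    so every allocated core is charged for the full wall time of the VM.
    """
    # Number of event "rounds" per core (ceiling division).
    rounds = -(-n_events // n_cores)
    wall_time = setup_len + rounds * event_len + shutdown_len
    allocated = n_cores * wall_time   # core time blocked by the VM
    useful = n_events * event_len     # time spent in phases A-D
    return useful / allocated


for cores in (1, 2, 4):
    print(f"{cores}core: {round(model_efficiency(cores) * 100)}%")
# prints:
# 1core: 86%
# 2core: 76%
# 4core: 61%
```

Raising `event_len` relative to `setup_len` and `shutdown_len` pushes all three values up and closer together, which is the point made in the post about real tasks where the calculation phase dominates.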
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
"Otherwise a general advice makes not much sense and the breakeven has to be determined individually for every host." You are correct that the single-core WU will theoretically give the highest efficiency of CPU usage per WU. In the plot I posted, I think the numbers are affected by the different processor speeds of the hosts. For example, it is more likely that older, less powerful hosts run single core and newer, faster hosts with more cores run 4 cores. In any case, the differences from 1-5 cores are small compared with the difference between 4 and 8 cores, which was the main point of that post. The most important things to consider are probably how much memory you are willing to spend and how long you would like the WU to run. |
Send message Joined: 17 Sep 04 Posts: 99 Credit: 30,734,512 RAC: 7,668 |
Is there a limit on the number of instances of Atlas tasks running on any particular machine? I have 20 cores and was running five instances of 4-core Atlas tasks. In the interest of increasing efficiency, I switched to 2-core Atlas tasks. But instead of ten 2-core tasks running, I am seeing only seven 2-core Atlas tasks running. What happened? Thanks! Regards, Bob P. |
Send message Joined: 15 Jun 08 Posts: 2413 Credit: 226,471,735 RAC: 131,946 |
"Is there a limit on the number of instances of Atlas tasks running on any particular machine?" There is no general limit. A couple of configuration settings may influence the number of concurrently running VMs: 1. Preferences on the project website (# cores -> RAM estimation) 2. CPU limit in app_config.xml (per app; total) 3. RAM setting in app_config.xml (-> total RAM estimation) 4. Local client's options (max RAM %, max swap %). Changes to (1.) only affect VMs that are downloaded after the change; the others become active after a "reload configuration files" or, for (4.), when you confirm them with "OK" in the GUI. You may run your local buffer empty, make/check the necessary changes and then download fresh work. |
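To get a feeling for how the RAM estimation can cap the number of running VMs before the core count does, here is a rough sketch. It rests on several assumptions that are not confirmed anywhere in this thread: the RAM per VM is extrapolated linearly from the two figures in the first post (6200MB for 4 cores, 5300MB for 3 cores), the BOINC client is treated as if it simply divides usable RAM by that figure, and the 32000MB of usable RAM is a purely hypothetical value. All function names are invented for the illustration.

```python
def estimated_vm_ram_mb(n_cores: int, base_mb: int = 2600, per_core_mb: int = 900) -> int:
    """Assumed linear RAM estimate per VM, fitted to the two values in the first post."""
    return base_mb + per_core_mb * n_cores


def max_concurrent_tasks(total_cores: int, cores_per_task: int, usable_ram_mb: int) -> int:
    """Number of VMs that fit, limited by whichever runs out first: cores or RAM."""
    by_cpu = total_cores // cores_per_task
    by_ram = usable_ram_mb // estimated_vm_ram_mb(cores_per_task)
    return min(by_cpu, by_ram)


# Hypothetical host: 20 cores, 32000 MB that BOINC is allowed to use.
print(max_concurrent_tasks(total_cores=20, cores_per_task=4, usable_ram_mb=32000))  # 5
print(max_concurrent_tasks(total_cores=20, cores_per_task=2, usable_ram_mb=32000))  # 7
```

With these made-up numbers, halving the cores per task does not double the number of VMs, because each VM still carries the fixed base part of the RAM estimate. Whether RAM is actually the limiting factor on a given host has to be checked against settings 1 to 4 above.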