Message boards : ATLAS application : Performance 4-core VM versus 3-core VM
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 441
Credit: 3,222,420
RAC: 2,285
Message 32732 - Posted: 9 Oct 2017, 20:31:11 UTC

I had 2 ATLAS-tasks running

4-core VM with 6200MB RAM https://lhcathome.cern.ch/lhcathome/result.php?resultid=158808156

3-core VM with 5300MB RAM https://lhcathome.cern.ch/lhcathome/result.php?resultid=158810663

Following the tasks in the consoles, I noticed that in the 3-core VM all three athena.py processes were constantly present at almost 100% CPU.
In the 4-core VM, quite regularly only 1, 2 or 3 athena.py processes were present with high CPU usage (and sometimes all 4, of course).

The performance of the 4-core VM after 246 minutes of uptime was 76%, and that of the 3-core VM after 114 minutes of uptime was 93%.

4-core VM had a task from batch Task mc16_13TeV 1000_1500.simul (12236595)
3-core VM had a task from batch Task mc16_13TeV ZZvvqq_mqq20.simul (12236561)

How could the performance of the 4-core VM be improved, or was the task simply too heavy to use 4 cores all the time?

Jonathan
Joined: 25 Sep 17
Posts: 15
Credit: 73,185
RAC: 2,467
Message 32739 - Posted: 9 Oct 2017, 23:47:33 UTC - in response to Message 32732.  

You would need to compare results from a larger set of data. I would expect 'real' vs 'hyper-threaded' or 'SMT' cores to make a difference too. Is there a way to look at the raw results data and look over a larger data set?

I looked over seven tasks in my computer's results. I took CPU time / run time, which gave me approximately 3.503. Dividing that by four processors, I come up with about 87.6% efficiency.
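
As a rough sketch of that calculation, assuming the 3.503 figure is CPU time divided by run time (the helper name `cpu_efficiency` and the sample times are made up for illustration), the per-core efficiency is the CPU time divided by the run time multiplied by the number of cores:

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, n_cores: int) -> float:
    """Return CPU time as a fraction of the core-time the task had allocated."""
    if run_time_s <= 0 or n_cores <= 0:
        raise ValueError("run time and core count must be positive")
    return cpu_time_s / (run_time_s * n_cores)


# Illustrative numbers only: a CPU/run-time ratio of 3.503 on a 4-core task.
ratio = 3.503
print(f"{cpu_efficiency(ratio * 3600, 3600, 4):.3f}")
```

The same function works on the CPU time and run time columns of any finished task, as long as the core count the task actually used is known.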

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 441
Credit: 3,222,420
RAC: 2,285
Message 32750 - Posted: 10 Oct 2017, 9:30:09 UTC

I repeated the test. This time the 3-core and 4-core VMs both had an ATLAS job from the same batch with taskID=12236561.
The difference is much smaller now, but the 3-core VM still performs a little better.
3 cores: performance of athena processes versus uptime 93.30%
4 cores: performance of athena processes versus uptime 89.74%

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,466,564
RAC: 5,625
Message 32755 - Posted: 10 Oct 2017, 12:33:42 UTC

At the old Atlas-StandAlone-Project there was a post about the different efficiencies of the varying core numbers. So some fluctuation between the different numbers of cores is normal.

Perhaps David can re-post those stats.


Supporting BOINC, a great concept !

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 171
Credit: 3,753,955
RAC: 7,941
Message 32802 - Posted: 12 Oct 2017, 8:13:26 UTC - in response to Message 32755.  

If you assume a constant initialisation time then running fewer cores will be more efficient, because the time using 100% CPU is longer.
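
A minimal sketch of this argument, assuming a fixed single-threaded initialisation phase followed by event processing that splits evenly across the cores (the function name and all numbers are made up for illustration, not measured values):

```python
def multicore_efficiency(n_cores: int, init_time: float, event_work: float) -> float:
    """CPU time / (run time * cores) for a toy task.

    Assumes initialisation keeps one core busy for init_time while the
    other cores idle, and that event_work (in core-time units) divides
    evenly across all cores.
    """
    run_time = init_time + event_work / n_cores
    cpu_time = init_time + event_work
    return cpu_time / (run_time * n_cores)


# Hypothetical task: 10 minutes of initialisation, 400 core-minutes of events.
for n in (1, 2, 3, 4, 8):
    print(n, round(multicore_efficiency(n, 10, 400), 3))
```

In this toy model the efficiency drops as cores are added, and the gap shrinks as the event phase grows relative to the initialisation.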

There was a long thread on the old ATLAS@Home forum related to this but it was more about credit: http://atlasathome.cern.ch/forum_thread.php?id=640

Also when we introduced the multicore app we did a study of the different core numbers. The plot below shows the CPU time per event for each core number. In general 1 to 4 are better than 5 to 8, probably due to the effects of hyperthreading. Above 8 it gets even worse so this is why we set 8 to be the maximum cores.


Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 441
Credit: 3,222,420
RAC: 2,285
Message 32803 - Posted: 12 Oct 2017, 9:13:51 UTC - in response to Message 32802.  

David Cameron wrote:
If you assume a constant initialisation time then running fewer cores will be more efficient, because the time using 100% CPU is longer.

There was a long thread on the old ATLAS@Home forum related to this but it was more about credit: http://atlasathome.cern.ch/forum_thread.php?id=640

Also when we introduced the multicore app we did a study of the different core numbers. The plot below shows the CPU time per event for each core number. In general 1 to 4 are better than 5 to 8, probably due to the effects of hyperthreading. Above 8 it gets even worse so this is why we set 8 to be the maximum cores.

Thanks David for that info.
Your linked thread is blocked, because the forums on the 'old' ATLAS site are disabled. No need for me to enable it.
From your picture, 4-core would be the best option.
I think that's because 4-core was probably more often the same as 4 threads with Hyper-Threading off.
3 or 5 threads are probably more often used on machines with HT on, with the other threads used for other duties.

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 321
Credit: 44,466,564
RAC: 5,625
Message 32808 - Posted: 12 Oct 2017, 22:19:51 UTC

Perhaps this graphic could be copied to the "Information on Atlas" thread


Supporting BOINC, a great concept !

maeax
Joined: 2 May 07
Posts: 350
Credit: 13,007,664
RAC: 5,580
Message 33206 - Posted: 4 Dec 2017, 13:53:28 UTC

The ATLAS native app is running two 2-core ATLAS tasks on an AMD A10 VM.
After 5 days of runtime, there are 466 hours of CPU use for ATLAS and 24 hours of wasted time.
That is 5% wasted time. For me, that is good performance.

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10495075

Erich56
Joined: 18 Dec 15
Posts: 685
Credit: 4,849,851
RAC: 4,790
Message 33209 - Posted: 4 Dec 2017, 16:30:29 UTC - in response to Message 33206.  

Maeax, for your waste-time percentage calculation, how many finished tasks did you investigate?
Out of curiosity, I just did this for my machine, on which I run 2-core ATLAS tasks (Intel i7-4930K @ 3.9GHz). I found that the waste-time percentages varied between 6 and 9%.

maeax
Joined: 2 May 07
Posts: 350
Credit: 13,007,664
RAC: 5,580
Message 33210 - Posted: 4 Dec 2017, 17:56:18 UTC

In the computer details, the duration shown for a task is 0.35 days.
This means 8 tasks per day, and with two running in parallel that makes 16 or 17 per day together.
ATLAS tasks are very difficult; every task has a different duration.

mmonnin
Joined: 22 Mar 17
Posts: 21
Credit: 727,106
RAC: 1,920
Message 33214 - Posted: 6 Dec 2017, 2:12:21 UTC

On my 1950X I run 4 threads with only 2 tasks, since 9.8 GB of memory is used. I noticed it took a while for the CPU time to reach the run time and show as 100% in BoincTasks, so I started running some extra tasks on another client to keep the CPU busy.

While the CPU is pushed with tasks > threads Atlas runs at about 2.5 CPUs utilized. There was a period where I ran out of other work and a couple of tasks were around 3.4 - 3.5 CPUs utilized out of 4. I could run fewer extra tasks but I'm ok with it.

ritterm
Joined: 30 May 08
Posts: 88
Credit: 3,723,203
RAC: 0
Message 33227 - Posted: 7 Dec 2017, 22:11:53 UTC

For several days now, I've been running the vbox64_mt_mcore app set to use four cores of my six-core, 16GB RAM AMD/Linux host. Of 39 valid results, I'm seeing CPU-time/Run-time efficiencies of 63%-83%, depending on the length of the task.

Is that about what I can expect or should they be doing better than that? Right now, I'm running CPDN tasks on the other two cores and I haven't checked to see if leaving those cores idle makes a difference. Maybe I'm naive, but I wouldn't expect that to make a difference.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 441
Credit: 3,222,420
RAC: 2,285
Message 33234 - Posted: 8 Dec 2017, 8:00:09 UTC - in response to Message 33227.  

...I'm seeing CPU-time/Run-time efficiencies of 63%-83%, depending on the length of the task.

Is that about what I can expect or should they be doing better than that?

My last 6 tasks, all dual-core, have an efficiency of 95.6%. The long runners with 200 events reach 98% and the ones with 50 events 93.2%.

computezrmle
Joined: 15 Jun 08
Posts: 525
Credit: 5,530,480
RAC: 18,831
Message 34583 - Posted: 11 Mar 2018, 16:58:44 UTC

I am answering a discussion from another thread here, as the answer may be better located in this thread.


mmonnin wrote:
I was referencing David Cameron's post here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4479&postid=32802#32802

A single thread is 5th most efficient.

To make it (hopefully) clearer let's have a look at the following generic example.

s: setup phase, shutdown phase
i: idle, but allocated by our VM and therefore not available for other work
A-D: event calculation
Each character/number represents the same amount of time.


1core setup

corex: doing other work
core4: doing other work
core3: doing other work
core2: doing other work
core1: sssssAAAAAAAAAAABBBBBBBBBBBCCCCCCCCCCCDDDDDDDDDDDss


2core setup

corex: doing other work
core4: doing other work
core3: doing other work
core2: iiiiiBBBBBBBBBBBDDDDDDDDDDDii
core1: sssssAAAAAAAAAAACCCCCCCCCCCss


4core setup

corex: doing other work
core4: iiiiiDDDDDDDDDDDii
core3: iiiiiCCCCCCCCCCCii
core2: iiiiiBBBBBBBBBBBii
core1: sssssAAAAAAAAAAAss


This results in a CPU efficiency (A-D vs. total) of:
1core: 86%
2core: 76%
4core: 61%

In reality those numbers are higher and closer together, as the calculation phase A-D usually dominates the total runtime.
As long as the event calculation A-D scales proportionally to the number of cores, a 1core setup is most efficient.

Otherwise general advice does not make much sense, and the break-even point has to be determined individually for every host.
This is how I understand David's post.
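
The percentages from the generic example can be reproduced by counting characters in the timelines. This is only a sketch of that counting: the `LAYOUTS` table re-types the diagrams above (5 setup slots, 11 slots per event block, 2 shutdown slots), and the function name is made up for illustration.

```python
# Timelines of the cores allocated to the VM; cores "doing other work"
# are not part of the VM and are left out.
LAYOUTS = {
    "1core": ["sssss" + "A" * 11 + "B" * 11 + "C" * 11 + "D" * 11 + "ss"],
    "2core": [
        "sssss" + "A" * 11 + "C" * 11 + "ss",
        "iiiii" + "B" * 11 + "D" * 11 + "ii",
    ],
    "4core": [
        "sssss" + "A" * 11 + "ss",
        "iiiii" + "B" * 11 + "ii",
        "iiiii" + "C" * 11 + "ii",
        "iiiii" + "D" * 11 + "ii",
    ],
}


def layout_efficiency(rows: list[str]) -> float:
    """Share of allocated time slots spent on event calculation (A-D)."""
    total = sum(len(row) for row in rows)
    busy = sum(ch in "ABCD" for row in rows for ch in row)
    return busy / total


for name, rows in LAYOUTS.items():
    print(f"{name}: {layout_efficiency(rows):.0%}")
```

Lengthening the event blocks relative to the setup and shutdown slots moves all three values closer to 100%, which is the "higher and closer together" effect described above.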

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 171
Credit: 3,753,955
RAC: 7,941
Message 34594 - Posted: 12 Mar 2018, 15:39:16 UTC - in response to Message 34583.  

Otherwise general advice does not make much sense, and the break-even point has to be determined individually for every host.
This is how I understand David's post.


You are correct that the single core WU will theoretically give the highest efficiency of CPU usage per WU. In the plot I posted I think the numbers are affected by the different processor speeds of the hosts. For example it is more likely that older less powerful hosts run single core and newer faster hosts with more cores run 4 cores.

In any case the differences from 1-5 cores are small compared with the difference between 4 and 8 cores, which was the main point of that post. The most important things to consider are probably how much memory you are willing to spend and how long you like the WU to run for.

rbpeake
Joined: 17 Sep 04
Posts: 70
Credit: 16,882,507
RAC: 15,469
Message 34655 - Posted: 15 Mar 2018, 0:04:35 UTC

Is there a limit on the number of instances of Atlas tasks running on any particular machine?

I have 20 cores and was running five instances of 4-core Atlas tasks. In the interest of increasing efficiency, I switched to 2-core Atlas tasks. But instead of ten 2-core tasks running, I am seeing only seven 2-core Atlas tasks running. What happened?

Thanks!
Regards,
Bob P.

computezrmle
Joined: 15 Jun 08
Posts: 525
Credit: 5,530,480
RAC: 18,831
Message 34659 - Posted: 15 Mar 2018, 7:20:57 UTC - in response to Message 34655.  

Is there a limit on the number of instances of Atlas tasks running on any particular machine?

There is no general limit.

A couple of configuration settings may influence the number of concurrently running VMs:

1. Preferences on the project website (# cores -> RAM estimation)
2. CPU limit in app_config.xml (per app; total)
3. RAM setting in app_config.xml (-> total RAM estimation)
4. Local client's options (max RAM %, max swap %)

Changes to (1.) only affect VMs that are downloaded after the change. Changes to (2.) and (3.) become active after a "reload configuration files", and changes to (4.) when you confirm them with "OK" in the GUI.

You may want to let your local buffer run empty, make and check the necessary changes, and then download fresh work.
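
To illustrate how a RAM estimate can cap the number of concurrent VMs below the CPU limit, here is a rough sketch. It assumes the per-VM RAM follows a straight line through the two sizes quoted earlier in this thread (3 cores: 5300MB, 4 cores: 6200MB), which is an extrapolation and not a documented formula. The function names and the host values are hypothetical.

```python
def vm_ram_mb(cores_per_vm: int) -> int:
    """Linear fit through the two VM sizes quoted in this thread
    (3 cores -> 5300 MB, 4 cores -> 6200 MB); other core counts are
    an extrapolation, not a documented formula."""
    return 2600 + 900 * cores_per_vm


def max_concurrent_vms(host_cores: int, cores_per_vm: int, usable_ram_mb: int) -> int:
    """Number of VMs that fit under both the CPU and the RAM budget."""
    by_cpu = host_cores // cores_per_vm
    by_ram = usable_ram_mb // vm_ram_mb(cores_per_vm)
    return min(by_cpu, by_ram)


# Hypothetical host: 20 cores, 32000 MB that the client is allowed to use.
print(max_concurrent_vms(20, 4, 32000))
print(max_concurrent_vms(20, 2, 32000))
```

With these made-up inputs the 2-core case is bounded by RAM rather than by cores. Whether that is what happened on the 20-core host above cannot be told from the thread, since its RAM and client settings are not stated.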