Message boards : Number crunching : best practices/how to get most efficiency for higher core machines
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45183 - Posted: 11 Aug 2021, 22:52:15 UTC

Hello,
I've been out of the loop on this project for a while and decided to give it another go. I have nearly 200 cores to add, so I want to make sure I'm not going to bungle this with a lot of errors or stuck VMs.
I've got several 24- to 48-thread Xeons with anywhere from 32-64 GB of RAM, as well as a few 8-thread i7s with 16-32 GB, all running Linux.
Squid will be used, but my biggest concerns are RAM and stuck VMs.
I know Theory takes the least RAM per WU, but for best efficiency, what would folks recommend for managing ATLAS and CMS work units? Can BOINC be trusted to manage RAM on its own? I'm still trying to figure out the number of work units vs. the number of CPUs per work unit, which I believe only applies to ATLAS.
Most of these, apart from a few, are running cheap SSDs, since I figure a lot of disk activity will be going on, and with a lot of WUs crunching at the same time that might be a factor in how fast they start and stop, especially with regard to CMS. I'm not sure what the process is for each one to start and finish.
Right now the goal is to add machines very slowly, making sure each one can crunch ATLAS, Theory, and CMS, with one WU of each sent, before moving on to the next.
Examples of processor and RAM configurations: an E5-2670 v3 with 64 GB, an E5-2680 with 32 GB.
If I remember right, CMS and ATLAS are the biggest users of bandwidth and disk?
Thanks, and any help is appreciated!
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,688,458
RAC: 235,342
Message 45189 - Posted: 12 Aug 2021, 17:04:11 UTC - in response to Message 45183.  

I don't get many stuck VMs anymore.

ATLAS is the trickiest to run: if you allow unlimited WUs, it tries to use 10 GB of memory per WU. You can of course tweak this, but then it's hard to keep the memory usage on track manually. CMS runs smoothly.

I don't think there is much disk activity in general: peak total transfers are around 38%, peak writes are about 40, and reads about 100.

A Squid proxy will reduce the load on the CERN servers and your internet usage.

I use 90% of the total threads to give some breathing room for OS overhead. Running 42 CMS at once on a 48-thread system, however, means 96 GB of RAM usage and 100% CPU load.
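If you prefer a file over the web/Manager preferences, the same limit can go in a global_prefs_override.xml in the BOINC data directory. A minimal sketch, from memory and untested, so check the tag names against your own global_prefs.xml:

<global_preferences>
   <!-- use at most 90% of the logical CPUs, leaving headroom for the OS -->
   <max_ncpus_pct>90.0</max_ncpus_pct>
</global_preferences>

The client should pick it up after a restart (or after telling it to re-read the local prefs file).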

E.g., on one computer right now I have 14 ATLAS, 17 CMS, and 5 Theory; this is using 98% CPU and 156 GB of memory.
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45190 - Posted: 12 Aug 2021, 18:06:51 UTC - in response to Message 45189.  
Last modified: 12 Aug 2021, 18:07:26 UTC

I don't get many stuck VMs anymore.
Great to hear!

ATLAS is the trickiest to run: if you allow unlimited WUs, it tries to use 10 GB of memory per WU. You can of course tweak this, but then it's hard to keep the memory usage on track manually. CMS runs smoothly.
10 GB per WU? Is this for a single-core work unit? What if I only select a certain number of them? Or select 1 WU to use 24 cores?

I don't think there is much disk activity in general: peak total transfers are around 38%, peak writes are about 40, and reads about 100.

A Squid proxy will reduce the load on the CERN servers and your internet usage.

I use 90% of the total threads to give some breathing room for OS overhead. Running 42 CMS at once on a 48-thread system, however, means 96 GB of RAM usage and 100% CPU load.

E.g., on one computer right now I have 14 ATLAS, 17 CMS, and 5 Theory; this is using 98% CPU and 156 GB of memory.
Thank you for those figures. How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,898,865
RAC: 138,194
Message 45191 - Posted: 12 Aug 2021, 18:38:08 UTC - in response to Message 45190.  

The ATLAS RAM setting is calculated server-side as:
3000 MB + 900 MB * n_cores

"Unlimited" means:
Without local tweaking, ATLAS uses up to 8 cores for vbox tasks and up to 12 cores for native tasks.
As a result, the RAM limit is 10200 MB for vbox and 13800 MB for native.
(I'm not 100% sure if the native limit is still active).
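As an example: a 4-core ATLAS task gets 3000 + 900 * 4 = 6600 MB, while a single-core task gets 3000 + 900 = 3900 MB.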

Modern internet connections usually don't suffer from low bandwidth.
The limiting factors are latency and not enough RAM on the router(s) to handle the large number of concurrently open connections.
This is especially true for this project, since it transfers thousands of very small files.
A local Squid keeps those connections inside your LAN, hence it offloads the routers (including your local one) and the target servers.
In the case of CMS's Frontier requests, the cache reuse factor can be greater than 95%.
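For anyone setting up the proxy from scratch, a very rough squid.conf sketch might look like the lines below. Treat it as an untested starting point with placeholder sizes and paths; the Squid configuration advice published on the LHC@home pages should take precedence.

# accept proxy requests from the LAN only
http_port 3128
acl localnet src 192.168.0.0/16
http_access allow localnet
http_access deny all

# generous caching; the Frontier/CVMFS objects are mostly very small
cache_mem 1024 MB
maximum_object_size_in_memory 64 KB
cache_dir ufs /var/spool/squid 20000 16 256
maximum_object_size 512 MB

Then point the HTTP proxy setting of each BOINC client at this host on port 3128.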
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,688,458
RAC: 235,342
Message 45192 - Posted: 12 Aug 2021, 18:40:50 UTC - in response to Message 45190.  
Last modified: 12 Aug 2021, 18:41:59 UTC

10 GB per WU? Is this for a single-core work unit? What if I only select a certain number of them? Or select 1 WU to use 24 cores?


It's something to do with the unlimited selection: BOINC will fetch more than 8 WUs, and then I force them back to a single core with an app_config. I can't remember what happens if you pick 1 core and unlimited jobs; then the RAM is what computezrmle said, same as with CMS, but there is a limit on the number of WUs. I'm not sure what that limit was, though.
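Roughly like this, as a sketch; the app_name and plan_class must match whatever your client_state.xml reports for ATLAS, so the values below are placeholders:

<app_config>
   <app_version>
      <app_name>ATLAS</app_name>
      <!-- placeholder; copy the exact plan_class from client_state.xml -->
      <plan_class>vbox64_mt_mcore_atlas</plan_class>
      <!-- run each ATLAS task on a single core -->
      <avg_ncpus>1</avg_ncpus>
   </app_version>
</app_config>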

How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing.


Network? My Squid proxy has peaks of 120 MB/s up and 92 down, with averages of 1.9 and 0.23 (feeding 240 threads). Disk usage is about the same on the machine running a mix of WUs as on one running all CMS. The Squid box is a bit more intense on the disk, but it's something like 90% cached files in RAM, so again not so bad.
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45193 - Posted: 12 Aug 2021, 20:02:12 UTC

Thank you for all of that. I'll have to play with settings and maybe an app_config to figure out the best solution. Maybe two 12-core WUs from ATLAS and the rest for CMS and Theory.
How much RAM do native Theory and CMS use?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,898,865
RAC: 138,194
Message 45194 - Posted: 12 Aug 2021, 20:37:58 UTC - in response to Message 45193.  

Native Theory:
Usually 600-800 MB per task.
BUT! Occasionally there will be special tasks (madgraph) that allocate >6.5 GB, plus a 2nd core.

CMS:
If not tweaked, each task will set up a 2 GB VM,
plus some MB for the vboxwrapper.
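Since the madgraph tasks can't be predicted, one way to limit the worst case is to cap how many Theory tasks run at once in app_config.xml. Just a sketch; the app name is a placeholder and should be taken from your client_state.xml:

<app_config>
   <app>
      <!-- placeholder; use the <name> shown in client_state.xml -->
      <name>Theory</name>
      <!-- never run more than 10 Theory tasks at the same time -->
      <max_concurrent>10</max_concurrent>
   </app>
</app_config>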
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45195 - Posted: 12 Aug 2021, 20:50:07 UTC - in response to Message 45194.  

Native Theory:
Usually 600-800 MB per task.
BUT! Occasionally there will be special tasks (madgraph) that allocate >6.5 GB, plus a 2nd core.

CMS:
If not tweaked, each task will set up a 2 GB VM,
plus some MB for the vboxwrapper.

CMS - not tweaked? How can one tweak them and what will result?
thanks
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,084,902
RAC: 104,657
Message 45198 - Posted: 13 Aug 2021, 3:58:40 UTC
Last modified: 13 Aug 2021, 4:31:17 UTC

6 ATLAS native tasks with 12 CPUs each and 100 GB RAM, on a CentOS 8 Xeon with 72 CPUs (6x12).
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10587392
The number one in our computer hit list.
Btw, 6 of the best 20 computers are.... from Toby Broom!
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,898,865
RAC: 138,194
Message 45199 - Posted: 13 Aug 2021, 6:30:54 UTC - in response to Message 45195.  

CMS - not tweaked? How can one tweak them and what will result?

You can tweak some parameters using an app_config.xml.
See this page for details:
https://boinc.berkeley.edu/wiki/Client_configuration

Your own app_config.xml must strictly follow the template shown there.
Default values can be found in client_state.xml

Mostly used for LHC@home tweaking:
<max_concurrent>n</max_concurrent>
<project_max_concurrent>N</project_max_concurrent>
<avg_ncpus>x</avg_ncpus> # 1)

The VM's RAM size can be tweaked using
<cmdline>--memory_size_mb 2048</cmdline> # 2)


1) The manual explains: "...(possibly fractional) ..." but this makes no sense here since it also tells vboxwrapper how many cores it should configure for the VM. The latter only accepts integer values, hence use "x" or "x.0".

2) 2048 is the default for CMS and doesn't need to be specified here.
Setting it higher would be a waste of RAM since a VM never returns allocated RAM to the OS.
Setting it a bit lower will slow down the VM.
Setting it much lower will cause the scientific app not to run since it checks if enough RAM is available.
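Putting those together, a skeleton app_config.xml might look like the lines below. The app names, plan_class strings, and limits are placeholders only; take the real names from your client_state.xml and size the limits for your own machine.

<app_config>
   <!-- cap the total number of LHC@home tasks running at once -->
   <project_max_concurrent>20</project_max_concurrent>
   <app>
      <!-- placeholder; use the <name> shown in client_state.xml -->
      <name>CMS</name>
      <!-- at most 8 CMS tasks at a time -->
      <max_concurrent>8</max_concurrent>
   </app>
   <app_version>
      <app_name>CMS</app_name>
      <!-- placeholder; copy the exact plan_class from client_state.xml -->
      <plan_class>vbox64</plan_class>
      <!-- one core per CMS VM -->
      <avg_ncpus>1</avg_ncpus>
      <!-- the CMS default; shown only for completeness -->
      <cmdline>--memory_size_mb 2048</cmdline>
   </app_version>
</app_config>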
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,688,458
RAC: 235,342
Message 45200 - Posted: 13 Aug 2021, 17:28:40 UTC - in response to Message 45199.  

The VM's RAM size can be tweaked using
<cmdline>--memory_size_mb 2048</cmdline> # 2)


Take care with this one, as BOINC doesn't know how much RAM is actually used, so you can end up with too much RAM usage and your computer locking up.
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45201 - Posted: 13 Aug 2021, 17:29:25 UTC - in response to Message 45199.  

CMS - not tweaked? How can one tweak them and what will result?

You can tweak some parameters using an app_config.xml.
See this page for details:
https://boinc.berkeley.edu/wiki/Client_configuration

Your own app_config.xml must strictly follow the template shown there.
Default values can be found in client_state.xml

Mostly used for LHC@home tweaking:
<max_concurrent>n</max_concurrent>
<project_max_concurrent>N</project_max_concurrent>
<avg_ncpus>x</avg_ncpus> # 1)

The VM's RAM size can be tweaked using
<cmdline>--memory_size_mb 2048</cmdline> # 2)


1) The manual explains: "...(possibly fractional) ..." but this makes no sense here since it also tells vboxwrapper how many cores it should configure for the VM. The latter only accepts integer values, hence use "x" or "x.0".

2) 2048 is the default for CMS and doesn't need to be specified here.
Setting it higher would be a waste of RAM since a VM never returns allocated RAM to the OS.
Setting it a bit lower will slow down the VM.
Setting it much lower will cause the scientific app not to run since it checks if enough RAM is available.

Thank you, I'll give this a look. I think I will need to, as I may need more than 3 preference types.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 45209 - Posted: 16 Aug 2021, 7:14:36 UTC - in response to Message 45183.  

Hi, are you running SixTrack? It has virtually no I/O. Thanks. Eric
wolfman1360

Joined: 17 Feb 17
Posts: 42
Credit: 2,589,736
RAC: 0
Message 45221 - Posted: 18 Aug 2021, 1:00:06 UTC - in response to Message 45209.  

Hi, are you running SixTrack? It has virtually no I/O. Thanks. Eric

I do have it selected, but there don't seem to be any tasks available currently.
thanks
