Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
Hello, I've been out of the loop on this project for a while and decided to give it another go. I have nearly 200 cores to add, so I just want to make sure I'm not going to bungle this with a lot of errors or stuck VMs.

I've got several 24 to 48 thread Xeons with anywhere from 32-64 GB RAM, as well as a few 8-thread i7s with 16-32 GB, all running Linux. Squid will be used, but my biggest concern is RAM and stuck VMs. I know Theory takes the least RAM per WU, but for best efficiency, what would folks recommend for managing ATLAS and CMS work units? Can BOINC be trusted to manage RAM on its own? I'm still trying to figure out number of work units vs. number of CPUs, and I believe the latter only applies to ATLAS.

Most of these machines, apart from a few, are running cheap SSDs, since I figure a lot of disk activity will be going on, and with a lot of WUs crunching at the same time that might be a factor in how fast they start/stop, especially with CMS. I'm not sure what the process is for each one to start and finish.

Right now the goal is to add machines very slowly, making sure each one can crunch ATLAS, Theory, and CMS with one WU of each before moving on to the next. Examples of processor and RAM configurations: E5-2670 v3 with 64 GB, E5-2680 with 32 GB. If I remember right, CMS and ATLAS are the biggest users of bandwidth and disk?

Thanks, and any help is appreciated!
Joined: 27 Sep 08 · Posts: 859 · Credit: 703,792,323 · RAC: 161,849
I don't get many stuck VMs anymore. ATLAS is the trickiest to run: if you allow unlimited WUs, it tries to use 10 GB of memory per WU. You can of course tweak this, but then it's hard to keep the memory usage on track manually. CMS runs smoothly.

I don't think there is much disk activity in general: peak total transfers are 38%, peak writes are about 40, and 100 for reads. A Squid proxy will reduce the load on the CERN servers and your internet usage.

I use 90% of total threads to give some breathing room for OS overhead. Running 42 CMS at once on a 48-thread system, this works out to 96 GB of RAM usage and 100% CPU load. E.g. on one computer right now I have 14 ATLAS, 17 CMS and 5 Theory, using 98% CPU and 156 GB of memory.
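Those figures can be sanity-checked with a quick sketch. The thread count, task count, and RAM total are taken from the post above; the 10% headroom is the rule of thumb mentioned there:

```python
# Rough memory/thread budgeting for a 48-thread host (figures from the post above)
threads = 48
usable = int(threads * 0.9)    # leave ~10% headroom for OS overhead
cms_tasks = 42                 # CMS tasks actually running
total_ram_gb = 96              # observed RAM usage for those 42 tasks
per_task_gb = total_ram_gb / cms_tasks

print(usable)                  # 43 threads usable at the 90% setting
print(round(per_task_gb, 2))   # ~2.29 GB per CMS task (2 GB VM plus overhead)
```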
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
> I don't get many stuck VMs anymore.

Great to hear!

> ATLAS is the most tricky to run, if you want to allow unlimited WUs then it tries to use 10GB of memory per WU ... CMS runs smoothly.

10 GB per WU? Is this for a single-core work unit? What if I only select a certain number of them? Or select 1 WU to use 24 cores?

> I don't think there is much disk activity in general, peak total transfers are 38% peak writes are about 40 and 100 for reads.

Thank you for those figures. How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing.
Joined: 15 Jun 08 · Posts: 2607 · Credit: 262,565,847 · RAC: 138,862
The ATLAS RAM setting is calculated server side as: 3000 + 900 * n_cores

"Unlimited" means: without local tweaking, ATLAS uses up to 8 cores for vbox tasks and up to 12 cores for native tasks. As a result, the RAM calculation limit is 10200 MB for vbox and 13800 MB for native. (I'm not 100% sure the native limit is still active.)

Modern internet connections usually don't suffer from low bandwidth. The limiting factors are latency and not enough RAM on the router(s) to handle the large number of concurrently open connections, especially for this project, since it transfers thousands of very small files. A local Squid keeps those connections inside your LAN, hence offloads the routers (including the local one) and the target servers. In the case of CMS's Frontier requests the reuse factor can be greater than 95%.
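The server-side formula above can be written as a quick check; the 8-core (vbox) and 12-core (native) defaults are the ones stated in this post:

```python
def atlas_ram_mb(n_cores: int) -> int:
    """Server-side ATLAS RAM limit: 3000 MB base plus 900 MB per core."""
    return 3000 + 900 * n_cores

# Untweaked defaults mentioned above:
print(atlas_ram_mb(8))    # vbox tasks, up to 8 cores  -> 10200 MB
print(atlas_ram_mb(12))   # native tasks, up to 12 cores -> 13800 MB
```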
Joined: 27 Sep 08 · Posts: 859 · Credit: 703,792,323 · RAC: 161,849
> 10 GB per WU? Is this for a single-core work unit? What if I only select a certain number of them? Or select 1 WU to use 24 cores?

It's something to do with the unlimited selection: BOINC will get more than 8 WUs, and then I force them back to a single core with app_config. I can't remember what happens if you pick 1 core and unlimited jobs; then the RAM is (what cp said) GB, same as CMS, but then there is a limit on the number of WUs. Not sure what the limit was, though.

> How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing.

Network? My Squid proxy has a peak of 120 MB/s up and 92 down, with averages of 1.9 and 0.23 (feeding 240 threads). The disk load is about the same on the one running a mix of WUs as on one running all CMS. The Squid one is a bit more intense on the disk, but it's like 90% cached files in RAM, so again not so bad.
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
Thank you for all of that. I'll have to play with settings and maybe app_config to figure out the best solution. Maybe two 12-core WUs from ATLAS and the rest for CMS and Theory. How much RAM do native Theory and CMS use?
Joined: 15 Jun 08 · Posts: 2607 · Credit: 262,565,847 · RAC: 138,862
Native Theory: usually 600-800 MB per task. BUT! Occasionally there will be special tasks (madgraph) that allocate >6.5 GB, plus a 2nd core.

CMS: if not tweaked, each task will set up a 2 GB VM, plus some MB for vboxwrapper.
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
> CMS: if not tweaked, each task will set up a 2 GB VM.

CMS - not tweaked? How can one tweak them, and what will result? Thanks.
Joined: 2 May 07 · Posts: 2260 · Credit: 175,581,097 · RAC: 11,545
6 ATLAS-native tasks with 12 CPUs each and 100 GB RAM on a CentOS 8 Xeon with 72 CPUs (6x12). https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10587392 The number one in our computer hitlist. Btw, 6 of the best 20 computers are.... from Toby Broom!
Joined: 15 Jun 08 · Posts: 2607 · Credit: 262,565,847 · RAC: 138,862
> CMS - not tweaked? How can one tweak them and what will result?

You can tweak some parameters using an app_config.xml. See this page for details: https://boinc.berkeley.edu/wiki/Client_configuration Your own app_config.xml must strictly follow the template shown there. Default values can be found in client_state.xml.

Mostly used for LHC@home tweaking:
<max_concurrent>n</max_concurrent>
<project_max_concurrent>N</project_max_concurrent>
<avg_ncpus>x</avg_ncpus>  (1)

The VM's RAM size can be tweaked using:
<cmdline>--memory_size_mb 2048</cmdline>  (2)

(1) The manual explains: "...(possibly fractional)...", but this makes no sense here, since avg_ncpus also tells vboxwrapper how many cores it should configure for the VM. The latter only accepts integer values, hence use "x" or "x.0".

(2) 2048 is the default for CMS and doesn't need to be specified here. Setting it higher would be a waste of RAM, since a VM never returns allocated RAM to the OS. Setting it a bit lower will slow down the VM. Setting it much lower will cause the scientific app not to run, since it checks whether enough RAM is available.
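Putting those tags together, a minimal app_config.xml might look like the sketch below. The app names and plan_class values here are assumptions (copy the exact strings from your own client_state.xml), and the concurrency numbers are only illustrative:

```xml
<app_config>
  <!-- Hypothetical example: cap ATLAS at 2 concurrent tasks, 12 cores each -->
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>   <!-- assumed; check client_state.xml -->
    <avg_ncpus>12</avg_ncpus>            <!-- integer only, see note (1) above -->
  </app_version>

  <!-- Hypothetical example: up to 10 single-core CMS tasks at the default VM size -->
  <app>
    <name>CMS</name>
    <max_concurrent>10</max_concurrent>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64</plan_class>      <!-- assumed; check client_state.xml -->
    <avg_ncpus>1</avg_ncpus>
    <cmdline>--memory_size_mb 2048</cmdline>  <!-- 2048 is the CMS default -->
  </app_version>

  <!-- Overall cap across all LHC@home apps on this host -->
  <project_max_concurrent>40</project_max_concurrent>
</app_config>
```

The file goes into the project directory (e.g. projects/lhcathome.cern.ch_lhcathome/) and is picked up after "Options → Read config files" or a client restart.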
Joined: 27 Sep 08 · Posts: 859 · Credit: 703,792,323 · RAC: 161,849
> The VM's RAM size can be tweaked using <cmdline>--memory_size_mb 2048</cmdline>

Take care with this one: BOINC doesn't know how much RAM is actually used, so you can end up with too much RAM usage and your computer locking up.
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
> You can tweak some parameters using an app_config.xml.

Thank you, I will give this a look. I think I will need to, as I may need more than 3 preference types.
Joined: 12 Jul 11 · Posts: 857 · Credit: 1,619,050 · RAC: 0
Hi, are you running SixTrack? Virtually no I/O. Thanks. Eric
Joined: 17 Feb 17 · Posts: 42 · Credit: 2,589,736 · RAC: 0
> Hi, are you running SixTrack? Virtually no I/O. Thanks. Eric

I do have it selected, but there don't seem to be any tasks available currently. Thanks.
©2025 CERN