Message boards : Number crunching : Computer Optimization

DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 47885 - Posted: 22 Mar 2023, 22:58:29 UTC

Hello there,
I like tinkering with computers and I want to put it into something useful.

Has anyone any idea what LHC is benefiting from?

RAM, size, bandwidth, channels, ranks, frequency?

I'm running it on Windows10 currently and would like to know what would be the best to run? KVM, Linux, Windows?

Cheers and happy crunching
ID: 47885
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,818,488
RAC: 127,549
Message 47887 - Posted: 23 Mar 2023, 4:25:53 UTC - in response to Message 47885.  

You can begin with SixTrack (work comes in waves) and Theory (needs VirtualBox).
Yeti's checklist in the ATLAS folder is a good place to start looking deeper into our project (ATLAS or CMS).
At the moment we have some trouble with CMS and ATLAS tasks.
ID: 47887
Erich56

Joined: 18 Dec 15
Posts: 1689
Credit: 103,916,636
RAC: 122,088
Message 47888 - Posted: 23 Mar 2023, 6:21:22 UTC - in response to Message 47887.  

At the moment we have some trouble with CMS and ATLAS tasks.
Well, CMS tasks are back, and with ATLAS I have not noticed any problems lately.
ID: 47888
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 47891 - Posted: 23 Mar 2023, 10:13:49 UTC - in response to Message 47888.  

At the moment we have some trouble with CMS and ATLAS tasks.
Well, CMS tasks are back, and with ATLAS I have not noticed any problems lately.
Probably a scientist added a batch to the ATLAS queue that was not meant for BOINC, because the root file to download for one task is 1110 MB.
ID: 47891
DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 47894 - Posted: 23 Mar 2023, 20:39:19 UTC - in response to Message 47887.  
Last modified: 23 Mar 2023, 21:20:08 UTC

I think some are confused about what computer optimization is...
What has been suggested here so far just dumps more work on my system without optimizing anything.

Please read the question.
ID: 47894
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,818,488
RAC: 127,549
Message 47896 - Posted: 24 Mar 2023, 2:57:48 UTC - in response to Message 47894.  

What's your problem here:
Status: All (2586) · In progress (235) · Validation pending (31) · Validation inconclusive (0) · Valid (2299) · Invalid (0) · Error (21)
Application: All (2586) · ATLAS (long simulation) (0) · ATLAS Simulation (10) · CMS Simulation (8) · SixTrack (194) · sixtracktest (0) · Theory Simulation (2374)
These are your stats at the moment!
ID: 47896
Toggleton

Joined: 4 Mar 17
Posts: 20
Credit: 8,211,149
RAC: 11,932
Message 47897 - Posted: 24 Mar 2023, 10:13:12 UTC - in response to Message 47885.  

I can only give you the data I have for my Ryzen 3600 (6c/12t) with 16 GB RAM at 3200 MHz.

I only run native ATLAS on Linux, so I can't say much about the other sub-projects. It is hard to get steady SixTrack work until the big update (https://youtu.be/3QLk6y2WNGs?t=448) is released, soon(TM). GPU work will come later, too.

I run 2 native ATLAS tasks at the same time with 6 threads each; that currently takes around 8 GB of RAM on an otherwise idle system. Here is an older thread about the memory usage of the projects and how it scales with threads: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4875
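
If you want to pin that layout, BOINC's app_config.xml in the LHC@home project folder can do it. A minimal sketch; the plan class native_mt and the --nthreads flag are what the forum guides show for native ATLAS, so double-check them there:

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>  <!-- at most 2 ATLAS tasks at once -->
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>            <!-- 6 threads per task -->
    <cmdline>--nthreads 6</cmdline>     <!-- assumed flag; see the pinned guides -->
  </app_version>
</app_config>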


I read somewhere in this forum that the native (Linux) tasks are quite a bit faster because they have less overhead (no VirtualBox), but I can't find it right now. So if you want to build a PC that only runs BOINC for LHC@home, the native tasks are the better choice. It is just some work to set everything up, like CVMFS, and it is recommended to have a squid proxy set up so the load on the servers is smaller. The guides are pinned in https://lhcathome.cern.ch/lhcathome/forum_forum.php?id=3
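
As a taste of how small the client side is: the CVMFS config mostly boils down to an /etc/cvmfs/default.local like this (a sketch; the repository list and proxy address are examples, the pinned guides have the authoritative values):

CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY="http://192.168.1.10:3128;DIRECT"  # your squid first, direct as fallback
CVMFS_QUOTA_LIMIT=4096                              # local cache size in MB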

ATLAS tasks take quite some disk space and download volume. Usually ~40 tasks (around 2 days of work for my system) need 20 GB on disk. Right now the files are bigger, so the same amount of work takes 45 GB, but that is likely just a mistake, as written above: "the root file to download for one task is 1110 MB."

Upload and download traffic should not be that heavy while a task is running (the squid proxy helps avoid downloading the same files for every work unit). But the up- and downloading of the work units themselves (~500 MB down and ~500 MB up) will benefit from good internet speed. With my 100/40 Mbit connection, the transfer speeds are usually limited by my internet, not the CERN servers. The upload usually runs while the next task is starting up and not yet producing CPU load.

RAM usage for ATLAS should be quite stable, so on your 12-core/24-thread CPU you could just run 4 tasks at the same time with 6 threads each; that will take around 16 GB. Or you could run 3 tasks with 8 threads each, but I have no RAM usage numbers for that setup.

My guess would be that fast RAM and good memory bandwidth are useful, with so much data in memory. I tested with 2400 MHz and 3200 MHz and did not notice a change big enough to show up in the run time (sec). But more and faster cores than mine will likely hit the memory bandwidth harder. I don't think you will find good data on that; you will likely need to test it yourself, for example as sketched below.
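
If you want numbers instead of eyeballing run times, a quick memory-bandwidth comparison before and after changing the RAM speed works, e.g. with sysbench (a sketch, assuming sysbench is installed):

$ sysbench memory --memory-block-size=1M --memory-total-size=32G run
# compare the MiB/sec figure between the 2400 MHz and 3200 MHz settings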

A fast connection to the disk should help too, as the work units that need to be loaded are quite big for BOINC tasks.

CPU-wise, you could optimize for the most efficient frequency (2.2 GHz to 3.6 GHz scales fine for efficiency on my system; only the boost gets more inefficient), but that is likely different for your CPU generation.
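
On Linux you can sweep that yourself by capping the frequency and reading the package power (a sketch using cpupower and turbostat from the linux-tools packages; turbostat needs a fairly recent kernel to report AMD package power):

$ sudo cpupower frequency-set -u 3600MHz     # cap the maximum frequency
$ sudo turbostat --Summary --quiet sleep 60  # average PkgWatt over one minute
# divide work done per hour by watts at each cap to find the efficiency sweet spot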

AVX-512 is AFAIK not used yet: "executing command: grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo" is from the ATLAS native log of a user whose CPU (Ryzen 9 7950X 16-core) is at least the same generation as yours: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10821683&offset=0&show_names=0&state=4&appid=
ID: 47897
wujj123456

Joined: 14 Sep 08
Posts: 43
Credit: 51,882,754
RAC: 138,533
Message 47901 - Posted: 25 Mar 2023, 22:35:56 UTC - in response to Message 47897.  

AVX-512 is AFAIK not used yet: "executing command: grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo" is from the ATLAS native log of a user whose CPU (Ryzen 9 7950X 16-core) is at least the same generation as yours: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10821683&offset=0&show_names=0&state=4&appid=

grep -o only prints the matching part of each line, and that pattern only matches avx2-prefixed flags, so it won't find avx512 even if it's there. In addition, the output is redirected by the Python code logging these anyway, likely because it checks the output to make decisions. The grep command shown should have returned 32 lines of avx2, since the 7950X does support AVX2 and has it in its feature flags.

 $ grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo | uniq -c
     32 avx2
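
For completeness, this variant (my addition, not something from the task log) would list whatever AVX-512 feature flags the kernel reports, if any:

 $ grep -o 'avx512[^ ]*' /proc/cpuinfo | sort | uniq -c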


Regarding AVX-512, you are probably still right anyway. Given that the setup script only greps for avx2, AVX-512 is likely not a concern whatsoever. I did some digging since I was curious after your reply, and you happened to point to one of my hosts. :-P

The command that's consuming CPU on my host:

/cvmfs/atlas.cern.ch/repo/sw/software/21.0/sw/lcg/releases/LCG_87/Python/2.7.10/x86_64-slc6-gcc49-opt/bin/python -tt /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasCore/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/bin/athena.py --preloadlib=/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5:/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so runargs.EVNTtoHITS.py SimuJobTransforms/skeleton.EVGENtoHIT_ISF.py

That Python code is likely just uninteresting wrappers; I bet the bulk of the work is done inside the preloaded libraries. AVX-512 instructions use zmm registers, so I grepped the disassembly for those.
$ objdump -d /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so | grep zmm
$ objdump -d /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5 | grep zmm
   32dc7:       62 f1 fe 48 6f 06       vmovdqu64 (%rsi),%zmm0
   32dd4:       62 d1 7d 48 e7 03       vmovntdq %zmm0,(%r11)
   32dda:       62 f1 fe 48 6f 46 01    vmovdqu64 0x40(%rsi),%zmm0
   32de8:       62 d1 7d 48 e7 43 01    vmovntdq %zmm0,0x40(%r11)
   32def:       62 f1 fe 48 6f 46 02    vmovdqu64 0x80(%rsi),%zmm0
   32dfd:       62 d1 7d 48 e7 43 02    vmovntdq %zmm0,0x80(%r11)
   32e04:       62 f1 fe 48 6f 46 03    vmovdqu64 0xc0(%rsi),%zmm0
   32e12:       62 d1 7d 48 e7 43 03    vmovntdq %zmm0,0xc0(%r11)
   32e2e:       62 f1 fe 48 6f 46 fc    vmovdqu64 -0x100(%rsi),%zmm0
   32e3c:       62 d1 7d 48 e7 43 fc    vmovntdq %zmm0,-0x100(%r11)
   32e43:       62 f1 fe 48 6f 46 fd    vmovdqu64 -0xc0(%rsi),%zmm0
   32e51:       62 d1 7d 48 e7 43 fd    vmovntdq %zmm0,-0xc0(%r11)
   32e58:       62 f1 fe 48 6f 46 fe    vmovdqu64 -0x80(%rsi),%zmm0
   32e66:       62 d1 7d 48 e7 43 fe    vmovntdq %zmm0,-0x80(%r11)
   32e6d:       62 f1 fe 48 6f 46 ff    vmovdqu64 -0x40(%rsi),%zmm0
   32e7b:       62 d1 7d 48 e7 43 ff    vmovntdq %zmm0,-0x40(%r11)
   32f7b:       62 f1 7c 48 10 46 f9    vmovups -0x1c0(%rsi),%zmm0
   32f82:       62 d1 7c 48 29 43 f9    vmovaps %zmm0,-0x1c0(%r11)
   32f89:       62 f1 7c 48 10 46 fa    vmovups -0x180(%rsi),%zmm0
   32f90:       62 d1 7c 48 29 43 fa    vmovaps %zmm0,-0x180(%r11)
   32f97:       62 f1 7c 48 10 46 fb    vmovups -0x140(%rsi),%zmm0
   32f9e:       62 d1 7c 48 29 43 fb    vmovaps %zmm0,-0x140(%r11)
   32fa5:       62 f1 7c 48 10 46 fc    vmovups -0x100(%rsi),%zmm0
   32fac:       62 d1 7c 48 29 43 fc    vmovaps %zmm0,-0x100(%r11)
   32fb3:       62 f1 7c 48 10 46 fd    vmovups -0xc0(%rsi),%zmm0
   32fba:       62 d1 7c 48 29 43 fd    vmovaps %zmm0,-0xc0(%r11)
   32fc1:       62 f1 7c 48 10 46 fe    vmovups -0x80(%rsi),%zmm0
   32fc8:       62 d1 7c 48 29 43 fe    vmovaps %zmm0,-0x80(%r11)
   32fcf:       62 f1 7c 48 10 46 ff    vmovups -0x40(%rsi),%zmm0
   32fd6:       62 d1 7c 48 29 43 ff    vmovaps %zmm0,-0x40(%r11)
   330a9:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
   330af:       62 d1 7c 48 11 03       vmovups %zmm0,(%r11)
   330ba:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
   330c0:       62 d1 7c 48 11 03       vmovups %zmm0,(%r11)
   330c6:       62 d1 7c 48 10 40 ff    vmovups -0x40(%r8),%zmm0
   330cd:       62 d1 7c 48 11 42 ff    vmovups %zmm0,-0x40(%r10)
   3475f:       62 d2 7d 48 7c c1       vpbroadcastd %r9d,%zmm0
   347b0:       62 d1 7d 48 e7 02       vmovntdq %zmm0,(%r10)
   347b6:       62 d1 7d 48 e7 42 01    vmovntdq %zmm0,0x40(%r10)
   347bd:       62 d1 7d 48 e7 42 02    vmovntdq %zmm0,0x80(%r10)
   347c4:       62 d1 7d 48 e7 42 03    vmovntdq %zmm0,0xc0(%r10)
   347d9:       62 d1 7d 48 e7 42 fc    vmovntdq %zmm0,-0x100(%r10)
   347e0:       62 d1 7d 48 e7 42 fd    vmovntdq %zmm0,-0xc0(%r10)
   347e7:       62 d1 7d 48 e7 42 fe    vmovntdq %zmm0,-0x80(%r10)
   347ee:       62 d1 7d 48 e7 42 ff    vmovntdq %zmm0,-0x40(%r10)
   34888:       62 d1 7c 48 29 42 f9    vmovaps %zmm0,-0x1c0(%r10)
   3488f:       62 d1 7c 48 29 42 fa    vmovaps %zmm0,-0x180(%r10)
   34896:       62 d1 7c 48 29 42 fb    vmovaps %zmm0,-0x140(%r10)
   3489d:       62 d1 7c 48 29 42 fc    vmovaps %zmm0,-0x100(%r10)
   348a4:       62 d1 7c 48 29 42 fd    vmovaps %zmm0,-0xc0(%r10)
   348ab:       62 d1 7c 48 29 42 fe    vmovaps %zmm0,-0x80(%r10)
   348b2:       62 d1 7c 48 29 42 ff    vmovaps %zmm0,-0x40(%r10)
   3494c:       62 d1 7c 48 11 02       vmovups %zmm0,(%r10)
   34957:       62 d1 7c 48 11 02       vmovups %zmm0,(%r10)
   3495d:       62 d1 7c 48 11 40 ff    vmovups %zmm0,-0x40(%r8)

Other than a small chunk of code using AVX-512 for memcpy and memset, there are no AVX-512 compute instructions as far as I can tell. It's also not a given that these instructions are actually executed, even if they are present in the library assembly.

PS: There are probably more direct ways of confirming this by instrumenting with tools like perf, but I don't do profiling enough to know how. :-(
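
For the record, here is a rough sketch of how that could look with perf, assuming the worker can be found with pgrep (untested by me on the actual task, so treat it as a starting point):

$ pgrep -f athena.py                     # find the PID of the simulation process
$ sudo perf record -p <PID> -- sleep 30  # sample that PID for 30 seconds
$ sudo perf report                       # inspect hot functions, annotate, look for zmm registers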
ID: 47901
wujj123456

Joined: 14 Sep 08
Posts: 43
Credit: 51,882,754
RAC: 138,533
Message 47902 - Posted: 25 Mar 2023, 22:52:13 UTC
Last modified: 25 Mar 2023, 22:59:51 UTC

Regarding optimization, I honestly don't think there are hardware-specific tricks for this project. Since you are using Windows, you might want to set up the squid proxy, which could save a lot of bandwidth while making cvmfs access faster in your VM.
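
A minimal squid.conf sketch for this use case (the subnet and cache sizes are illustrative; tune them for your LAN and disk):

http_port 3128
acl localnet src 192.168.0.0/16               # allow only your LAN
http_access allow localnet
http_access deny all
cache_mem 256 MB
maximum_object_size 1024 MB                   # CVMFS objects and VM images can be large
cache_dir ufs /var/spool/squid 20000 16 256   # 20 GB on-disk cache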

Other general optimizations still apply. You have the same Zen 4 CPU as I do, and the first thing is to make sure your EXPO profile is enabled in UEFI, so the memory runs at the frequency you paid for instead of the default DDR5-4800. Then the "Curve Optimizer" in the UEFI can get me ~5% more frequency, but it's a trial-and-error process of pushing the curve down as far as possible. I did it in UEFI, but since you use Windows, you might be able to use AMD's Ryzen Master to do the same without rebooting.
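
To verify that EXPO actually took effect, you can read the configured memory speed back from the SMBIOS tables (the field name varies a little between tool versions):

$ sudo dmidecode -t memory | grep -i 'configured memory speed'   # Linux
# Windows (PowerShell): Get-CimInstance Win32_PhysicalMemory | Select-Object ConfiguredClockSpeed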

Linux is generally better here if you want to run LHC, which is native to Linux. You can set up native cvmfs fairly easily on modern distros and avoid paying the VM overhead (see the sketch below). Theory would be better on Linux too, but it currently requires some hacking of the start script to get around the cgroup v2 problem, or disabling cgroup v2 system-wide. SixTrack has a native app for Windows, so I assume both platforms are equally good there.
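
The native cvmfs setup really is short on a Debian/Ubuntu box; a sketch following CERN's packaging docs (double-check the pinned guides for the LHC@home-specific settings):

$ wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
$ sudo dpkg -i cvmfs-release-latest_all.deb
$ sudo apt update && sudo apt install cvmfs cvmfs-config-default
$ sudo cvmfs_config setup
$ cvmfs_config probe atlas.cern.ch   # should print OK once everything is wired up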

This is all just my own experience, though; I am not a developer here.
ID: 47902
DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 48001 - Posted: 12 Apr 2023, 20:06:50 UTC - in response to Message 47902.  

Thank you for sharing your knowledge. I will definitely look into that, and I'm planning to switch to Linux. This is my gaming machine that I run LHC on.
I'm so annoyed by Windows that I plan to switch to Arch Linux; as a rolling release it might break stuff, but it also lets me play the latest games and get the latest updates (and the latest bugs).

My CPU is locked to 95 W and makes 659k credits monthly; the system runs 24/7, but BOINC is paused while I play.
I would say 95 W for 659k credits monthly is a pretty decent achievement.
My CPU still overclocks itself (PBO) up to 5.6 GHz.

This is just what I nerd out about: making something extremely efficient under heavy workloads while keeping it usable for gaming. If we all ran BOINC and crunched LHC, we would have so much computing power without going broke from electricity bills.
ID: 48001
kotenok2000

Joined: 21 Feb 11
Posts: 59
Credit: 543,728
RAC: 42
Message 49440 - Posted: 7 Feb 2024, 16:40:11 UTC
Last modified: 7 Feb 2024, 17:01:39 UTC

Theory tasks compile and run pythia8.exe and rivetvm.exe.
I ran elfx86exts on them (a Rust program that disassembles a binary and prints out which instruction set extensions it uses).
They use SSE2 at most.
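
(For anyone who wants to reproduce this: elfx86exts is published on crates.io, so it can be installed with cargo.)

$ cargo install elfx86exts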
elfx86exts 3/cernvm/shared/pythia8/pythia8.exe
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE2 (cvtsd2ss)
SSE1 (movss)
CMOV (cmovns)
Instruction set extensions used: CMOV, MODE64, SSE1, SSE2
CPU Generation: Intel Core

elfx86exts 3/cernvm/shared/rivetvm/rivetvm.exe
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE2 (cvttsd2si)
Instruction set extensions used: MODE64, SSE2
CPU Generation: Intel Core
I still haven't figured out how to pass CFLAGS=" -march=native " into the container environment, even with app_info.xml.
ID: 49440

©2024 CERN