Message boards : Number crunching : Computer Optimization

DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 47885 - Posted: 22 Mar 2023, 22:58:29 UTC

Hello there,
I like tinkering with computers and I want to put it into something useful.

Has anyone any idea what LHC is benefiting from?

RAM, size, bandwidth, channels, ranks, frequency?

I'm running it on Windows10 currently and would like to know what would be the best to run? KVM, Linux, Windows?

Cheers and happy crunching
ID: 47885
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,818,488
RAC: 127,549
Message 47887 - Posted: 23 Mar 2023, 4:25:53 UTC - in response to Message 47885.  

You can begin with SixTrack (work comes in waves) and Theory (needs VirtualBox).
Yeti's checklist in the ATLAS folder is a good place to start looking deeper into our project (ATLAS or CMS).
At the moment we have some trouble with CMS and ATLAS tasks.
ID: 47887
Erich56

Joined: 18 Dec 15
Posts: 1689
Credit: 103,916,636
RAC: 122,088
Message 47888 - Posted: 23 Mar 2023, 6:21:22 UTC - in response to Message 47887.  

At the moment we have some trouble with CMS and ATLAS tasks.
Well, CMS tasks are back, and with ATLAS I have not noticed any problems lately.
ID: 47888
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 47891 - Posted: 23 Mar 2023, 10:13:49 UTC - in response to Message 47888.  

At the moment we have some trouble with CMS and ATLAS tasks.
Well, CMS tasks are back, and with ATLAS I have not noticed any problems lately.
Probably a scientist added a batch to the ATLAS queue that was not meant for BOINC, because the root file to download for one task is 1110 MB.
ID: 47891
DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 47894 - Posted: 23 Mar 2023, 20:39:19 UTC - in response to Message 47887.  
Last modified: 23 Mar 2023, 21:20:08 UTC

I think some are confused about what computer optimization is...
What has been suggested here so far just dumps more work on my system without optimizing anything.

Please read the question.
ID: 47894
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,818,488
RAC: 127,549
Message 47896 - Posted: 24 Mar 2023, 2:57:48 UTC - in response to Message 47894.  

What's your problem here:
Status: All (2586) · In progress (235) · Validation pending (31) · Validation inconclusive (0) · Valid (2299) · Invalid (0) · Error (21)
Application: All (2586) · ATLAS (long simulation) (0) · ATLAS Simulation (10) · CMS Simulation (8) · SixTrack (194) · sixtracktest (0) · Theory Simulation (2374)
These are your stats at the moment!
ID: 47896
Toggleton

Joined: 4 Mar 17
Posts: 20
Credit: 8,211,149
RAC: 11,932
Message 47897 - Posted: 24 Mar 2023, 10:13:12 UTC - in response to Message 47885.  

I can only give you the data I have for my Ryzen 3600 (6c/12t) with 16 GB RAM at 3200 MHz.

I only run native ATLAS on Linux, so I can't say much about the other sub-projects. It is hard to get steady SixTrack work until the big update (https://youtu.be/3QLk6y2WNGs?t=448) is released, soon(TM). GPU work will come later, too.

I run 2 native ATLAS tasks at the same time with 6 threads each; that currently takes around 8 GB of RAM on an otherwise idle system. Here is an older thread about the memory usage of the projects and how it scales with threads: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4875
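
If you want to pin that layout, BOINC's app_config.xml in the LHC@home project folder can do it. A minimal sketch; the plan class native_mt and the --nthreads flag are what the forum guides show for native ATLAS, so double-check them there:

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>  <!-- at most 2 ATLAS tasks at once -->
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>            <!-- 6 threads per task -->
    <cmdline>--nthreads 6</cmdline>     <!-- assumed flag; see the pinned guides -->
  </app_version>
</app_config>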


I read somewhere in this forum that the native (Linux) tasks are quite a bit faster because they have less overhead (no VirtualBox), but I can't find it right now. So if you want to build a PC that only runs BOINC for LHC@home, the native tasks are the better choice. It is just some work to set everything up, like CVMFS, and it is recommended to have a squid proxy set up so the load on the servers is smaller. The guides are pinned in https://lhcathome.cern.ch/lhcathome/forum_forum.php?id=3
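
As a taste of how small the client side is: the CVMFS config mostly boils down to an /etc/cvmfs/default.local like this (a sketch; the repository list and proxy address are examples, the pinned guides have the authoritative values):

CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY="http://192.168.1.10:3128;DIRECT"  # your squid first, direct as fallback
CVMFS_QUOTA_LIMIT=4096                              # local cache size in MB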

ATLAS tasks take quite some disk space and download volume. Usually ~40 tasks (around 2 days of work for my system) need 20 GB on disk. Right now the files are bigger, so the same amount of work takes 45 GB, but that is likely just a mistake, as written above: "the root file to download for one task is 1110 MB."

Upload and download traffic should not be that heavy while a task is running (the squid proxy helps avoid downloading the same files for every work unit). But the up- and downloading of the work units themselves (~500 MB down and ~500 MB up) will benefit from good internet speed. With my 100/40 Mbit connection, the transfer speeds are usually limited by my internet, not the CERN servers. The upload usually runs while the next task is starting up and not yet producing CPU load.

RAM usage for ATLAS should be quite stable, so on your 12-core/24-thread CPU you could just run 4 tasks at the same time with 6 threads each; that will take around 16 GB. Or you could run 3 tasks with 8 threads each, but I have no RAM usage numbers for that setup.

My guess would be that fast RAM and good memory bandwidth are useful, with so much data in memory. I tested with 2400 MHz and 3200 MHz and did not notice a change big enough to show up in the run time (sec). But more and faster cores than mine will likely hit the memory bandwidth harder. I don't think you will find good data on that; you will likely need to test it yourself, for example as sketched below.
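
If you want numbers instead of eyeballing run times, a quick memory-bandwidth comparison before and after changing the RAM speed works, e.g. with sysbench (a sketch, assuming sysbench is installed):

$ sysbench memory --memory-block-size=1M --memory-total-size=32G run
# compare the MiB/sec figure between the 2400 MHz and 3200 MHz settings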

A fast connection to the disk should help too, as the work units that need to be loaded are quite big for BOINC tasks.

CPU-wise, you could optimize for the most efficient frequency (2.2 GHz to 3.6 GHz scales fine for efficiency on my system; only the boost gets more inefficient), but that is likely different for your CPU generation.
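
On Linux you can sweep that yourself by capping the frequency and reading the package power (a sketch using cpupower and turbostat from the linux-tools packages; turbostat needs a fairly recent kernel to report AMD package power):

$ sudo cpupower frequency-set -u 3600MHz     # cap the maximum frequency
$ sudo turbostat --Summary --quiet sleep 60  # average PkgWatt over one minute
# divide work done per hour by watts at each cap to find the efficiency sweet spot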

AVX-512 is AFAIK not used yet: "executing command: grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo" is from the ATLAS native log of a user whose CPU (Ryzen 9 7950X 16-core) is at least the same generation as yours: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10821683&offset=0&show_names=0&state=4&appid=
ID: 47897
wujj123456

Joined: 14 Sep 08
Posts: 43
Credit: 51,882,754
RAC: 138,533
Message 47901 - Posted: 25 Mar 2023, 22:35:56 UTC - in response to Message 47897.  

AVX-512 is AFAIK not used yet: "executing command: grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo" is from the ATLAS native log of a user whose CPU (Ryzen 9 7950X 16-core) is at least the same generation as yours: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10821683&offset=0&show_names=0&state=4&appid=

grep -o only prints the matching part of each line, and that pattern only matches avx2-prefixed flags, so it won't find avx512 even if it's there. In addition, the output is redirected by the Python code logging these anyway, likely because it checks the output to make decisions. The grep command shown should have returned 32 lines of avx2, since the 7950X does support AVX2 and has it in its feature flags.

 $ grep -o 'avx2[^ ]*\|AVX2[^ ]*' /proc/cpuinfo | uniq -c
     32 avx2
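
For completeness, this variant (my addition, not something from the task log) would list whatever AVX-512 feature flags the kernel reports, if any:

 $ grep -o 'avx512[^ ]*' /proc/cpuinfo | sort | uniq -c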


Regarding AVX-512, you are probably still right anyway. Given that the setup script only greps for avx2, AVX-512 is likely not a concern whatsoever. I did some digging since I was curious after your reply, and you happened to point to one of my hosts. :-P

The command that's consuming CPU on my host:

/cvmfs/atlas.cern.ch/repo/sw/software/21.0/sw/lcg/releases/LCG_87/Python/2.7.10/x86_64-slc6-gcc49-opt/bin/python -tt /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasCore/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/bin/athena.py --preloadlib=/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5:/cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so runargs.EVNTtoHITS.py SimuJobTransforms/skeleton.EVGENtoHIT_ISF.py

That Python code is likely just uninteresting wrappers; I bet the bulk of the work is done inside the preloaded libraries. AVX-512 instructions use zmm registers, so I grepped the disassembly for those.
$ objdump -d /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libimf.so | grep zmm
$ objdump -d /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasExternals/21.0.15/InstallArea/x86_64-slc6-gcc49-opt/lib/libintlc.so.5 | grep zmm
   32dc7:       62 f1 fe 48 6f 06       vmovdqu64 (%rsi),%zmm0
   32dd4:       62 d1 7d 48 e7 03       vmovntdq %zmm0,(%r11)
   32dda:       62 f1 fe 48 6f 46 01    vmovdqu64 0x40(%rsi),%zmm0
   32de8:       62 d1 7d 48 e7 43 01    vmovntdq %zmm0,0x40(%r11)
   32def:       62 f1 fe 48 6f 46 02    vmovdqu64 0x80(%rsi),%zmm0
   32dfd:       62 d1 7d 48 e7 43 02    vmovntdq %zmm0,0x80(%r11)
   32e04:       62 f1 fe 48 6f 46 03    vmovdqu64 0xc0(%rsi),%zmm0
   32e12:       62 d1 7d 48 e7 43 03    vmovntdq %zmm0,0xc0(%r11)
   32e2e:       62 f1 fe 48 6f 46 fc    vmovdqu64 -0x100(%rsi),%zmm0
   32e3c:       62 d1 7d 48 e7 43 fc    vmovntdq %zmm0,-0x100(%r11)
   32e43:       62 f1 fe 48 6f 46 fd    vmovdqu64 -0xc0(%rsi),%zmm0
   32e51:       62 d1 7d 48 e7 43 fd    vmovntdq %zmm0,-0xc0(%r11)
   32e58:       62 f1 fe 48 6f 46 fe    vmovdqu64 -0x80(%rsi),%zmm0
   32e66:       62 d1 7d 48 e7 43 fe    vmovntdq %zmm0,-0x80(%r11)
   32e6d:       62 f1 fe 48 6f 46 ff    vmovdqu64 -0x40(%rsi),%zmm0
   32e7b:       62 d1 7d 48 e7 43 ff    vmovntdq %zmm0,-0x40(%r11)
   32f7b:       62 f1 7c 48 10 46 f9    vmovups -0x1c0(%rsi),%zmm0
   32f82:       62 d1 7c 48 29 43 f9    vmovaps %zmm0,-0x1c0(%r11)
   32f89:       62 f1 7c 48 10 46 fa    vmovups -0x180(%rsi),%zmm0
   32f90:       62 d1 7c 48 29 43 fa    vmovaps %zmm0,-0x180(%r11)
   32f97:       62 f1 7c 48 10 46 fb    vmovups -0x140(%rsi),%zmm0
   32f9e:       62 d1 7c 48 29 43 fb    vmovaps %zmm0,-0x140(%r11)
   32fa5:       62 f1 7c 48 10 46 fc    vmovups -0x100(%rsi),%zmm0
   32fac:       62 d1 7c 48 29 43 fc    vmovaps %zmm0,-0x100(%r11)
   32fb3:       62 f1 7c 48 10 46 fd    vmovups -0xc0(%rsi),%zmm0
   32fba:       62 d1 7c 48 29 43 fd    vmovaps %zmm0,-0xc0(%r11)
   32fc1:       62 f1 7c 48 10 46 fe    vmovups -0x80(%rsi),%zmm0
   32fc8:       62 d1 7c 48 29 43 fe    vmovaps %zmm0,-0x80(%r11)
   32fcf:       62 f1 7c 48 10 46 ff    vmovups -0x40(%rsi),%zmm0
   32fd6:       62 d1 7c 48 29 43 ff    vmovaps %zmm0,-0x40(%r11)
   330a9:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
   330af:       62 d1 7c 48 11 03       vmovups %zmm0,(%r11)
   330ba:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
   330c0:       62 d1 7c 48 11 03       vmovups %zmm0,(%r11)
   330c6:       62 d1 7c 48 10 40 ff    vmovups -0x40(%r8),%zmm0
   330cd:       62 d1 7c 48 11 42 ff    vmovups %zmm0,-0x40(%r10)
   3475f:       62 d2 7d 48 7c c1       vpbroadcastd %r9d,%zmm0
   347b0:       62 d1 7d 48 e7 02       vmovntdq %zmm0,(%r10)
   347b6:       62 d1 7d 48 e7 42 01    vmovntdq %zmm0,0x40(%r10)
   347bd:       62 d1 7d 48 e7 42 02    vmovntdq %zmm0,0x80(%r10)
   347c4:       62 d1 7d 48 e7 42 03    vmovntdq %zmm0,0xc0(%r10)
   347d9:       62 d1 7d 48 e7 42 fc    vmovntdq %zmm0,-0x100(%r10)
   347e0:       62 d1 7d 48 e7 42 fd    vmovntdq %zmm0,-0xc0(%r10)
   347e7:       62 d1 7d 48 e7 42 fe    vmovntdq %zmm0,-0x80(%r10)
   347ee:       62 d1 7d 48 e7 42 ff    vmovntdq %zmm0,-0x40(%r10)
   34888:       62 d1 7c 48 29 42 f9    vmovaps %zmm0,-0x1c0(%r10)
   3488f:       62 d1 7c 48 29 42 fa    vmovaps %zmm0,-0x180(%r10)
   34896:       62 d1 7c 48 29 42 fb    vmovaps %zmm0,-0x140(%r10)
   3489d:       62 d1 7c 48 29 42 fc    vmovaps %zmm0,-0x100(%r10)
   348a4:       62 d1 7c 48 29 42 fd    vmovaps %zmm0,-0xc0(%r10)
   348ab:       62 d1 7c 48 29 42 fe    vmovaps %zmm0,-0x80(%r10)
   348b2:       62 d1 7c 48 29 42 ff    vmovaps %zmm0,-0x40(%r10)
   3494c:       62 d1 7c 48 11 02       vmovups %zmm0,(%r10)
   34957:       62 d1 7c 48 11 02       vmovups %zmm0,(%r10)
   3495d:       62 d1 7c 48 11 40 ff    vmovups %zmm0,-0x40(%r8)

Other than a small chunk of code using AVX-512 for memcpy and memset, there are no AVX-512 compute instructions as far as I can tell. It's also not a given that these instructions are actually executed, even if they are present in the library assembly.

PS: There are probably more direct ways of confirming this by instrumenting with tools like perf, but I don't do profiling enough to know how. :-(
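
For the record, here is a rough sketch of how that could look with perf, assuming the worker can be found with pgrep (untested by me on the actual task, so treat it as a starting point):

$ pgrep -f athena.py                     # find the PID of the simulation process
$ sudo perf record -p <PID> -- sleep 30  # sample that PID for 30 seconds
$ sudo perf report                       # inspect hot functions, annotate, look for zmm registers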
ID: 47901
wujj123456

Joined: 14 Sep 08
Posts: 43
Credit: 51,882,754
RAC: 138,533
Message 47902 - Posted: 25 Mar 2023, 22:52:13 UTC
Last modified: 25 Mar 2023, 22:59:51 UTC

Regarding optimization, I honestly don't think there are hardware-specific tricks for this project. Since you are using Windows, you might want to set up the squid proxy, which could save a lot of bandwidth while making cvmfs access faster in your VM.
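
A minimal squid.conf sketch for this use case (the subnet and cache sizes are illustrative; tune them for your LAN and disk):

http_port 3128
acl localnet src 192.168.0.0/16               # allow only your LAN
http_access allow localnet
http_access deny all
cache_mem 256 MB
maximum_object_size 1024 MB                   # CVMFS objects and VM images can be large
cache_dir ufs /var/spool/squid 20000 16 256   # 20 GB on-disk cache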

Other general optimizations still apply. You have the same Zen 4 CPU as I do, and the first thing is to make sure your EXPO profile is enabled in UEFI, so the memory runs at the frequency you paid for instead of the default DDR5-4800. Then the "Curve Optimizer" in the UEFI can get me ~5% more frequency, but it's a trial-and-error process of pushing the curve down as far as possible. I did it in UEFI, but since you use Windows, you might be able to use AMD's Ryzen Master to do the same without rebooting.
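
To verify that EXPO actually took effect, you can read the configured memory speed back from the SMBIOS tables (the field name varies a little between tool versions):

$ sudo dmidecode -t memory | grep -i 'configured memory speed'   # Linux
# Windows (PowerShell): Get-CimInstance Win32_PhysicalMemory | Select-Object ConfiguredClockSpeed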

Linux is generally better here if you want to run LHC, which is native to Linux. You can set up native cvmfs fairly easily on modern distros and avoid paying the VM overhead (see the sketch below). Theory would be better on Linux too, but it currently requires some hacking of the start script to get around the cgroup v2 problem, or disabling cgroup v2 system-wide. SixTrack has a native app for Windows, so I assume both platforms are equally good there.
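
The native cvmfs setup really is short on a Debian/Ubuntu box; a sketch following CERN's packaging docs (double-check the pinned guides for the LHC@home-specific settings):

$ wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
$ sudo dpkg -i cvmfs-release-latest_all.deb
$ sudo apt update && sudo apt install cvmfs cvmfs-config-default
$ sudo cvmfs_config setup
$ cvmfs_config probe atlas.cern.ch   # should print OK once everything is wired up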

This is all just my own experience, though; I am not a developer here.
ID: 47902
DamianToczek

Joined: 8 Nov 22
Posts: 6
Credit: 1,610,722
RAC: 0
Message 48001 - Posted: 12 Apr 2023, 20:06:50 UTC - in response to Message 47902.  

Thank you for sharing your knowledge. I will definitely look into that, and I'm planning to switch to Linux. This is my gaming machine that I run LHC on.
I'm so annoyed by Windows that I plan to switch to Arch Linux; as a rolling release it might break stuff, but it also lets me play the latest games and get the latest updates (and the latest bugs).

My CPU is locked to 95 W and makes 659k credits monthly; the system runs 24/7, but BOINC is paused while I play.
I would say 95 W for 659k credits monthly is a pretty decent achievement.
My CPU still overclocks itself (PBO) up to 5.6 GHz.

This is just what I nerd out about: making something extremely efficient under heavy workloads while keeping it usable for gaming. If we all ran BOINC and crunched LHC, we would have so much computing power without going broke from electricity bills.
ID: 48001
kotenok2000

Joined: 21 Feb 11
Posts: 59
Credit: 543,728
RAC: 42
Message 49440 - Posted: 7 Feb 2024, 16:40:11 UTC
Last modified: 7 Feb 2024, 17:01:39 UTC

Theory tasks compile and run pythia8.exe and rivetvm.exe.
I ran elfx86exts on them (a Rust program that disassembles a binary and prints out which instruction set extensions it uses).
They use SSE2 at most.
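
(For anyone who wants to reproduce this: elfx86exts is published on crates.io, so it can be installed with cargo.)

$ cargo install elfx86exts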
elfx86exts 3/cernvm/shared/pythia8/pythia8.exe
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE2 (cvtsd2ss)
SSE1 (movss)
CMOV (cmovns)
Instruction set extensions used: CMOV, MODE64, SSE1, SSE2
CPU Generation: Intel Core

elfx86exts 3/cernvm/shared/rivetvm/rivetvm.exe
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE2 (cvttsd2si)
Instruction set extensions used: MODE64, SSE2
CPU Generation: Intel Core
I still haven't figured out how to pass CFLAGS=" -march=native " into the container environment, even with app_info.xml.
ID: 49440

©2024 CERN