Thread '2000 Events Threadripper 3995WX'

Author	Message
maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 48653 - Posted: 24 Sep 2023, 13:00:24 UTC Last modified: 24 Sep 2023, 13:06:38 UTC
Finished today, with 8 CPUs:
Sent: 23 Sep 2023, 18:51:11 UTC
Reported: 24 Sep 2023, 12:50:48 UTC
Status: Completed and validated
Run time: 64,018.61 s
CPU time: 476,227.40 s
Credit: 12,480.55
Application: ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 48655 - Posted: 24 Sep 2023, 15:42:05 UTC Last modified: 24 Sep 2023, 15:45:24 UTC
Second task on the same PC:
Upload file: 1.43 GByte
Sent: 23 Sep 2023, 22:06:24 UTC
Reported: 24 Sep 2023, 15:38:43 UTC
Status: Completed and validated
Run time: 62,446.21 s (17 hours 20 min 46 sec)
CPU time: 463,641.60 s (5 days 8 hours 47 min 21 sec)
Credit: 10,991.18
Application: ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64
A checkpoint file every 200 events is useful for this very long run time. Hardware acceleration is therefore in use.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 48664 - Posted: 26 Sep 2023, 3:46:48 UTC - in response to Message 48655.
AMD Ryzen 9 3950X 16-Core Processor [Family 23 Model 113 Stepping 0], Windows 11 Workstation
CPU count for VM: 2
Events: 2,000
Sent: 23 Sep 2023, 21:42:03 UTC
Reported: 25 Sep 2023, 19:53:21 UTC
Status: Completed and validated
Run time: 162,627.91 s
CPU time: 314,670.00 s
Credit: 19,613.00
Application: ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 48673 - Posted: 28 Sep 2023, 6:18:35 UTC It would be nice to be able to get long-runners (2000 events) only, via a venue in the LHC preferences, for HPC (high-performance computing) machines such as the Threadripper 3995WX.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 48685 - Posted: 29 Sep 2023, 11:36:29 UTC Last modified: 29 Sep 2023, 11:37:29 UTC
Four more on the Threadripper 3995WX:
Run time 14 hours 24 min 20 sec, CPU time 4 days 4 hours 16 min 59 sec
Run time 14 hours 13 min 35 sec, CPU time 4 days 3 hours 5 min 7 sec
Run time 16 hours 25 min 33 sec, CPU time 4 days 22 hours 11 min 55 sec
Run time 16 hours 19 min 1 sec, CPU time 4 days 20 hours 40 min 23 sec
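The run time (Laufzeit) and CPU time (CPU Zeit) pairs quoted above can be turned into a rough "average busy cores" figure by dividing CPU time by run time. The following is only a sketch of that arithmetic: the function name `average_busy_cores` is made up for illustration, and the number of cores assigned to these four tasks is not stated in the post, so the ratio is reported on its own rather than as a percentage of the VM's cores.

```python
from datetime import timedelta


def average_busy_cores(run_time: timedelta, cpu_time: timedelta) -> float:
    """CPU time divided by wall-clock run time: cores kept busy on average."""
    return cpu_time.total_seconds() / run_time.total_seconds()


# (run time, CPU time) pairs exactly as quoted in the post above.
tasks = [
    (timedelta(hours=14, minutes=24, seconds=20),
     timedelta(days=4, hours=4, minutes=16, seconds=59)),
    (timedelta(hours=14, minutes=13, seconds=35),
     timedelta(days=4, hours=3, minutes=5, seconds=7)),
    (timedelta(hours=16, minutes=25, seconds=33),
     timedelta(days=4, hours=22, minutes=11, seconds=55)),
    (timedelta(hours=16, minutes=19, seconds=1),
     timedelta(days=4, hours=20, minutes=40, seconds=23)),
]

for run_time, cpu_time in tasks:
    ratio = average_busy_cores(run_time, cpu_time)
    print(f"run {run_time}, CPU {cpu_time}: about {ratio:.2f} cores busy")
```

The ratio includes any idle setup phase in the run time, so it understates how busy the cores are while events are actually being processed.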

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 49617 - Posted: 22 Feb 2024, 21:21:50 UTC At the moment, 1000 events for four tasks.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 49618 - Posted: 22 Feb 2024, 23:56:05 UTC - in response to Message 49617. Seven tasks with 1k events finished after 6 hours :-).

CloverField Joined: 17 Oct 06 Posts: 99 Credit: 65,414,487 RAC: 10,522	Message 49619 - Posted: 23 Feb 2024, 13:05:22 UTC Any reason why the tasks suddenly jumped up to 6 hours? They used to be like 40 min to 2 hours in the past.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 49620 - Posted: 23 Feb 2024, 13:17:22 UTC - in response to Message 49619. Last modified: 23 Feb 2024, 13:45:55 UTC No idea, only CERN IT can answer that for us. In spring 2023, David Cameron ran tests with 2k-event ATLAS tasks. If I remember correctly, the 2k tasks need no transfer for us, because the data comes directly from the collider at ATLAS. @David Cameron: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47914#47914 The ATLAS Event Progress Monitor in RDP is now correct, thank you.

Harri Liljeroos Joined: 28 Sep 04 Posts: 804 Credit: 65,846,337 RAC: 27,216	Message 49621 - Posted: 23 Feb 2024, 14:49:51 UTC - in response to Message 49620. "Atlas Event Progress Monitor in RDP is now correct, Thank you." That applies only to these long 1000-event tasks. For the short ones with 400 events, the monitoring does not work right.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2753 Credit: 303,350,972 RAC: 99,848	Message 49622 - Posted: 23 Feb 2024, 19:25:58 UTC According to the logs I checked, there were tasks configured to process 100, 400, 500 or 1000 events. Varying the number of events doesn't seem to be a good decision by the people who submitted the tasks, since it ends up unbalancing BOINC's work fetch calculation, its runtime estimation and its credit calculation. A while ago some tests were done showing that 500 events per task is a good compromise between the project's needs and what most volunteers can handle without major issues.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 49623 - Posted: 23 Feb 2024, 19:31:42 UTC - in response to Message 49620. "only Cern-IT can answer us."

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1554 Credit: 10,094,370 RAC: 1,831	Message 49625 - Posted: 23 Feb 2024, 20:59:31 UTC - in response to Message 49622. The 1000-event job produced a 756 MB HITS file to upload.

wujj123456 Joined: 14 Sep 08 Posts: 53 Credit: 84,919,495 RAC: 15,988	Message 49626 - Posted: 24 Feb 2024, 2:33:33 UTC I personally prefer the bigger jobs. From what I see, each ATLAS WU always has a 20-30 min idle setup time. Having more work per WU is going to help with efficiency quite a bit. It also seems to reduce the network usage on download side (from client).
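The efficiency argument above can be sketched with a simple model: if every work unit pays a fixed idle setup cost, the share of wall time spent on actual events grows with the number of events per WU. The numbers below are assumptions for illustration only. The 25-minute setup is just the midpoint of the 20-30 min quoted in the post, and the 0.5 minutes per event and the linear scaling are hypothetical, not measured values.

```python
def useful_fraction(events: int,
                    setup_min: float = 25.0,      # assumed: midpoint of the quoted 20-30 min
                    min_per_event: float = 0.5):  # hypothetical per-event wall time
    """Share of a work unit's wall time spent processing events, not idle setup."""
    work_min = events * min_per_event
    return work_min / (setup_min + work_min)


for events in (100, 200, 500, 1000, 2000):
    share = useful_fraction(events)
    print(f"{events:>5} events: {share:.0%} of wall time spent on events")
```

With any fixed setup cost the curve flattens as tasks get longer, so the gain from going from 100 to 500 events is larger than the gain from going from 1000 to 2000.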

rbpeake Joined: 17 Sep 04 Posts: 106 Credit: 36,568,072 RAC: 1,645	Message 49627 - Posted: 24 Feb 2024, 3:08:33 UTC - in response to Message 49626. I agree. Regards, Bob P.

Saturn911 Joined: 3 Nov 12 Posts: 97 Credit: 193,040,849 RAC: 78,521	Message 49629 - Posted: 24 Feb 2024, 7:20:37 UTC - in response to Message 49626. "I personally prefer the bigger jobs. From what I see, each ATLAS WU always has a 20-30 min idle setup time. Having more work per WU is going to help with efficiency quite a bit. It also seems to reduce the network usage on download side (from client)." +1

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1554 Credit: 10,094,370 RAC: 1,831	Message 49631 - Posted: 24 Feb 2024, 8:54:34 UTC I don't prefer these long-runners at all. 400 events is really the maximum. 200 events would be better. From the BOINC point of view: volunteer computing is meant for PCs that are not in use. Not all volunteers have monster machines. But the most important disadvantage of running ATLAS and/or CMS jobs (up to 18 hrs) is that they need an uninterrupted network connection. This also means that the tasks cannot be suspended or shut down for more than 20 minutes, maybe an hour. A lot of crunchers want to shut down their machines during the evening or night to save electricity costs. This machine is running a 1000-event job on 4 cores; it has already been busy for 26 hours and has another 7 hours to go.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2753 Credit: 303,350,972 RAC: 99,848	Message 49632 - Posted: 24 Feb 2024, 8:57:45 UTC
"I personally prefer the bigger jobs."
This has been discussed back and forth, and yes, there are volunteers with fast computers and fast internet running their systems 24/7 (including mine). Those usually do not have problems even with 2000 eventers.
On the other hand:
- There are lots of computers that are not fast enough to finish large tasks within a reasonable time
- ATLAS (native) does not support suspend/resume, hence tasks start from scratch
- ATLAS generates huge upload files
Together with other points mentioned in the past, those 500 eventers were accepted as a compromise.
As for long setup times: they are usually shorter when
- the EVNT files are smaller, with fewer events
- a local HTTP proxy is used
- CVMFS is configured to use Cloudflare's CDN
Especially on Linux, a few cgroups tweaks via systemd can be set to ensure CPU cycles are not lost during an ATLAS setup but are instead given to other running tasks. This slightly slows down an individual task but increases the total throughput of the computer.

maeax Joined: 2 May 07 Posts: 2302 Credit: 179,708,200 RAC: 30,147	Message 49634 - Posted: 24 Feb 2024, 10:23:43 UTC - in response to Message 49631. "This machine is running a 1000 events job on 4 cores and is already 26 hours busy and another 7 hours to go." What about a venue for the number of events in the preferences? Four or five venues (100, 200, 500, 1000, 2000).

hadron Joined: 4 Sep 22 Posts: 101 Credit: 18,802,071 RAC: 11,029	Message 49636 - Posted: 24 Feb 2024, 11:08:02 UTC - in response to Message 49632. "Especially on Linux a few cgroups tweaks via systemd can be set to ensure CPU cycles are not lost during an ATLAS setup but instead given to other running tasks. This slightly slows down an individual task but increases the total throughput of the computer." More detail, please.