Message boards : ATLAS application : 2000 Events Threadripper 3995WX
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 48653 - Posted: 24 Sep 2023, 13:00:24 UTC
Last modified: 24 Sep 2023, 13:06:38 UTC

Finished today, with 8 CPUs:
Sent 23 Sep 2023, 18:51:11 UTC; reported 24 Sep 2023, 12:50:48 UTC; completed and validated
Run time 64,018.61 s; CPU time 476,227.40 s; credit 12,480.55
ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64
ID: 48653
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 48655 - Posted: 24 Sep 2023, 15:42:05 UTC
Last modified: 24 Sep 2023, 15:45:24 UTC

A second task on the same PC:
Upload file: 1.43 GB
Run time: 17 hours 20 min 46 sec
CPU time: 5 days 8 hours 47 min 21 sec
Sent 23 Sep 2023, 22:06:24 UTC; reported 24 Sep 2023, 15:38:43 UTC; completed and validated
Run time 62,446.21 s; CPU time 463,641.60 s; credit 10,991.18
ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64

A checkpoint file every 200 events is useful for such a long runtime.
Hardware acceleration is in use for that reason.
ID: 48655
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 48664 - Posted: 26 Sep 2023, 3:46:48 UTC - in response to Message 48655.  

AMD Ryzen 9 3950X 16-Core Processor [Family 23 Model 113 Stepping 0]
Windows 11 Workstation
CPU count for the VM: 2; 2,000 events
Sent 23 Sep 2023, 21:42:03 UTC; reported 25 Sep 2023, 19:53:21 UTC; completed and validated
Run time 162,627.91 s; CPU time 314,670.00 s; credit 19,613.00
ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64
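
For comparing the three results so far, a minimal sketch that divides CPU time by run time (average number of cores kept busy) and events by run time (wall-clock throughput). The run and CPU times are the figures reported above; the assumption that every task processed 2000 events is stated explicitly only for the 3950X result, and the function names are made up for this illustration.

```python
# (label, run time in s, CPU time in s), copied from the three results above
RESULTS = [
    ("Threadripper 3995WX, task 1", 64018.61, 476227.40),
    ("Threadripper 3995WX, task 2", 62446.21, 463641.60),
    ("Ryzen 9 3950X, 2-CPU VM", 162627.91, 314670.00),
]

# Assumption: every task processed 2000 events (only stated explicitly
# for the 3950X result; the thread title suggests it for the others).
EVENTS_PER_TASK = 2000


def effective_cores(run_s, cpu_s):
    """Average number of cores kept busy over the whole run."""
    return cpu_s / run_s


def events_per_hour(events, run_s):
    """Wall-clock throughput of one task."""
    return events / (run_s / 3600.0)


for label, run_s, cpu_s in RESULTS:
    print(f"{label}: {effective_cores(run_s, cpu_s):.2f} cores busy, "
          f"{events_per_hour(EVENTS_PER_TASK, run_s):.0f} events/h")
```

The ratio comes out near 7.4 for the two Threadripper tasks and near 1.9 for the 2-CPU VM, so in both setups most of the configured cores are busy for most of the run.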
ID: 48664
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 48673 - Posted: 28 Sep 2023, 6:18:35 UTC

It would be nice if long-runners (2000 events) were sent only via a venue in the LHC preferences, for
HPC (high-performance computing) machines such as the Threadripper 3995WX.
ID: 48673
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 48685 - Posted: 29 Sep 2023, 11:36:29 UTC
Last modified: 29 Sep 2023, 11:37:29 UTC

Four more on the Threadripper 3995WX:
Run time 14 hours 24 min 20 sec
CPU time 4 days 4 hours 16 min 59 sec
Run time 14 hours 13 min 35 sec
CPU time 4 days 3 hours 5 min 7 sec
Run time 16 hours 25 min 33 sec
CPU time 4 days 22 hours 11 min 55 sec
Run time 16 hours 19 min 1 sec
CPU time 4 days 20 hours 40 min 23 sec
ID: 48685
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 49617 - Posted: 22 Feb 2024, 21:21:50 UTC

At the moment, four tasks with 1000 events each.
ID: 49617
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 49618 - Posted: 22 Feb 2024, 23:56:05 UTC - in response to Message 49617.  

Seven tasks with 1k events each finished after 6 hours :-).
ID: 49618
CloverField

Joined: 17 Oct 06
Posts: 84
Credit: 57,002,718
RAC: 3,723
Message 49619 - Posted: 23 Feb 2024, 13:05:22 UTC

Any reason why the tasks suddenly jumped up to 6 hours? They used to take around 40 minutes to 2 hours in the past.
ID: 49619
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 49620 - Posted: 23 Feb 2024, 13:17:22 UTC - in response to Message 49619.  
Last modified: 23 Feb 2024, 13:45:55 UTC

No idea, only CERN IT can answer us.
In spring 2023, David Cameron ran tests with 2k-event ATLAS tasks.
If I remember correctly, the 2k tasks need no transfer for us, because the data comes directly from the collider (ATLAS).
@David Cameron:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47914#47914
Atlas Event Progress Monitor in RDP is now correct, Thank you.
ID: 49620
Harri Liljeroos
Joined: 28 Sep 04
Posts: 728
Credit: 48,807,067
RAC: 22,384
Message 49621 - Posted: 23 Feb 2024, 14:49:51 UTC - in response to Message 49620.  

Atlas Event Progress Monitor in RDP is now correct, Thank you.


That applies only to these long 1000-event tasks. For the short ones with 400 events, the monitoring does not work correctly.
ID: 49621
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2528
Credit: 253,722,201
RAC: 62,755
Message 49622 - Posted: 23 Feb 2024, 19:25:58 UTC

According to the logs I checked, there were tasks configured to process 100, 400, 500 or 1000 events.
Varying the number of events doesn't seem to be a good decision by the people who submitted the tasks, since it ultimately unbalances BOINC's work fetch calculation, its runtime estimation and its credit calculation.

A while ago some tests were done showing that 500 events per task is a good compromise between the project's needs and what most volunteers can handle without major issues.
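
A toy illustration of the runtime-estimation point. This is not BOINC's actual estimator: it assumes a hypothetical predictor that guesses each task's runtime as the mean of all earlier runtimes, and an invented cost of 20 s per event. The event counts are the ones listed above.

```python
from statistics import mean

SECONDS_PER_EVENT = 20.0  # hypothetical cost per event, for illustration only


def mean_abs_estimate_error(event_counts):
    """Predict each task's runtime as the mean of all earlier runtimes
    and return the mean absolute error of those predictions in seconds."""
    runtimes = [n * SECONDS_PER_EVENT for n in event_counts]
    errors = []
    for i in range(1, len(runtimes)):
        predicted = mean(runtimes[:i])
        errors.append(abs(predicted - runtimes[i]))
    return mean(errors)


uniform = [500] * 8                                   # every task the same size
mixed = [100, 1000, 400, 500, 1000, 100, 500, 400]    # sizes seen in the logs

print("uniform batch, mean error (s):", mean_abs_estimate_error(uniform))
print("mixed batch, mean error (s):  ", mean_abs_estimate_error(mixed))
```

With a uniform batch the history predicts the next task exactly; with mixed sizes the prediction is off by a large fraction of a task's runtime, which is the kind of imbalance that then feeds into work fetch.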
ID: 49622
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 49623 - Posted: 23 Feb 2024, 19:31:42 UTC - in response to Message 49620.  

only Cern-IT can answer us.
ID: 49623
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1417
Credit: 9,439,917
RAC: 1,165
Message 49625 - Posted: 23 Feb 2024, 20:59:31 UTC - in response to Message 49622.  

The 1000 events job produced a 756MB HITS-file to upload.
ID: 49625
wujj123456

Joined: 14 Sep 08
Posts: 52
Credit: 63,285,293
RAC: 7,071
Message 49626 - Posted: 24 Feb 2024, 2:33:33 UTC

I personally prefer the bigger jobs. From what I see, each ATLAS WU always has a 20-30 min idle setup time. Having more work per WU is going to help with efficiency quite a bit. It also seems to reduce the network usage on download side (from client).
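
A back-of-the-envelope sketch of that efficiency argument. It assumes a fixed 25-minute idle setup (the midpoint of the 20-30 min range) and hypothetical total run times of 40 minutes, 2 hours and 6 hours; the function name is invented for this example.

```python
SETUP_MIN = 25  # assumed idle setup per work unit, midpoint of 20-30 min


def useful_fraction(total_min, setup_min=SETUP_MIN):
    """Share of a task's wall-clock time spent on simulation rather than setup."""
    return (total_min - setup_min) / total_min


# Hypothetical total run times in minutes
for total in (40, 120, 360):
    print(f"{total:>3} min task: {useful_fraction(total):.1%} useful")
```

Under these assumptions a 40-minute task spends most of its time in setup, while a 6-hour task loses only a few percent to it.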
ID: 49626
rbpeake

Joined: 17 Sep 04
Posts: 105
Credit: 32,824,853
RAC: 860
Message 49627 - Posted: 24 Feb 2024, 3:08:33 UTC - in response to Message 49626.  

I agree.
Regards,
Bob P.
ID: 49627
Saturn911

Joined: 3 Nov 12
Posts: 58
Credit: 140,949,335
RAC: 91,508
Message 49629 - Posted: 24 Feb 2024, 7:20:37 UTC - in response to Message 49626.  

I personally prefer the bigger jobs. From what I see, each ATLAS WU always has a 20-30 min idle setup time. Having more work per WU is going to help with efficiency quite a bit. It also seems to reduce the network usage on download side (from client).

+1
ID: 49629
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1417
Credit: 9,439,917
RAC: 1,165
Message 49631 - Posted: 24 Feb 2024, 8:54:34 UTC

I don't prefer these longrunners at all. 400 events is really the maximum. Better would be 200 events.
From the BOINC point of view:
Volunteer computing is meant for PCs that are not in use.
Not all volunteers have monster machines.

But the most important disadvantage of running ATLAS and/or CMS jobs (up to 18 hrs) is that they need an uninterrupted network connection.
This also means that the tasks cannot be suspended or shut down for more than 20 minutes, maybe an hour.
A lot of crunchers want to shut down their machines during the evening or night to save electricity costs.

This machine is running a 1000 events job on 4 cores and is already 26 hours busy and another 7 hours to go.
ID: 49631
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2528
Credit: 253,722,201
RAC: 62,755
Message 49632 - Posted: 24 Feb 2024, 8:57:45 UTC

I personally prefer the bigger jobs.

This has been discussed back and forth, and yes, there are volunteers with fast computers and fast internet running their systems 24/7 (including mine).
Those usually do not have problems even with 2000-eventers.

On the other hand:
- There are lots of computers that are not fast enough to finish large tasks within a reasonable time
- ATLAS (native) does not support suspend/resume, hence interrupted tasks start from scratch
- ATLAS generates huge upload files

Together with other points mentioned in the past, those 500-eventers were accepted as a compromise.

As for long setup times:
they are usually shorter when
- the EVNT files are smaller (fewer events)
- a local HTTP proxy is used
- CVMFS is configured to use Cloudflare's CDN


Especially on Linux a few cgroups tweaks via systemd can be set to ensure CPU cycles are not lost during an ATLAS setup but instead given to other running tasks.
This slightly slows down an individual task but increases the total throughput of the computer.
ID: 49632
maeax

Joined: 2 May 07
Posts: 2240
Credit: 173,894,884
RAC: 3,757
Message 49634 - Posted: 24 Feb 2024, 10:23:43 UTC - in response to Message 49631.  

This machine is running a 1000 events job on 4 cores and is already 26 hours busy and another 7 hours to go.

What about a venue for the number of events in the preferences?
Four or five venues (100, 200, 500, 1000, 2000).
ID: 49634
hadron

Joined: 4 Sep 22
Posts: 91
Credit: 15,641,688
RAC: 18,174
Message 49636 - Posted: 24 Feb 2024, 11:08:02 UTC - in response to Message 49632.  

Especially on Linux a few cgroups tweaks via systemd can be set to ensure CPU cycles are not lost during an ATLAS setup but instead given to other running tasks.
This slightly slows down an individual task but increases the total throughput of the computer.

More detail, please.
ID: 49636