Message boards : Theory Application : Theory Sherpa
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,363
RAC: 3,820
Message 43303 - Posted: 2 Sep 2020, 19:38:27 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=281574899

Sherpa tasks still run for the full 10 days, using CPU the whole time, only to end as *Computation error*.

The same happens with versions 300.06 and 5.21.

They use CPU and internet upload/download the entire time but just end up wasting 10 days, and some members run so many cores that they don't notice they got Sherpa tasks along with the other event generators.
ID: 43303
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 43374 - Posted: 20 Sep 2020, 9:11:36 UTC - in response to Message 43303.  
Last modified: 20 Sep 2020, 9:15:34 UTC

sherpa 2.2.4 default
pp winclusive 7000 20 - 0+20/20 - "Lean back and enjoy" for 24 hours now?!?
Theory_2390-1149680-46_0
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=145132057
ID: 43374
Henry Nebrensky
Joined: 13 Jul 05
Posts: 165
Credit: 14,925,288
RAC: 34
Message 43376 - Posted: 20 Sep 2020, 10:04:07 UTC - in response to Message 43374.  

sherpa 2.2.4 default
pp winclusive 7000 20 - 0+20/20 - "Lean back and enjoy" for 24 hours now?!?
I'd give up on it.
ID: 43376
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 43378 - Posted: 20 Sep 2020, 10:46:22 UTC - in response to Message 43376.  
Last modified: 20 Sep 2020, 10:46:52 UTC

I'd give up on it.

OK, this is a Ryzen 3950X. I'll wait another two days.
ID: 43378
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 43430 - Posted: 28 Sep 2020, 13:46:40 UTC - in response to Message 43376.  

After 5 days showing only the "Lean back and enjoy" text, I canceled it.
Now the next volunteer is running the task.
Sherpa 2.2.4 needs investigation by CERN IT!
ID: 43430
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48397 - Posted: 9 Aug 2023, 5:33:33 UTC
Last modified: 9 Aug 2023, 5:57:14 UTC

Sherpa always stops at the same event.
I canceled it now!

Is there any interest in stopping these Sherpa tasks?

INFO: (display) vars=pp jets 7000 25,-,100 sherpa 1.4.1 default
INFO: display service switched off
===> [rungen] Wed Aug 9 04:49:47 UTC 2023 [boinc pp jets 7000 25,-,100 - sherpa 1.4.1 default 100000 524 /shared/tmp/tmp.D0HQTRKnmB/generator.hepmc]

Setting environment for sherpa 1.4.1 ...
22700 events processed
Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.676233, y = 0.559974).
Event 22800 ( 19m 31s elapsed / 1h 6m 6s left ) -> ETA: Wed Aug 09 06:24
22800 events processed
Event 22900 ( 19m 36s elapsed / 1h 6m left ) -> ETA: Wed Aug 09 06:24
22900 events processed
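
For anyone who wants to spot this kind of stall without watching every task by hand, here is a minimal Python sketch. It only assumes the progress-line format visible in the log above (`Event N ( ... elapsed / ... left )`). The function names, the one-hour idle limit, and the idea of passing in the number of seconds since the log last grew are illustrative assumptions, not part of the Theory application or BOINC.

```python
import re

# Progress lines as they appear in the log above, e.g.
# "Event 22800 ( 19m 31s elapsed / 1h 6m 6s left ) -> ETA: Wed Aug 09 06:24"
EVENT_RE = re.compile(
    r"Event\s+(\d+)\s+\(\s*(.+?)\s+elapsed\s*/\s*(.+?)\s+left\s*\)"
)
# Duration tokens such as "1h", "6m", "6s"; the "d" (days) unit is an assumption.
DURATION_RE = re.compile(r"(\d+)\s*([dhms])")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text):
    """Convert a string like '1h 6m 6s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in DURATION_RE.findall(text))


def last_progress(log_text):
    """Return (event, elapsed_s, left_s) for the last progress line, or None."""
    last = None
    for match in EVENT_RE.finditer(log_text):
        last = (
            int(match.group(1)),
            duration_to_seconds(match.group(2)),
            duration_to_seconds(match.group(3)),
        )
    return last


def looks_stalled(log_text, idle_seconds, idle_limit_s=3600):
    """Hypothetical heuristic: the task reported progress at least once,
    but its log has not grown for longer than idle_limit_s seconds."""
    return last_progress(log_text) is not None and idle_seconds > idle_limit_s
```

The caller has to supply `idle_seconds` itself, for example from the modification time of the log file. A sensible limit depends on the run, since some events legitimately take much longer than others.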
ID: 48397
Matthias Lehmkuhl
Joined: 15 Jul 05
Posts: 23
Credit: 2,055,077
RAC: 1,385
Message 48399 - Posted: 9 Aug 2023, 14:14:38 UTC - in response to Message 48397.  
Last modified: 9 Aug 2023, 14:15:49 UTC

Looks like I too have a problem with a Sherpa process:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=397140224

    Event 7900 ( 18m 1s elapsed / 3h 30m 2s left ) -> ETA: Sat Aug 05 23:09
    7900 events processed
    Event 8000 ( 18m 15s elapsed / 3h 29m 54s left ) -> ETA: Sat Aug 05 23:09 XS = 178969 pb +- ( 1997.22 pb = 1.11 % )
    8000 events processed
    dumping histograms ...
    Error in Splitting_Tools::ConstructKinematics(kt = -nan, z = 0.589889, y = 0.300539).
    Event 8100 ( 18m 30s elapsed / 3h 29m 54s left ) -> ETA: Sat Aug 05 23:09
    Rivet.Analysis.CMS_2011_S8968497: WARN Skipping histo with null area /CMS_2011_S8968497/d01-x01-y01
    Rivet.Analysis.CMS_2011_S8968497: WARN Skipping histo with null area /CMS_2011_S8968497/d02-x01-y01
    Rivet.Analysis.CMS_2011_S8968497: WARN Skipping histo with null area /CMS_2011_S8968497/d03-x01-y01
    Rivet.Analysis.CMS_2011_S8968497: WARN Skipping histo with null area /CMS_2011_S8968497/d04-x01-y01
    8100 events processed
    Event 8200 ( 18m 41s elapsed / 3h 29m 13s left ) -> ETA: Sat Aug 05 23:09
    8200 events processed



The Sherpa process is running at around 100% CPU load, but no new events have been displayed since the last "ETA: Sat Aug 05 23:09" entry.
I'll abort the task tomorrow if I get no other advice. But the task will not be able to finish in time on my computer.

I had the same with task https://lhcathome.cern.ch/lhcathome/result.php?resultid=396798081, where no new events were displayed for most of the runtime and the task was canceled after 860000 seconds (10 days) of runtime/CPU time.


Matthias

ID: 48399
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48400 - Posted: 9 Aug 2023, 14:52:05 UTC - in response to Message 48399.  

We need Theory WITHOUT Sherpa, or with Sherpa only for the volunteers who want it.
Having to check the tasks daily when you have 64 cores is NOT a good way to handle this!!
ID: 48400
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,008,579
RAC: 136,094
Message 48401 - Posted: 9 Aug 2023, 15:29:33 UTC
Last modified: 9 Aug 2023, 15:30:12 UTC

Errors can be annoying, but so far this sherpa setup computed 11 million events with a success rate of close to 90 %.

From mc-plots:
run                                           events    attempts  success  failure  unknown
pp jets 7000 25,-,100 - sherpa 1.4.1 default  11000000  124       110      2        12

Hence, I doubt it will completely be stopped.
ID: 48401
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48402 - Posted: 9 Aug 2023, 16:22:01 UTC - in response to Message 48401.  

Sorry, we have no Cray or Summit to run Sherpa on.
ID: 48402
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,008,579
RAC: 136,094
Message 48403 - Posted: 9 Aug 2023, 17:21:43 UTC

How to read the mc-plots values:
124 tasks with the "pp jets 7000 25,-,100 - sherpa 1.4.1 default" parameter set have been sent to BOINC clients.
110 tasks have successfully been returned
2 failed
12 are unknown (may either be lost or still in progress)
The 110 valid tasks returned a total of 11 million events
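
The reading above comes down to simple arithmetic on one mc-plots row. A small Python sketch of it follows, assuming the row is whitespace-separated with the five numeric columns at the end. The function names are made up for illustration and are not an mcplots API. The second rate, which ignores the "unknown" tasks, is an alternative way of counting that nobody in this thread has quoted.

```python
def parse_mcplots_row(row):
    """Split a row into the run name and its five trailing integer columns."""
    *run_words, events, attempts, success, failure, unknown = row.split()
    return {
        "run": " ".join(run_words),
        "events": int(events),
        "attempts": int(attempts),
        "success": int(success),
        "failure": int(failure),
        "unknown": int(unknown),
    }


def rates_percent(stats):
    """Share of all attempts per outcome, in percent."""
    attempts = stats["attempts"]
    return {
        key: 100.0 * stats[key] / attempts
        for key in ("success", "failure", "unknown")
    }


def success_among_finished(stats):
    """Success share among tasks with a known outcome (unknown ones ignored)."""
    finished = stats["success"] + stats["failure"]
    return 100.0 * stats["success"] / finished


row = "pp jets 7000 25,-,100 - sherpa 1.4.1 default 11000000 124 110 2 12"
stats = parse_mcplots_row(row)
for outcome, pct in rates_percent(stats).items():
    print(f"{outcome}: {pct:.1f} %")
print(f"success among finished tasks: {success_among_finished(stats):.1f} %")
```

For this row, 110 of 124 attempts is about 88.7 %, which is the "close to 90 %" figure. Counting only the 112 tasks with a known outcome gives about 98.2 %.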
ID: 48403
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48404 - Posted: 9 Aug 2023, 17:37:41 UTC - in response to Message 48402.  

Sorry, we have no Cray or Summit to run Sherpa on.

SIGUSR1, SIGUSR1, SIGUSR1... more than 3 days of runtime for nothing today.
ID: 48404
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,008,579
RAC: 136,094
Message 48405 - Posted: 9 Aug 2023, 17:41:01 UTC

Calm down.
The typical failure rate per (good) computer is between 1 and 5 %.
Your computers show a 2 % failure rate.
This means 98 % valid results.
ID: 48405
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48411 - Posted: 10 Aug 2023, 3:37:05 UTC - in response to Message 48405.  

Calm down.

These invalids are the result of a lot of time spent watching the tasks.

I check them in the VirtualBox Manager and restart tasks with SIGUSR1.
Most of these restarts fail when a task is restarted for the third time.

Something about Theory tasks on Windows is not the same as on Linux.
Native Linux has no problem in a CentOS 7 VM.
When I have time, I will look at WSL2 with Oracle Linux 9.1.

WCG shows what is possible.
ID: 48411
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48443 - Posted: 14 Aug 2023, 7:12:54 UTC

Theory_2390-1099814-526_0

Run time 1 day 4 hours 4 min 37 sec
CPU time 1 day 3 hours 49 min 38 sec


===> [runRivet] Sun Aug 13 03:00:48 UTC 2023 [boinc ppbar ue 1800 15 - sherpa 1.4.0 default 100000 526]

Event 31400 ( 9m 57s elapsed / 21m 44s left ) -> ETA: Sun Aug 13 03:42
31400 events processed

I see this task so often!!
It always stops working at the same event.
ID: 48443
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,008,579
RAC: 136,094
Message 48444 - Posted: 14 Aug 2023, 8:13:17 UTC - in response to Message 48443.  

run                                      events    attempts  success  failure  unknown
ppbar ue 1800 15 - sherpa 1.4.0 default  11400000  126       114      3        9

Has been sent to BOINC clients 126 times.
114 of them succeeded.
Success rate: >90 %

So far only 3 failed.
ID: 48444
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48446 - Posted: 14 Aug 2023, 8:57:13 UTC

Theory_2390-1099214-526_0
Run time 22 hours 58 min 46 sec
CPU time 22 hours 45 min 47 sec


#date_d ngood nbad total
2023-08-13 1130 32 1162 1. AMD Ryzen Threadripper PRO 3995WX 64-Cores [Family 23 Model 49 Stepping 0]

2023-08-13 902 22 924 2. AMD Ryzen Threadripper PRO 3995WX 64-Cores [Family 23 Model 49 Stepping 0]

About 1K Theory tasks run OK per day;
only Sherpa is the long runner that does nothing after running for some time?
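
To put the two host lines above in numbers, here is a minimal sketch that computes the share of bad tasks per line. It assumes the column order given by the `#date_d ngood nbad total` header and ignores the host description that follows the counts. The function name is hypothetical.

```python
def daily_bad_share(line):
    """Return (date, percent of bad tasks) for a 'date ngood nbad total ...' line."""
    date, ngood, nbad, total = line.split()[:4]
    ngood, nbad, total = int(ngood), int(nbad), int(total)
    # Sanity check: good and bad tasks should add up to the reported total.
    if ngood + nbad != total:
        raise ValueError(f"inconsistent counts on {date}")
    return date, 100.0 * nbad / total


lines = [
    "2023-08-13 1130 32 1162 1. AMD Ryzen Threadripper PRO 3995WX 64-Cores",
    "2023-08-13 902 22 924 2. AMD Ryzen Threadripper PRO 3995WX 64-Cores",
]
for line in lines:
    date, pct = daily_bad_share(line)
    print(f"{date}: {pct:.2f} % bad")
```

Both hosts come out between 2 and 3 % bad tasks for that day.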
ID: 48446
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48452 - Posted: 16 Aug 2023, 5:29:38 UTC

Theory_2390-1131819-528_0
Workunit 214193875
Run time 7 hours 50 min 53 sec
CPU time 7 hours 46 min 31 sec

Theory_2390-1120389-520_2
Workunit 213708523
Run time 1 day 10 hours 10 min 45 sec
CPU time 1 day 9 hours 56 min 19 sec
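
What these pairs show is that CPU time is almost equal to run time, so each task keeps a core busy practically the whole time. Here is a small sketch of that comparison, with hypothetical helper names and the values typed in by hand from the two tasks above:

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(run_time_s, cpu_time_s):
    """Fraction of the wall-clock run time that was spent on the CPU."""
    return cpu_time_s / run_time_s


# (run time, CPU time) per task, copied from the listing above.
tasks = {
    "Theory_2390-1131819-528_0": (
        to_seconds(hours=7, minutes=50, seconds=53),
        to_seconds(hours=7, minutes=46, seconds=31),
    ),
    "Theory_2390-1120389-520_2": (
        to_seconds(days=1, hours=10, minutes=10, seconds=45),
        to_seconds(days=1, hours=9, minutes=56, seconds=19),
    ),
}
for name, (run_s, cpu_s) in tasks.items():
    print(f"{name}: {100.0 * cpu_efficiency(run_s, cpu_s):.1f} % CPU")
```

Both tasks sit at roughly 99 % CPU. That fits a process that is busy computing, but the number alone does not say whether it is still making progress.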
ID: 48452
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,184,186
RAC: 104,580
Message 48514 - Posted: 8 Sep 2023, 5:22:15 UTC - in response to Message 48452.  

ID: 48514
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,008,579
RAC: 136,094
Message 48515 - Posted: 8 Sep 2023, 6:04:07 UTC - in response to Message 48514.  

From mcplots:
run                                           events    attempts  success  failure  unknown
pp jets 7000 20,-,310 - sherpa 1.4.0 default  11800000  129       118      2        9

118 successful attempts out of 129 attempts sent out.
=> 91.5 % success rate.
There's no reason not to send them out.
ID: 48515


©2024 CERN