Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners
Jim1348

Joined: 15 Nov 14
Posts: 394
Credit: 11,781,955
RAC: 5,102
Message 40800 - Posted: 5 Dec 2019, 17:50:43 UTC - in response to Message 40792.  

[quote]Would it be worth keeping the Theory-Native sub-project running, but dedicated to Sherpa tasks? That would provide an easy way for volunteers to opt-in/opt-out of the babysitting... (and the machinery's already there).[/quote]

I am wondering how they handle these in-house. Do they let them go forever? Probably not. In fact, I am wondering whether they do them at all. Maybe they just throw them over the wall to us. (Though not to me. I am out of Theory.)
ID: 40800
[VENETO] boboviz

Joined: 7 May 08
Posts: 39
Credit: 218,356
RAC: 2
Message 40802 - Posted: 5 Dec 2019, 21:15:32 UTC - in response to Message 40789.  
Last modified: 5 Dec 2019, 21:16:33 UTC

[quote]I want to finish this one
67% after 67hs[/quote]

And, in the end, an error after 3 days... :-(
ID: 40802
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 334
Credit: 237,918
RAC: 0
Message 40808 - Posted: 6 Dec 2019, 9:03:40 UTC - in response to Message 40800.  

[quote]I am wondering how they handle these in-house. Do they let them go forever? Probably not. In fact, I am wondering whether they do them at all. Maybe they just throw them over the wall to us. (Though not to me. I am out of Theory.)[/quote]

In our internal batch system we have Job Flavours: job time limits of up to 1 week. Failures can be as scientifically valuable as success. See the Michelson–Morley experiment.
ID: 40808
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 9
Message 40810 - Posted: 6 Dec 2019, 9:23:19 UTC

This one was a success:
pp jets 7000 300 - sherpa 1.4.2 default 41000 190

https://lhcathome.cern.ch/lhcathome/result.php?resultid=253954301
80,534.88s
ID: 40810
Jim1348

Joined: 15 Nov 14
Posts: 394
Credit: 11,781,955
RAC: 5,102
Message 40821 - Posted: 6 Dec 2019, 20:57:44 UTC - in response to Message 40808.  

[quote]Failures can be as scientifically valuable as success. See the Michelson–Morley experiment.[/quote]

Certainly, as long as they are testing the science and not just the instability of the Sherpa code. As a user, I have to rely on their judgement. Some versions seem better than others.
ID: 40821
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1350
Credit: 67,675,750
RAC: 96,837
Message 40845 - Posted: 8 Dec 2019, 13:03:22 UTC

A faster computer may be necessary to finish this task within the limits:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=253350398
===> [runRivet] Thu Nov 28 11:32:01 UTC 2019 [boinc ee zhad 91.2 - - sherpa 2.2.5 default 2000 188]
.
.
.
integration time:  ( 7d 3h 9m 57s elapsed / 1943d 17h 16m 49s left ) [12:55:34]   
4.23518e+18 pb +- ( 7.00423e+17 pb = 16.5382 % ) 2459080000 ( 2459080579 -> 99.9 % )
integration time:  ( 7d 3h 10m 2s elapsed / 1943d 17h 42m 38s left ) [12:55:44]   
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.262584,0.220149,0.939459)
  b    = (0,0,1)
  a'   = (0.083949,0.22735,0.970188) -> rel. dev. (inf,inf,-0.0298121)
  m_ct = 0.939459
  m_st = -0.342661
  m_n  = (0,1.27743e-06,-2.99348e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.262584,0.220149,0.939459)
  b    = (0,0,1)
  a'   = (0.083949,0.22735,0.970188) -> rel. dev. (inf,inf,-0.0298121)
  m_ct = 0.939459
  m_st = -0.342661
  m_n  = (0,1.27743e-06,-2.99348e-07)
}
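
For anyone who wants to check such a progress line automatically, here is a minimal Python sketch. It is only an illustration, not part of the project's tooling: the function names and `LIMIT_SECONDS` are made up for this example, and the one-week value simply mirrors the in-house limit mentioned earlier in the thread.

```python
import re

# Seconds per unit for durations written like "7d 3h 9m 57s".
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_PROGRESS = re.compile(r"\(\s*(.+?)\s+elapsed\s*/\s*(.+?)\s+left\s*\)")


def duration_to_seconds(text):
    """Convert a string such as '7d 3h 9m 57s' to seconds."""
    pairs = re.findall(r"(\d+)\s*([dhms])", text)
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in pairs)


def parse_progress(line):
    """Return (elapsed_s, left_s) from an 'elapsed / left' line, or None."""
    match = _PROGRESS.search(line)
    if match is None:
        return None
    return duration_to_seconds(match.group(1)), duration_to_seconds(match.group(2))


# Hypothetical limit: one week (assumption, not a confirmed volunteer-side value).
LIMIT_SECONDS = 7 * 86400

line = "integration time:  ( 7d 3h 9m 57s elapsed / 1943d 17h 16m 49s left ) [12:55:34]"
elapsed, left = parse_progress(line)
# True means the projected total run time is beyond the limit.
print(elapsed, left, elapsed + left > LIMIT_SECONDS)
```

The estimate printed by the generator is itself unreliable, so a check like this can only flag candidates for a closer look.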
ID: 40845
Peter Skands

Joined: 31 Jan 11
Posts: 7
Credit: 3,509,115
RAC: 1,516
Message 40869 - Posted: 9 Dec 2019, 11:14:58 UTC - in response to Message 40792.  

Hi Henry,

That's an interesting proposal: sequestering any jobs that are judged to be not completely stable (for whatever reason) in a dedicated queue that people can opt into, while reserving the main queue for more streamlined production runs.

I think that the actual name "Theory-Native" would not be good to retain for the 'development' sub-project. It would be a mis-badging that would eventually confuse people. But I'm curious to ask other LHC@home developers whether it would have merit, and be possible, to set up something like a "Theory-Beta" sub-project, where we could put those tasks that are problematic.

As I've written about elsewhere, Sherpa is a complex code with advanced capabilities, and unfortunately also sometimes advanced ways of failing. Well, an infinite loop is not a particularly advanced failure of course, but in its defence, the things it is typically doing when it enters the loops you see are things that the other codes are not even trying to do. Moreover, since none of the Test4Theory/LHC@home developers are Sherpa authors, all we can usually do is the same as ordinary users: submit a bug report to the Sherpa authors about what you guys are seeing, and hope that in a future version some of those loops will get fixed. Meanwhile, we still want to run the existing version of the code, since it is interesting to compare it to the available data in those cases where it succeeds. On our side, we can try to find ways of running it as "correctly" as possible, and use tricks to abort jobs that we can somehow detect as loopers, but despite some improvements there, issues clearly remain.

I'd welcome feedback from other Test4Theory/LHC@home developers (and volunteers) on this idea, and on whether we would be able to implement it.

All the best,
Peter.
ID: 40869
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1350
Credit: 67,675,750
RAC: 96,837
Message 40884 - Posted: 9 Dec 2019, 22:42:00 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=253690832
===> [runRivet] Mon Dec  2 12:15:40 UTC 2019 [boinc pp jets 7000 80,-,960 - sherpa 1.4.3 default 100000 190]
.
.
.
1000 events processed
dumping histograms...
  Event 1100 ( 1m 33s elapsed / 2h 20m 38s left ) -> ETA: Mon Dec 02 16:32  
1100 events processed
Updating display...
  Event 1200 ( 1m 45s elapsed / 2h 24m 17s left ) -> ETA: Mon Dec 02 16:36  
1200 events processed
Display update finished (9 histograms, 1000 events).
  Event 1300 ( 1m 55s elapsed / 2h 26m 2s left ) -> ETA: Mon Dec 02 16:38  
1300 events processed
  Event 1400 ( 2m 3s elapsed / 2h 24m 52s left ) -> ETA: Mon Dec 02 16:37  
1400 events processed
Updating display...
Display update finished (9 histograms, 1000 events).
Updating display...
Display update finished (9 histograms, 1000 events).
.
.
.


But a week later:
.
.
.
Display update finished (9 histograms, 1000 events).
Updating display...
Display update finished (9 histograms, 1000 events).
Updating display...
Display update finished (9 histograms, 1000 events).
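
A log tail like this one could be spotted with a simple heuristic. The following is a rough Python sketch under my own assumptions: the function names and the `max_quiet_lines` threshold are hypothetical, and it only looks for the "N events processed" lines seen in the excerpts above. It would misfire during phases that never print such lines (for example the integration phase), so it is not a finished looper detector.

```python
import re

# Progress lines in the excerpts look like "1400 events processed".
_EVENTS = re.compile(r"^\s*(\d+) events processed\s*$")


def lines_since_progress(log_lines):
    """Count the log lines that follow the last 'N events processed' line.

    Returns None when the log holds no progress line at all.
    """
    last_index = None
    for index, line in enumerate(log_lines):
        if _EVENTS.match(line):
            last_index = index
    if last_index is None:
        return None
    return len(log_lines) - 1 - last_index


def looks_stalled(log_lines, max_quiet_lines=4):
    """Heuristic: a long tail without a new event count suggests a looper."""
    quiet = lines_since_progress(log_lines)
    return quiet is None or quiet > max_quiet_lines


healthy_tail = [
    "1300 events processed",
    "  Event 1400 ( 2m 3s elapsed / 2h 24m 52s left ) -> ETA: Mon Dec 02 16:37  ",
    "1400 events processed",
    "Updating display...",
    "Display update finished (9 histograms, 1000 events).",
]
stuck_tail = [
    "Display update finished (9 histograms, 1000 events).",
    "Updating display...",
] * 3

print(looks_stalled(healthy_tail), looks_stalled(stuck_tail))
```

In practice the tail length and the threshold would need tuning against real logs before anything is aborted on this basis.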
ID: 40884
maeax

Joined: 2 May 07
Posts: 881
Credit: 32,594,782
RAC: 45,785
Message 40885 - Posted: 10 Dec 2019, 0:33:59 UTC
Last modified: 10 Dec 2019, 0:34:26 UTC

McPlots shows: sherpa 1.4.3 default
pp jets 7000 80,-,960 - 0+13/68.
ID: 40885
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 9
Message 40887 - Posted: 10 Dec 2019, 9:33:54 UTC - in response to Message 40884.  

[quote]https://lhcathome.cern.ch/lhcathome/result.php?resultid=253690832
===> [runRivet] Mon Dec  2 12:15:40 UTC 2019 [boinc pp jets 7000 80,-,960 - sherpa 1.4.3 default 100000 190]


But a week later:
.
.
.
Display update finished (9 histograms, 1000 events).
Updating display...
Display update finished (9 histograms, 1000 events).
Updating display...
Display update finished (9 histograms, 1000 events).[/quote]

Yeah, it looks like the same behaviour as
pp jets 7000 40,-,760 - sherpa 1.4.5 default 100000 190
mentioned above that was stuck at
Updating display...
Display update finished (9 histograms, 75000 events).
ID: 40887
maeax

Joined: 2 May 07
Posts: 881
Credit: 32,594,782
RAC: 45,785
Message 40920 - Posted: 12 Dec 2019, 13:52:40 UTC

The time limit in the db was reached yesterday, but the task is still running after 11 days and 18 hours, with no end in sight.
I am shutting it down now. [boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189]
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=127302369
ID: 40920
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 9
Message 40936 - Posted: 13 Dec 2019, 11:36:24 UTC - in response to Message 40920.  
Last modified: 13 Dec 2019, 11:43:15 UTC

[quote]The time limit in the db was reached yesterday, but the task is still running after 11 days and 18 hours, with no end in sight.
I am shutting it down now. [boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189]
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=127302369[/quote]
How do you search for these tasks (successful ones)? I would like to know whether it would be useful for me to check all the active hosts (~3600) with a PHP script.

Edit: Ok, that was one of your tasks. Do you post only your tasks?
ID: 40936
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 893
Credit: 6,241,606
RAC: 1,676
Message 40939 - Posted: 13 Dec 2019, 13:11:08 UTC - in response to Message 40936.  

[quote]How do you search for these tasks (successful ones)? I would like to know whether it would be useful for me to check all the active hosts (~3600) with a PHP script.[/quote]
Here you can find all tasks of the current 2279-batch:

Sherpa runs of batch 2279: # of events, attempts, success, fail and lost

Note that the link to MC Production mentioned there takes a long time to load.
ID: 40939
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 9
Message 40950 - Posted: 14 Dec 2019, 1:30:12 UTC - in response to Message 40939.  

I tried scanning 100 hosts as a test.

MAX_RUNTIME=86400 (seconds, i.e. 24 hours)

Long runner condition:
$exit_status == 0 and $runtime > $MAX_RUNTIME and ( $plan_class == "native_theory" or ( $plan_class == "vbox64_theory" and $version >= 300.00 ) )

32 long runners were found, 8 of them identified as Sherpa.

(resultid, wuid, userid, hostid, run, sent_timestamp, report_deadline_timestamp, runtime, cputime, application_name, plan_class, version)
254955577	128154506	67	10509223	pp jets 7000 400 - sherpa 1.2.2p default 47000 192	1575957398	1576257282	186377.97	184224.6	Theory Simulation	vbox64_theory	300.02
254725396	128029652	67	10509223	pp jets 13000 150,-,2360 - pythia8 8.235 tune-AU2lox 90000 192	1575820771	1575937324	116385.78	115760.2	Theory Simulation	vbox64_theory	300.02
254410071	127857452	67	10509223	pp jets 13000 180,-,3560 - pythia8 8.235 tune-AU2ct10 65000 191	1575629833	1575767665	132898.3	131505.9	Theory Simulation	vbox64_theory	300.02
254740811	128040944	248	10624524	pp jets 7000 300 - sherpa 1.4.2 default 47000 192	1575852462	1576021366	153905.26	114838.7	Theory Simulation	vbox64_theory	300.02
254731110	128033833	248	10624524	pp jets 7000 150 - sherpa 2.1.0 default 71000 192	1575831679	1576021366	179895.46	134835.3	Theory Simulation	vbox64_theory	300.02
254387019	127843745	248	10624524	pp jets 8000 800 - pythia8 8.235 cr1 100000 191	1575587355	1575714872	99413.44	74366.95	Theory Simulation	vbox64_theory	300.02
254260169	127778796	248	10624524	pp jets 7000 80,-,1160 - pythia8 8.235 early 81000 191	1575556381	1575653145	88837.18	66426.13	Theory Simulation	vbox64_theory	300.02
254731795	128034287	248	10624548	pp jets 13000 250,-,4760 - pythia8 8.235 tune-AU2lox 85000 192	1575832505	1575924383	88728.4	66311.83	Theory Simulation	vbox64_theory	300.02
254536231	127892806	248	10624548	pp jets 8000 600 - pythia8 8.235 early 69000 191	1575638647	1575750844	108514.62	81099.94	Theory Simulation	vbox64_theory	300.02
254384925	127842473	248	10624548	pp jets 8000 800 - pythia8 8.235 tune-AU2m 85000 191	1575582420	1575686538	89614.75	66915	Theory Simulation	vbox64_theory	300.02
254126655	127707205	1013	10408677	pp jets 7000 300 - sherpa 1.4.5 default 41000 190	1575483946	1575580180	93937.45	83361.72	Theory Simulation	vbox64_theory	300.02
254390931	127846456	1525	10546691	pp z1j 7000 100 - pythia8 8.235 tune-1 100000 191	1575596518	1575805828	112213.8	110974.1	Theory Simulation	vbox64_theory	300.02
253592355	127427973	1731	10309088	pp jets 7000 80,-,1160 - pythia8 8.235 default-MBR 100000 190	1575251124	1575943567	188413.47	162158.1	Theory Simulation	vbox64_theory	300.02
255062615	128210948	2139	10372172		1576013512	1576256496	105158.79	102382.5	Theory Simulation	vbox64_theory	300.02
254395033	127848909	2139	10372172		1575601855	1576031520	141318.19	138428.9	Theory Simulation	vbox64_theory	300.02
254745497	128044231	3210	9936309	pp jets 8000 800 - sherpa 1.4.2 default 17000 192	1575861233	1576114071	167609.51	154827	Theory Simulation	vbox64_theory	300.02
255216964	128293991	3689	10580251	pp jets 7000 28 - pythia8 8.235 early 90000 193	1576132906	1576280280	108959.74	105950.3	Theory Simulation	vbox64_theory	300.02
254990460	128172326	3775	10525345	pp jets 7000 65 - pythia8 8.235 cr1 81000 192	1575967981	1576070498	93187.88	84652.28	Theory Simulation	vbox64_theory	300.02
254602898	127959283	3855	10463494	pp jets 7000 400 - sherpa 2.1.1 default 42000 192	1575702265	1576057301	349029.78	344602.6	Theory Simulation	vbox64_theory	300.02
254760248	128052626	3855	10605580	pp jets 7000 250 - sherpa 1.4.0 default 58000 192	1575887266	1576160237	248303.5	235340.9	Theory Simulation	vbox64_theory	300.02
253370499	127299459	3949	10511876		1574997154	1575686688	185962.38	139855.2	Theory Simulation	vbox64_theory	300.02
254902362	128126598	4837	10391134	pp jets 7000 150,-,2360 - pythia8 8.235 tune-AZ 100000 192	1575930129	1576034304	91894.86	85350.45	Theory Simulation	vbox64_theory	300.02
254741279	128041310	4837	10391134	pp jets 7000 600 - sherpa 2.1.1 default 17000 192	1575853233	1576233183	362646.35	341416.3	Theory Simulation	vbox64_theory	300.02
254630592	127977466	4837	10391134	pp zinclusive 7000 -,-,50,130 - pythia8 8.235 tune-AU2 100000 192	1575751776	1575862778	96628.02	93324.64	Theory Simulation	vbox64_theory	300.02
254625034	127973791	4837	10391134	pp jets 13000 180,-,3560 - pythia8 8.235 tune-AU2ct10 62000 192	1575740078	1575930110	173697.48	167787.8	Theory Simulation	vbox64_theory	300.02
253420913	127163759	5326	10504105	pp jets 7000 80,-,960 - pythia8 8.209 default-CD 100000 188	1575039288	1576016739	296105.53	283197.5	Theory Simulation	vbox64_theory	300.02
253422242	126977012	5326	10504105	pp z1j 7000 100 - pythia8 8.180 default 100000 188	1575039288	1576089048	159666.67	148856	Theory Simulation	vbox64_theory	300.02
254139932	127714391	5729	10525461	pp jets 13000 150,-,1860 - pythia8 8.235 tune-AU2lox 85000 190	1575492171	1575667360	136217.81	127170.5	Theory Simulation	vbox64_theory	300.02
255074042	128217424	5729	10576889	pp jets 8000 180,-,3560 - herwig7 7.1.0 softTune 100000 192	1576027864	1576269124	118828.47	117313	Theory Simulation	vbox64_theory	300.02
254577714	127943174	5729	10576889	pp jets 13000 150,-,2360 - pythia8 8.235 tune-4cx 100000 192	1575656145	1576067031	164301.9	163863.5	Theory Simulation	vbox64_theory	300.02
253941568	127611080	5729	10576889	pp jets 13000 180,-,3560 - pythia8 8.235 tune-4c 100000 190	1575414568	1575606991	110049.36	109642.3	Theory Simulation	vbox64_theory	300.02
254576122	127942035	6039	10555092	pp jets 13000 150,-,1860 - pythia8 8.235 early 72000 191	1575650268	1576038491	125044.98	93737.42	Theory Simulation	vbox64_theory	300.02
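
For readers who prefer Python, the long runner condition from this post can be sketched as follows. This is my own rendering, not the PHP script itself: the `Task` class and the `generator_of` helper are hypothetical, and the helper assumes the generator name is the first token after the last " - " separator in the run string, which fits the rows listed above.

```python
from dataclasses import dataclass

MAX_RUNTIME = 86400  # seconds (24 hours), as in the post


@dataclass
class Task:
    run: str          # e.g. "pp jets 7000 300 - sherpa 1.4.2 default 47000 192"
    exit_status: int
    runtime: float    # seconds
    plan_class: str
    version: float


def is_long_runner(task):
    """Python rendering of the long runner condition from the post."""
    right_app = task.plan_class == "native_theory" or (
        task.plan_class == "vbox64_theory" and task.version >= 300.00
    )
    return task.exit_status == 0 and task.runtime > MAX_RUNTIME and right_app


def generator_of(task):
    """Return the generator name from the run string, or None if unknown."""
    _, separator, tail = task.run.rpartition(" - ")
    tokens = tail.split()
    return tokens[0] if separator and tokens else None


tasks = [
    Task("pp jets 7000 400 - sherpa 1.2.2p default 47000 192", 0, 186377.97, "vbox64_theory", 300.02),
    Task("pp jets 8000 800 - pythia8 8.235 cr1 100000 191", 0, 99413.44, "vbox64_theory", 300.02),
    Task("", 0, 105158.79, "vbox64_theory", 300.02),
]
sherpa_long_runners = [
    task for task in tasks if is_long_runner(task) and generator_of(task) == "sherpa"
]
print(len(sherpa_long_runners))
```

Separating the runtime test from the generator test makes it easy to report non-Sherpa long runners (pythia8, herwig7, or rows with an empty run string) on their own.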


Script log
MODE: SEARCH FOR LONG RUNNERS
USERID: 31; HOSTID: 10481542; OFFSET: 0
USERID: 31; HOSTID: 10481542; OFFSET: 20
USERID: 31; HOSTID: 10481542; OFFSET: 40
USERID: 31; HOSTID: 10481542; OFFSET: 60
USERID: 56; HOSTID: 10623869; OFFSET: 0
USERID: 56; HOSTID: 10623869; OFFSET: 20
USERID: 67; HOSTID: 10509223; OFFSET: 0
USERID: 67; HOSTID: 10509223; OFFSET: 20
USERID: 67; HOSTID: 10509223; OFFSET: 40
Task 254955577 is (sherpa) long runner!
USERID: 67; HOSTID: 10509223; OFFSET: 60
USERID: 67; HOSTID: 10509223; OFFSET: 80
USERID: 67; HOSTID: 10509223; OFFSET: 100
Task 254725396 is (sherpa) long runner!
USERID: 67; HOSTID: 10509223; OFFSET: 120
USERID: 67; HOSTID: 10509223; OFFSET: 140
USERID: 67; HOSTID: 10509223; OFFSET: 160
Task 254410071 is (sherpa) long runner!
USERID: 67; HOSTID: 10509223; OFFSET: 180
USERID: 67; HOSTID: 10509223; OFFSET: 200
USERID: 248; HOSTID: 10622647; OFFSET: 0
USERID: 248; HOSTID: 10622647; OFFSET: 20
USERID: 248; HOSTID: 10623274; OFFSET: 0
USERID: 248; HOSTID: 10623646; OFFSET: 0
USERID: 248; HOSTID: 10623669; OFFSET: 0
USERID: 248; HOSTID: 10624337; OFFSET: 0
USERID: 248; HOSTID: 10624354; OFFSET: 0
USERID: 248; HOSTID: 10624524; OFFSET: 0
USERID: 248; HOSTID: 10624524; OFFSET: 20
USERID: 248; HOSTID: 10624524; OFFSET: 40
Task 254740811 is (sherpa) long runner!
Task 254731110 is (sherpa) long runner!
USERID: 248; HOSTID: 10624524; OFFSET: 60
USERID: 248; HOSTID: 10624524; OFFSET: 80
Task 254387019 is (sherpa) long runner!
USERID: 248; HOSTID: 10624524; OFFSET: 100
Task 254260169 is (sherpa) long runner!
USERID: 248; HOSTID: 10624524; OFFSET: 120
USERID: 248; HOSTID: 10624548; OFFSET: 0
USERID: 248; HOSTID: 10624548; OFFSET: 20
USERID: 248; HOSTID: 10624548; OFFSET: 40
USERID: 248; HOSTID: 10624548; OFFSET: 60
Task 254731795 is (sherpa) long runner!
USERID: 248; HOSTID: 10624548; OFFSET: 80
USERID: 248; HOSTID: 10624548; OFFSET: 100
Task 254536231 is (sherpa) long runner!
USERID: 248; HOSTID: 10624548; OFFSET: 120
Task 254384925 is (sherpa) long runner!
USERID: 248; HOSTID: 10624548; OFFSET: 140
USERID: 248; HOSTID: 10626076; OFFSET: 0
USERID: 248; HOSTID: 10626076; OFFSET: 20
USERID: 248; HOSTID: 10626076; OFFSET: 40
USERID: 248; HOSTID: 10626076; OFFSET: 60
USERID: 248; HOSTID: 10626076; OFFSET: 80
USERID: 387; HOSTID: 10411018; OFFSET: 0
USERID: 387; HOSTID: 10411018; OFFSET: 20
USERID: 417; HOSTID: 10138801; OFFSET: 0
USERID: 417; HOSTID: 10138801; OFFSET: 20
USERID: 417; HOSTID: 10138801; OFFSET: 40
USERID: 417; HOSTID: 10138801; OFFSET: 60
USERID: 417; HOSTID: 10389967; OFFSET: 0
USERID: 417; HOSTID: 10389967; OFFSET: 20
USERID: 417; HOSTID: 10389967; OFFSET: 40
USERID: 462; HOSTID: 10625542; OFFSET: 0
USERID: 462; HOSTID: 10625542; OFFSET: 20
USERID: 462; HOSTID: 10625542; OFFSET: 40
USERID: 723; HOSTID: 10474310; OFFSET: 0
USERID: 723; HOSTID: 10474310; OFFSET: 20
USERID: 807; HOSTID: 10487134; OFFSET: 0
USERID: 829; HOSTID: 10346050; OFFSET: 0
USERID: 829; HOSTID: 10346050; OFFSET: 20
USERID: 1013; HOSTID: 10408677; OFFSET: 0
USERID: 1013; HOSTID: 10408677; OFFSET: 20
USERID: 1013; HOSTID: 10408677; OFFSET: 40
USERID: 1013; HOSTID: 10408677; OFFSET: 60
USERID: 1013; HOSTID: 10408677; OFFSET: 80
USERID: 1013; HOSTID: 10408677; OFFSET: 100
USERID: 1013; HOSTID: 10408677; OFFSET: 120
USERID: 1013; HOSTID: 10408677; OFFSET: 140
USERID: 1013; HOSTID: 10408677; OFFSET: 160
USERID: 1013; HOSTID: 10408677; OFFSET: 180
Task 254126655 is (sherpa) long runner!
USERID: 1013; HOSTID: 10408677; OFFSET: 200
USERID: 1013; HOSTID: 10408728; OFFSET: 0
USERID: 1013; HOSTID: 10408728; OFFSET: 20
USERID: 1013; HOSTID: 10408728; OFFSET: 40
USERID: 1013; HOSTID: 10408728; OFFSET: 60
USERID: 1013; HOSTID: 10408728; OFFSET: 80
USERID: 1013; HOSTID: 10408728; OFFSET: 100
USERID: 1013; HOSTID: 10408728; OFFSET: 120
USERID: 1095; HOSTID: 10452531; OFFSET: 0
USERID: 1095; HOSTID: 10452531; OFFSET: 20
USERID: 1211; HOSTID: 10569580; OFFSET: 0
USERID: 1211; HOSTID: 10569580; OFFSET: 20
USERID: 1391; HOSTID: 10570005; OFFSET: 0
USERID: 1400; HOSTID: 10379522; OFFSET: 0
USERID: 1452; HOSTID: 10617473; OFFSET: 0
USERID: 1452; HOSTID: 10617473; OFFSET: 20
USERID: 1499; HOSTID: 10605693; OFFSET: 0
USERID: 1525; HOSTID: 10546691; OFFSET: 0
USERID: 1525; HOSTID: 10546691; OFFSET: 20
Task 254390931 is (sherpa) long runner!
USERID: 1525; HOSTID: 10546691; OFFSET: 40
USERID: 1731; HOSTID: 10309088; OFFSET: 0
Task 253592355 is (sherpa) long runner!
USERID: 1731; HOSTID: 10309088; OFFSET: 20
USERID: 1911; HOSTID: 10363794; OFFSET: 0
USERID: 1981; HOSTID: 10588410; OFFSET: 0
USERID: 1981; HOSTID: 10588410; OFFSET: 20
USERID: 2055; HOSTID: 10374085; OFFSET: 0
USERID: 2055; HOSTID: 10374085; OFFSET: 20
USERID: 2060; HOSTID: 10596199; OFFSET: 0
USERID: 2060; HOSTID: 10596199; OFFSET: 20
USERID: 2060; HOSTID: 10596199; OFFSET: 40
USERID: 2060; HOSTID: 10596202; OFFSET: 0
USERID: 2060; HOSTID: 10596202; OFFSET: 20
USERID: 2060; HOSTID: 10596213; OFFSET: 0
USERID: 2060; HOSTID: 10596213; OFFSET: 20
USERID: 2060; HOSTID: 10596213; OFFSET: 40
USERID: 2060; HOSTID: 10596213; OFFSET: 60
USERID: 2060; HOSTID: 10596213; OFFSET: 80
USERID: 2060; HOSTID: 10596218; OFFSET: 0
USERID: 2060; HOSTID: 10596218; OFFSET: 20
USERID: 2139; HOSTID: 10372172; OFFSET: 0
Task 255062615 is (sherpa) long runner!
Task 254395033 is (sherpa) long runner!
USERID: 2139; HOSTID: 10372172; OFFSET: 20
USERID: 2291; HOSTID: 10364695; OFFSET: 0
USERID: 2291; HOSTID: 10364695; OFFSET: 20
USERID: 2322; HOSTID: 10547004; OFFSET: 0
USERID: 2322; HOSTID: 10547004; OFFSET: 20
USERID: 2322; HOSTID: 10547004; OFFSET: 40
USERID: 2361; HOSTID: 10312023; OFFSET: 0
USERID: 2361; HOSTID: 10312023; OFFSET: 20
USERID: 2361; HOSTID: 10312023; OFFSET: 40
USERID: 2438; HOSTID: 10409807; OFFSET: 0
USERID: 2537; HOSTID: 10485944; OFFSET: 0
USERID: 2537; HOSTID: 10485944; OFFSET: 20
USERID: 2537; HOSTID: 10600848; OFFSET: 0
USERID: 2739; HOSTID: 10294511; OFFSET: 0
USERID: 2739; HOSTID: 10294511; OFFSET: 20
USERID: 2739; HOSTID: 10294511; OFFSET: 40
USERID: 2739; HOSTID: 10294511; OFFSET: 60
USERID: 2739; HOSTID: 10294511; OFFSET: 80
USERID: 2739; HOSTID: 10294511; OFFSET: 100
USERID: 2739; HOSTID: 10294511; OFFSET: 120
USERID: 2739; HOSTID: 10294511; OFFSET: 140
USERID: 2739; HOSTID: 10509390; OFFSET: 0
USERID: 2739; HOSTID: 10509390; OFFSET: 20
USERID: 2739; HOSTID: 10509390; OFFSET: 40
USERID: 3016; HOSTID: 10298563; OFFSET: 0
USERID: 3016; HOSTID: 10298563; OFFSET: 20
USERID: 3109; HOSTID: 10458123; OFFSET: 0
USERID: 3109; HOSTID: 10458123; OFFSET: 20
USERID: 3137; HOSTID: 10363404; OFFSET: 0
USERID: 3137; HOSTID: 10363404; OFFSET: 20
USERID: 3156; HOSTID: 10363583; OFFSET: 0
USERID: 3156; HOSTID: 10363583; OFFSET: 20
USERID: 3168; HOSTID: 10565243; OFFSET: 0
USERID: 3168; HOSTID: 10565243; OFFSET: 20
USERID: 3168; HOSTID: 10565243; OFFSET: 40
USERID: 3168; HOSTID: 10565243; OFFSET: 60
USERID: 3168; HOSTID: 10565243; OFFSET: 80
USERID: 3168; HOSTID: 10566007; OFFSET: 0
USERID: 3168; HOSTID: 10566007; OFFSET: 20
USERID: 3168; HOSTID: 10566007; OFFSET: 40
USERID: 3210; HOSTID: 9936309; OFFSET: 0
USERID: 3210; HOSTID: 9936309; OFFSET: 20
USERID: 3210; HOSTID: 9936309; OFFSET: 40
Task 254745497 is (sherpa) long runner!
USERID: 3210; HOSTID: 9936309; OFFSET: 60
USERID: 3210; HOSTID: 9936309; OFFSET: 80
USERID: 3210; HOSTID: 9936309; OFFSET: 100
USERID: 3210; HOSTID: 9936309; OFFSET: 120
USERID: 3239; HOSTID: 10395886; OFFSET: 0
USERID: 3239; HOSTID: 10395886; OFFSET: 20
USERID: 3277; HOSTID: 10589437; OFFSET: 0
USERID: 3284; HOSTID: 10622682; OFFSET: 0
USERID: 3284; HOSTID: 10622682; OFFSET: 20
USERID: 3285; HOSTID: 10620488; OFFSET: 0
USERID: 3285; HOSTID: 10620488; OFFSET: 20
USERID: 3285; HOSTID: 10620488; OFFSET: 40
USERID: 3328; HOSTID: 9848732; OFFSET: 0
USERID: 3328; HOSTID: 9848732; OFFSET: 20
USERID: 3353; HOSTID: 10617180; OFFSET: 0
USERID: 3353; HOSTID: 10617180; OFFSET: 20
USERID: 3689; HOSTID: 10580251; OFFSET: 0
Task 255216964 is (sherpa) long runner!
USERID: 3689; HOSTID: 10580251; OFFSET: 20
USERID: 3689; HOSTID: 10580251; OFFSET: 40
USERID: 3689; HOSTID: 10580251; OFFSET: 60
USERID: 3775; HOSTID: 10370707; OFFSET: 0
USERID: 3775; HOSTID: 10525345; OFFSET: 0
USERID: 3775; HOSTID: 10525345; OFFSET: 20
Task 254990460 is (sherpa) long runner!
USERID: 3775; HOSTID: 10525345; OFFSET: 40
USERID: 3775; HOSTID: 10525345; OFFSET: 60
USERID: 3775; HOSTID: 10525345; OFFSET: 80
USERID: 3775; HOSTID: 10525345; OFFSET: 100
USERID: 3775; HOSTID: 10525345; OFFSET: 120
USERID: 3855; HOSTID: 10457508; OFFSET: 0
USERID: 3855; HOSTID: 10457508; OFFSET: 20
USERID: 3855; HOSTID: 10457508; OFFSET: 40
USERID: 3855; HOSTID: 10457508; OFFSET: 60
USERID: 3855; HOSTID: 10457508; OFFSET: 80
USERID: 3855; HOSTID: 10457508; OFFSET: 100
USERID: 3855; HOSTID: 10457508; OFFSET: 120
USERID: 3855; HOSTID: 10457508; OFFSET: 140
USERID: 3855; HOSTID: 10457508; OFFSET: 160
USERID: 3855; HOSTID: 10463494; OFFSET: 0
USERID: 3855; HOSTID: 10463494; OFFSET: 20
USERID: 3855; HOSTID: 10463494; OFFSET: 40
Task 254602898 is (sherpa) long runner!
USERID: 3855; HOSTID: 10463494; OFFSET: 60
USERID: 3855; HOSTID: 10463494; OFFSET: 80
USERID: 3855; HOSTID: 10605580; OFFSET: 0
Task 254760248 is (sherpa) long runner!
USERID: 3855; HOSTID: 10605580; OFFSET: 20
USERID: 3855; HOSTID: 10605580; OFFSET: 40
USERID: 3949; HOSTID: 9889908; OFFSET: 0
USERID: 3949; HOSTID: 9889908; OFFSET: 20
USERID: 3949; HOSTID: 9889908; OFFSET: 40
USERID: 3949; HOSTID: 10511876; OFFSET: 0
Task 253370499 is (sherpa) long runner!
USERID: 3949; HOSTID: 10511876; OFFSET: 20
USERID: 3949; HOSTID: 10557139; OFFSET: 0
USERID: 3949; HOSTID: 10580769; OFFSET: 0
USERID: 3949; HOSTID: 10580769; OFFSET: 20
USERID: 3949; HOSTID: 10580769; OFFSET: 40
USERID: 3949; HOSTID: 10580769; OFFSET: 60
USERID: 3949; HOSTID: 10580769; OFFSET: 80
USERID: 4013; HOSTID: 10557791; OFFSET: 0
USERID: 4023; HOSTID: 10346523; OFFSET: 0
USERID: 4023; HOSTID: 10346523; OFFSET: 20
USERID: 4197; HOSTID: 10587886; OFFSET: 0
USERID: 4219; HOSTID: 19063; OFFSET: 0
USERID: 4219; HOSTID: 9763811; OFFSET: 0
USERID: 4219; HOSTID: 9763811; OFFSET: 20
USERID: 4345; HOSTID: 10486325; OFFSET: 0
USERID: 4345; HOSTID: 10486325; OFFSET: 20
USERID: 4465; HOSTID: 10619631; OFFSET: 0
USERID: 4465; HOSTID: 10619631; OFFSET: 20
USERID: 4716; HOSTID: 10581411; OFFSET: 0
USERID: 4716; HOSTID: 10581411; OFFSET: 20
USERID: 4716; HOSTID: 10615347; OFFSET: 0
USERID: 4837; HOSTID: 10391134; OFFSET: 0
USERID: 4837; HOSTID: 10391134; OFFSET: 20
Task 254902362 is (sherpa) long runner!
Task 254741279 is (sherpa) long runner!
USERID: 4837; HOSTID: 10391134; OFFSET: 40
Task 254630592 is (sherpa) long runner!
Task 254625034 is (sherpa) long runner!
USERID: 4837; HOSTID: 10391134; OFFSET: 60
USERID: 5326; HOSTID: 10504105; OFFSET: 0
Task 253420913 is (sherpa) long runner!
Task 253422242 is (sherpa) long runner!
USERID: 5326; HOSTID: 10504105; OFFSET: 20
USERID: 5373; HOSTID: 10583586; OFFSET: 0
USERID: 5472; HOSTID: 10447575; OFFSET: 0
USERID: 5472; HOSTID: 10447575; OFFSET: 20
USERID: 5472; HOSTID: 10451775; OFFSET: 0
USERID: 5472; HOSTID: 10451775; OFFSET: 20
USERID: 5670; HOSTID: 10322182; OFFSET: 0
USERID: 5670; HOSTID: 10322182; OFFSET: 20
USERID: 5670; HOSTID: 10322182; OFFSET: 40
USERID: 5670; HOSTID: 10322182; OFFSET: 60
USERID: 5670; HOSTID: 10322182; OFFSET: 80
USERID: 5729; HOSTID: 10522104; OFFSET: 0
USERID: 5729; HOSTID: 10522104; OFFSET: 20
USERID: 5729; HOSTID: 10522104; OFFSET: 40
USERID: 5729; HOSTID: 10522104; OFFSET: 60
USERID: 5729; HOSTID: 10522104; OFFSET: 80
USERID: 5729; HOSTID: 10522104; OFFSET: 100
USERID: 5729; HOSTID: 10522104; OFFSET: 120
USERID: 5729; HOSTID: 10522104; OFFSET: 140
USERID: 5729; HOSTID: 10522104; OFFSET: 160
USERID: 5729; HOSTID: 10522104; OFFSET: 180
USERID: 5729; HOSTID: 10522104; OFFSET: 200
USERID: 5729; HOSTID: 10522104; OFFSET: 220
USERID: 5729; HOSTID: 10522104; OFFSET: 240
USERID: 5729; HOSTID: 10522104; OFFSET: 260
USERID: 5729; HOSTID: 10522104; OFFSET: 280
USERID: 5729; HOSTID: 10522104; OFFSET: 300
USERID: 5729; HOSTID: 10525461; OFFSET: 0
USERID: 5729; HOSTID: 10525461; OFFSET: 20
USERID: 5729; HOSTID: 10525461; OFFSET: 40
Task 254139932 is (sherpa) long runner!
USERID: 5729; HOSTID: 10525461; OFFSET: 60
USERID: 5729; HOSTID: 10576889; OFFSET: 0
Task 255074042 is (sherpa) long runner!
USERID: 5729; HOSTID: 10576889; OFFSET: 20
USERID: 5729; HOSTID: 10576889; OFFSET: 40
Task 254577714 is (sherpa) long runner!
USERID: 5729; HOSTID: 10576889; OFFSET: 60
Task 253941568 is (sherpa) long runner!
USERID: 5729; HOSTID: 10576889; OFFSET: 80
USERID: 5905; HOSTID: 10406695; OFFSET: 0
USERID: 5905; HOSTID: 10406695; OFFSET: 20
USERID: 5914; HOSTID: 10486095; OFFSET: 0
USERID: 5914; HOSTID: 10486095; OFFSET: 20
USERID: 5914; HOSTID: 10530433; OFFSET: 0
USERID: 5941; HOSTID: 10620272; OFFSET: 0
USERID: 5943; HOSTID: 10588290; OFFSET: 0
USERID: 5943; HOSTID: 10588290; OFFSET: 20
USERID: 5943; HOSTID: 10603952; OFFSET: 0
USERID: 5943; HOSTID: 10603952; OFFSET: 20
USERID: 5943; HOSTID: 10603952; OFFSET: 40
USERID: 5943; HOSTID: 10603952; OFFSET: 60
USERID: 5943; HOSTID: 10603952; OFFSET: 80
USERID: 5943; HOSTID: 10603952; OFFSET: 100
USERID: 5943; HOSTID: 10613888; OFFSET: 0
USERID: 5943; HOSTID: 10613888; OFFSET: 20
USERID: 5943; HOSTID: 10613888; OFFSET: 40
USERID: 5943; HOSTID: 10613888; OFFSET: 60
USERID: 5943; HOSTID: 10616579; OFFSET: 0
USERID: 5943; HOSTID: 10616579; OFFSET: 20
USERID: 6039; HOSTID: 10555092; OFFSET: 0
USERID: 6039; HOSTID: 10555092; OFFSET: 20
USERID: 6039; HOSTID: 10555092; OFFSET: 40
USERID: 6039; HOSTID: 10555092; OFFSET: 60
USERID: 6039; HOSTID: 10555092; OFFSET: 80
USERID: 6039; HOSTID: 10555092; OFFSET: 100
Task 254576122 is (sherpa) long runner!
USERID: 6039; HOSTID: 10555092; OFFSET: 120
USERID: 6039; HOSTID: 10555092; OFFSET: 140
USERID: 6101; HOSTID: 10323545; OFFSET: 0
USERID: 6101; HOSTID: 10323545; OFFSET: 20
USERID: 6101; HOSTID: 10323545; OFFSET: 40
USERID: 6190; HOSTID: 10525261; OFFSET: 0
USERID: 6190; HOSTID: 10525261; OFFSET: 20
USERID: 6205; HOSTID: 10532965; OFFSET: 0
USERID: 6205; HOSTID: 10532965; OFFSET: 20
USERID: 6228; HOSTID: 10619860; OFFSET: 0
USERID: 6228; HOSTID: 10619860; OFFSET: 20
USERID: 6298; HOSTID: 10382093; OFFSET: 0
USERID: 6298; HOSTID: 10382093; OFFSET: 20
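
A log in this format can be condensed into per-host counts. Here is a small Python sketch; the function name `summarise_scan` is hypothetical, and it assumes that each "Task ... long runner!" line belongs to the most recent USERID/HOSTID line printed before it.

```python
import re
from collections import Counter

_PAGE = re.compile(r"^USERID: (\d+); HOSTID: (\d+); OFFSET: (\d+)$")
_HIT = re.compile(r"^Task (\d+) is \(sherpa\) long runner!$")


def summarise_scan(log_lines):
    """Return (hosts_scanned, hits_per_host) for a scan log like the one above."""
    hosts = set()
    hits = Counter()
    current_host = None
    for raw in log_lines:
        line = raw.strip()
        page = _PAGE.match(line)
        if page:
            current_host = int(page.group(2))
            hosts.add(current_host)
            continue
        if _HIT.match(line) and current_host is not None:
            hits[current_host] += 1
    return len(hosts), hits


sample = [
    "USERID: 67; HOSTID: 10509223; OFFSET: 0",
    "USERID: 67; HOSTID: 10509223; OFFSET: 20",
    "USERID: 67; HOSTID: 10509223; OFFSET: 40",
    "Task 254955577 is (sherpa) long runner!",
    "USERID: 248; HOSTID: 10622647; OFFSET: 0",
]
hosts_scanned, hits_per_host = summarise_scan(sample)
print(hosts_scanned, dict(hits_per_host))
```

Note that the "(sherpa)" label in the log is printed for every long runner, so the generator still has to be taken from the run string.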
ID: 40950
maeax

Joined: 2 May 07
Posts: 881
Credit: 32,594,782
RAC: 45,785
Message 40961 - Posted: 14 Dec 2019, 22:37:28 UTC - in response to Message 40920.  

[quote]The time limit in the db was reached yesterday, but the task is still running after 11 days and 18 hours, with no end in sight.
I am shutting it down now. [boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189]
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=127302369[/quote]

This Sherpa task has now been finished by another volunteer in half an hour?
ID: 40961
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1350
Credit: 67,675,750
RAC: 96,837
Message 40962 - Posted: 15 Dec 2019, 9:45:23 UTC

Time left is >3543d and increasing.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=255020975
===> [runRivet] Tue Dec 10 11:18:18 UTC 2019 [boinc ee zhad 14 - - sherpa 2.2.0 default 1000 188]
.
.
.
6.97613e+18 pb +- ( 2.2631e+18 pb = 32.4406 % ) 2172520000 ( 2172540913 -> 99.9 % )
integration time:  ( 3d 8h 52m 24s elapsed / 3543d 7h 23m 28s left ) [09:35:07]   
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.14095,0.824337,-0.548271)
  b    = (0,0,1)
  a'   = (0.905231,-0.35381,0.235321) -> rel. dev. (inf,-inf,-0.764679)
  m_ct = -0.548271
  m_st = -0.836301
  m_n  = (0,-4.69413e-07,-7.05773e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.14095,0.824337,-0.548271)
  b    = (0,0,1)
  a'   = (0.905231,-0.35381,0.235321) -> rel. dev. (inf,-inf,-0.764679)
  m_ct = -0.548271
  m_st = -0.836301
  m_n  = (0,-4.69413e-07,-7.05773e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (0.4385,-0.489443,0.753766)
  b    = (0,0,1)
  a'   = (0.921121,-0.211997,0.326485) -> rel. dev. (inf,-inf,-0.673515)
  m_ct = 0.753766
  m_st = -0.657143
  m_n  = (-0,6.46517e-07,4.19804e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (0.4385,-0.489443,0.753766)
  b    = (0,0,1)
  a'   = (0.921121,-0.211997,0.326485) -> rel. dev. (inf,-inf,-0.673515)
  m_ct = 0.753766
  m_st = -0.657143
  m_n  = (-0,6.46517e-07,4.19804e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.733593,-0.18741,-0.653238)
  b    = (0,0,1)
  a'   = (0.993764,-0.0307506,-0.107184) -> rel. dev. (inf,-inf,-1.10718)
  m_ct = -0.653238
  m_st = -0.757153
  m_n  = (0,-3.56106e-07,1.02165e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.733593,-0.18741,-0.653238)
  b    = (0,0,1)
  a'   = (0.993764,-0.0307506,-0.107184) -> rel. dev. (inf,-inf,-1.10718)
  m_ct = -0.653238
  m_st = -0.757153
  m_n  = (0,-3.56106e-07,1.02165e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.0616526,0.214378,-0.974803)
  b    = (0,0,1)
  a'   = (0.282742,-0.206022,0.936809) -> rel. dev. (inf,-inf,-0.0631907)
  m_ct = -0.974803
  m_st = -0.223067
  m_n  = (0,-1.46214e-06,-3.21553e-07)
}
Poincare::Poincare(): Inaccurate rotation {
  a    = (-0.0616526,0.214378,-0.974803)
  b    = (0,0,1)
  a'   = (0.282742,-0.206022,0.936809) -> rel. dev. (inf,-inf,-0.0631907)
  m_ct = -0.974803
  m_st = -0.223067
  m_n  = (0,-1.46214e-06,-3.21553e-07)
}
6.97607e+18 pb +- ( 2.26308e+18 pb = 32.4406 % ) 2172540000 ( 2172560913 -> 99.9 % )
integration time:  ( 3d 8h 52m 27s elapsed / 3543d 8h 15m 20s left ) [09:35:12]
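
For anyone who wants to spot this pattern without watching the log by hand, here is a minimal sketch that parses the "integration time" lines shown above and flags a task whose "left" estimate is huge and not shrinking. The function names and the 30-day threshold are my own assumptions for illustration, not anything Sherpa or BOINC provides.

```python
import re

# Matches e.g. "integration time:  ( 3d 8h 52m 24s elapsed / 3543d 7h 23m 28s left )"
LINE_RE = re.compile(
    r"integration time:\s*\(\s*(.*?)\s+elapsed\s*/\s*(.*?)\s+left\s*\)"
)
PART_RE = re.compile(r"(\d+)([dhms])")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration such as '3d 8h 52m 24s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in PART_RE.findall(text))


def parse_integration_line(line):
    """Return (elapsed_s, left_s), or None if the line is not an integration line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return to_seconds(m.group(1)), to_seconds(m.group(2))


def looks_runaway(lines, max_left_days=30):
    """Hypothetical rule: the newest estimate exceeds max_left_days
    and is not smaller than the one before it."""
    parsed = [p for p in map(parse_integration_line, lines) if p is not None]
    if len(parsed) < 2:
        return False
    (_, prev_left), (_, last_left) = parsed[-2], parsed[-1]
    return last_left > max_left_days * 86400 and last_left >= prev_left
```

Fed the two "integration time" lines from this log, looks_runaway returns True, because the estimate grows from 3543d 7h to 3543d 8h while only three seconds elapse.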
ID: 40962 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1350
Credit: 67,675,750
RAC: 96,837
Message 40963 - Posted: 15 Dec 2019, 10:01:34 UTC

Looks like it got stuck after 92100 events.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=255475321
===> [runRivet] Sat Dec 14 05:12:22 UTC 2019 [boinc pp jets 7000 65 - sherpa 1.4.1 default 100000 194]
.
.
.
  Event 92100 ( 10h 27m 32s elapsed / 53m 49s left ) -> ETA: Sat Dec 14 18:05  
92100 events processed
Updating display...
Poincare::Poincare(): Inaccurate rotation {
  a    = (0,0,1)
  b    = (0.23570226039552,0.94280904158206,0.23570226039552)
  a'   = (0.97182531580755,0,0.23570226039552) -> rel. dev. (3.1231056256177,-1,0)
  m_ct = 0.23570226039552
  m_st = -0.97182531580755
  m_n  = (0,1,0)
}
Display update finished (127 histograms, 92000 events).
Updating display...
Display update finished (127 histograms, 92000 events).
.
.
.
Updating display...
Display update finished (127 histograms, 92000 events).
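
A stall like this can be detected from the "Display update finished" lines alone. The following is only a sketch under my own assumptions: the helper names and the choice of three identical updates as the trigger are hypothetical, not part of runRivet.

```python
import re

# Matches e.g. "Display update finished (127 histograms, 92000 events)."
DISPLAY_RE = re.compile(
    r"Display update finished \((\d+) histograms, (\d+) events\)"
)


def event_counts(lines):
    """Extract the event count from each 'Display update finished' line."""
    return [int(m.group(2)) for m in map(DISPLAY_RE.search, lines) if m]


def is_stalled(lines, min_repeats=3):
    """Hypothetical rule: the last min_repeats display updates
    all report the same number of events."""
    counts = event_counts(lines)
    if len(counts) < min_repeats:
        return False
    return len(set(counts[-min_repeats:])) == 1
```

On the excerpt above, every display update reports 92000 events, so is_stalled returns True. A healthy task would show the count rising between updates.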
ID: 40963 · Report as offensive     Reply Quote
Luigi R.

Joined: 7 Feb 14
Posts: 99
Credit: 5,027,000
RAC: 9
Message 41010 - Posted: 18 Dec 2019, 18:39:01 UTC
Last modified: 18 Dec 2019, 18:39:31 UTC

Another successful long sherpa for me.
pp jets 8000 800 - sherpa 1.3.1 default 26000 194
2 days 5 hours 18 minutes 54 seconds
https://lhcathome.cern.ch/lhcathome/result.php?resultid=255556375
Is it useful to report them here? Or should I write about problematic/buggy tasks only?


Is my list useful to anyone? Do admins/project scientists save information about Sherpa runtimes to a database?
ID: 41010 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 893
Credit: 6,241,606
RAC: 1,676
Message 41016 - Posted: 19 Dec 2019, 16:48:25 UTC
Last modified: 20 Dec 2019, 9:01:22 UTC

Shall I give this one a try?
===> [runRivet] Thu Dec 19 15:36:23 UTC 2019 [boinc pp jets 7000 150,-,2360 - sherpa 2.2.5 default 1000 195]
Of 137 attempts, 'only' 7 successfully processed 1000 events each, 2 of them among the last 8 attempts.

That's interesting: so 5 of the first 129 attempts and 2 of the last 8.
Maybe it has something to do with the fact that the VMs are no longer killed after 18 hours of wall-clock run time.
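
To put numbers on that split, here is a small sketch of the arithmetic using only the counts quoted above. It compares raw success rates and says nothing about statistical significance; with just 8 recent attempts the sample is very small.

```python
from fractions import Fraction


def success_rate(successes, attempts):
    """Exact success rate as a fraction."""
    return Fraction(successes, attempts)


early = success_rate(5, 129)    # first 129 attempts
late = success_rate(2, 8)       # last 8 attempts
overall = success_rate(7, 137)  # all attempts

print(f"first 129 attempts: {float(early):.1%}")
print(f"last 8 attempts:    {float(late):.1%}")
print(f"all 137 attempts:   {float(overall):.1%}")
print(f"late/early ratio:   {float(late / early):.2f}")
```

So roughly 4% of the early attempts succeeded against 25% of the recent ones.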

Edit: It's a repair job. The previous try ended in an error after 2 days and 18 hours:
2019-12-19 11:31:20 (11120): Status Report: Elapsed Time: '222022.785223'
2019-12-19 11:31:20 (11120): Status Report: CPU Time: '226057.859375'
2019-12-19 11:32:15 (11120): Guest Log: job: CPU usage:
2019-12-19 11:32:15 (11120): Guest Log: 0m0.115s 0m0.217s
2019-12-19 11:32:15 (11120): Guest Log: 3265m39.044s 221m39.547s
2019-12-19 11:32:17 (11120): Guest Log: 11:32:16 CET +01:00 2019-12-19: cranky: [ERROR] Container 'runc' terminated with status code 1.
2019-12-19 11:32:31 (11120): Guest Log: [ERROR] Job Failed
ID: 41016 · Report as offensive     Reply Quote
Brummig

Joined: 9 Feb 16
Posts: 35
Credit: 427,750
RAC: 339
Message 41023 - Posted: 20 Dec 2019, 14:19:03 UTC - in response to Message 40961.  
Last modified: 20 Dec 2019, 14:22:52 UTC

This Sherpa has now been finished by another volunteer in half an hour?

I've seen that happen with my VirtualBox long-runners: tasks that I aborted after days or weeks of running were completed in a fraction of the time by another host.
ID: 41023 · Report as offensive     Reply Quote


©2020 CERN