Sherpa: Inaccurate rotation
Author | Message |
---|---|
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
It seems that I've a sherpa task in an endless error loop. It happened during the full optimization part. A small part of the log. ===> [runRivet] Thu Dec 6 09:01:16 CET 2018 [boinc ee zhad 34.8 - - sherpa 2.1.0 default 7000 8] . . . 857.668 pb +- ( 61.8366 pb = 7.20986 % ) 6180000 ( 6181446 -> 99.9 % ) integration time: ( 40m 17s (33m 55s) elapsed / 0s (0s) left ) [09:58:03] Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Poincare::Poincare(): Inaccurate rotation { a = (-0.927672,0.315193,0.200196) b = (0,0,1) a' = (0.180121,0.830317,0.52738) -> rel. dev. (inf,inf,-0.47262) m_ct = 0.200196 m_st = -0.979756 m_n = (0,7.12792e-08,-1.12223e-07) } Poincare::Poincare(): Inaccurate rotation { a = (-0.927672,0.315193,0.200196) b = (0,0,1) a' = (0.180121,0.830317,0.52738) -> rel. dev. (inf,inf,-0.47262) m_ct = 0.200196 m_st = -0.979756 m_n = (0,7.12792e-08,-1.12223e-07) } Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Poincare::Poincare(): Inaccurate rotation { a = (0.516323,-0.823684,-0.234427) b = (0,0,1) a' = (0.71149,0.675857,0.192354) -> rel. dev. (inf,inf,-0.807646) m_ct = -0.234427 m_st = -0.972134 m_n = (0,-1.04007e-07,3.65441e-07) } Poincare::Poincare(): Inaccurate rotation { a = (0.516323,-0.823684,-0.234427) b = (0,0,1) a' = (0.71149,0.675857,0.192354) -> rel. dev. (inf,inf,-0.807646) m_ct = -0.234427 m_st = -0.972134 m_n = (0,-1.04007e-07,3.65441e-07) } Channel_Basics::Boost : Spacelike four vector ... Phase_Space_Integrator::AddPoint(): value = -nan. Skip. Poincare::Poincare(): Inaccurate rotation { a = (-0.997603,-0.0290356,0.0628051) b = (0,0,1) a' = (0.0064009,-0.419628,0.907673) -> rel. dev. 
(inf,-inf,-0.0923265) m_ct = 0.0628051 m_st = -0.998026 m_n = (-0,3.24412e-08,1.4998e-08) } Poincare::Poincare(): Inaccurate rotation { a = (-0.997603,-0.0290356,0.0628051) b = (0,0,1) a' = (0.0064009,-0.419628,0.907673) -> rel. dev. (inf,-inf,-0.0923265) m_ct = 0.0628051 m_st = -0.998026 m_n = (-0,3.24412e-08,1.4998e-08) } Channel_Basics::Boost : Spacelike four vector ... Channel_Basics::Boost : Spacelike four vector ... Poincare::Poincare(): Inaccurate rotation { a = (0.0448955,0.606004,-0.794194) b = (0,0,1) a' = (0.571396,-0.497834,0.652432) -> rel. dev. (inf,-inf,-0.347568) m_ct = -0.794194 m_st = -0.607665 m_n = (0,-7.4625e-07,-5.69421e-07) } Poincare::Poincare(): Inaccurate rotation { a = (0.0448955,0.606004,-0.794194) b = (0,0,1) a' = (0.571396,-0.497834,0.652432) -> rel. dev. (inf,-inf,-0.347568) m_ct = -0.794194 m_st = -0.607665 m_n = (0,-7.4625e-07,-5.69421e-07) } Phase_Space_Integrator::AddPoint(): value = nan. Skip. Channel_Basics::Boost : Spacelike four vector ... I have the impression that there are more errors with sherpa when the beam = ee |
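The numbers printed in the "Inaccurate rotation" blocks above can be checked by hand. The sketch below is not Sherpa's code. It assumes, based only on the printed values, that `rel. dev.` is the componentwise ratio `a'/b - 1` (so a zero component in `b` gives `inf`), that `m_ct` is the cosine of the angle between `a` and `b`, and that `m_st` is minus the corresponding sine. The function name `relative_deviation` is made up for this illustration.

```python
import math

def relative_deviation(rotated, target):
    """Componentwise rotated/target - 1, mimicking floating-point division by zero."""
    result = []
    for r, t in zip(rotated, target):
        if t == 0.0:
            # x/0 gives +/-inf in IEEE arithmetic (nan for 0/0).
            ratio = math.nan if r == 0.0 else math.copysign(math.inf, r)
        else:
            ratio = r / t
        result.append(ratio - 1.0)
    return tuple(result)

# Values copied from the first "Inaccurate rotation" block in the log above.
a = (-0.927672, 0.315193, 0.200196)
b = (0.0, 0.0, 1.0)
a_rotated = (0.180121, 0.830317, 0.52738)

dev = relative_deviation(a_rotated, b)
cos_theta = sum(x * y for x, y in zip(a, b))   # compare with m_ct in the log
sin_theta = math.sqrt(1.0 - cos_theta ** 2)    # compare with -m_st in the log

print("rel. dev.:", dev)
print("cos:", cos_theta, "sin:", sin_theta)
```

A correct rotation of `a` onto `b` would give `a'` equal to `b`, so the third component of the deviation would be close to 0 rather than about -0.47.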
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
I have 10:04:07 +0000 2018-12-07 [INFO] MCPlots JobID: 47511542 in slot1 (also ee beam) exhibiting similar behaviour. Its running.log is currently up to 41M after 9½ hrs, with <3hrs until the Task "should" self-terminate, although I have seen some run beyond 18hrs until finishing a running job or eventually timing out after 20hrs-ish (would there be an absolute 24hr limit?). I'm loath to reset it, just out of curiosity to see how big the log gets before it falls over. 49M at 11½ hrs, <1hr to 18hr cutoff. 56M at 13+hrs, 19hrs Task time with a little over 1hr estimated remaining. I'll not be waiting for it to end as it's already past my bedtime. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"... Task 'should' self-terminate although I have seen some run beyond 18hrs until finishing a running job or eventually timing out after 20hrs-ish (would there be an absolute 24hr limit?)..." This normally works fine, but there is an exception. The 18-hour lifetime of the VM is counted by the VM itself; when it is reached, the VM stops processing and displays "Shutdown". It does not shut itself down, but it places a shutdown file in BOINC's shared directory. The wrapper picks this file up and shuts down the VM immediately. The exception is when the VM is rebooted for some reason during its lifetime. The 18-hour count restarts from that moment, but the elapsed time counted by BOINC keeps adding the processing time of each new vboxheadless.exe. One such reason is: ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time. -> Postponed. After BOINC's restart the VM starts from scratch. |
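The effect of a reboot on the two clocks described above can be illustrated with a small model. This is only a sketch of the accounting as explained in the post, not the actual wrapper or VM logic; the function name and the idea of passing in a list of session lengths are made up for the example, and it ignores that a task may run somewhat past the limit to finish a running job.

```python
VM_LIFETIME_HOURS = 18.0  # lifetime counted inside the VM, as described in the post

def boinc_elapsed_at_shutdown(hours_before_each_reboot):
    """Total BOINC elapsed hours when the VM finally signals shutdown.

    hours_before_each_reboot: run time in hours of each VM session that
    ended in a reboot before the VM reached its own 18-hour limit.
    """
    total = 0.0
    for session in hours_before_each_reboot:
        if session >= VM_LIFETIME_HOURS:
            # This session would already have reached the limit.
            return total + VM_LIFETIME_HOURS
        total += session  # BOINC keeps adding; the VM counter restarts at zero
    # The last session runs the full lifetime.
    return total + VM_LIFETIME_HOURS

print(boinc_elapsed_at_shutdown([]))     # no reboot
print(boinc_elapsed_at_shutdown([4.0]))  # one reboot after 4 hours
```

Under this model a single reboot after 4 hours pushes the BOINC elapsed time at shutdown from 18 to 22 hours, which would explain tasks running well past the nominal cutoff.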
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
13:51:53 +0000 2018-12-16 [INFO] Condor JobID: 483657.11 in slot1 13:51:58 +0000 2018-12-16 [INFO] MCPlots JobID: 47743773 in slot1 ===> [runRivet] Sun Dec 16 13:51:53 GMT 2018 [boinc ee zhad 34.8 - - sherpa 1.4.0 default 29000 4] Poincare::Poincare(): Inaccurate rotation { a = (0.418823,-0.610538,0.672184) b = (0,0,1) a' = (0.953845,-0.201905,0.222292) -> rel. dev. (inf,-inf,-0.777708) m_ct = 0.672184 m_st = -0.740384 m_n = (-0,6.14559e-07,5.58198e-07) } 1.15562e+16 pb +- ( 9.75245e+15 pb = 84.3914 % ) 29420000 ( 29420626 -> 99.9 % ) integration time: ( 2h 4m elapsed / 512d 19h 19m 26s left ) I've reset the VM so as not to let it waste any more time and hopefully get a less sticky job to replace it. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
Anyone beat this one? 11:00:09 +0000 2018-12-22 [INFO] Condor JobID: 483250.4 in slot1 11:00:14 +0000 2018-12-22 [INFO] MCPlots JobID: 47687653 in slot1 ===> [runRivet] Sat Dec 22 11:00:09 GMT 2018 [boinc ee zhad 206 - - sherpa 1.4.1 default 100000 2] .2.57711e+20 pb +- ( 2.43595e+20 pb = 94.5225 % ) 118580000 ( 118582402 -> 99.9 % ) integration time: ( 10h 12m 16s elapsed / 3596d 19h 2m 31s left ) Binned it. Log was only 4M so a different problem to the Disk_Limit one. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Anyone beat this one?" Yesterday I also had one with 10 years left. That's a bit more than 18 hours, so the task was killed. ... and again a sherpa with beam ee. The sherpa version used could also be of interest. The current version is 2.2.5. |
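Spotting such jobs early can be done by reading the `integration time` line from running.log and comparing the estimate with the 18-hour lifetime. The following is a minimal sketch, assuming the simple line format seen in the posts above (no parenthesised secondary values); the function names are illustrative and the regular expression is not taken from any Sherpa or BOINC tool.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# Matches e.g. "integration time: ( 10h 12m 16s elapsed / 3596d 19h 2m 31s left )"
LINE_RE = re.compile(
    r"integration time: \( ([0-9dhms ]+) elapsed / ([0-9dhms ]+) left \)"
)

def to_seconds(text):
    """Convert a duration such as '3596d 19h 2m 31s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))

def exceeds_lifetime(line, limit_hours=18):
    """True if elapsed + remaining time is above the limit, None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    elapsed = to_seconds(match.group(1))
    left = to_seconds(match.group(2))
    return elapsed + left > limit_hours * 3600

# Line copied from the 2018-12-22 post above.
sample = "integration time: ( 10h 12m 16s elapsed / 3596d 19h 2m 31s left )"
print(exceeds_lifetime(sample))
```

Lines with parenthesised values, such as the one from the first post, are deliberately not matched because the meaning of the second value is not stated in this thread.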
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
Another sherpa (beam ee) with inaccurate rotation and increasing integration time left. Running.log atm 17MB ===> [runRivet] Fri Dec 28 20:37:26 CET 2018 [boinc ee zhad 22 - - sherpa 1.4.3 default 100000 2] Poincare::Poincare(): Inaccurate rotation { a = (0.182694,0.974254,0.132106) b = (0,0,1) a' = (0.998688,-0.0507451,-0.0068809) -> rel. dev. (inf,-inf,-1.00688) m_ct = 0.132106 m_st = -0.991236 m_n = (0,1.0612e-07,-7.82612e-07) } 1.76608e+17 pb +- ( 1.43793e+17 pb = 81.4197 % ) 77480000 ( 77483474 -> 99.9 % ) integration time: ( 5h 9m 18s elapsed / 1282d 13h 39m 13s left ) I don't expect this one will grow to another disk limit exceeded, so I'll kill this task. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
===> [runRivet] Tue Jan 29 21:53:22 GMT 2019 [boinc ee zhad 14 - - sherpa 2.1.0 default 5000 8] 21:53:21 +0000 2019-01-29 [INFO] Condor JobID: 487868.6 in slot1 21:53:26 +0000 2019-01-29 [INFO] MCPlots JobID: 48414381 in slot1 Reset |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
===> [runRivet] Wed Feb 6 07:48:55 CET 2019 [boinc ee zhad 133 - - sherpa 2.1.0 default 3000 12] . . . . . . integration time: ( 58s (49s) elapsed / 0s (24141d 13h 45m 56s) left ) [08:02:59] That's after my death. After 4 hours run time a huge running.log of 85MB is created. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
All 3 of my hosts have wasted the last 17 hours or thereabouts, each running duff Sherpa jobs: ===> [runRivet] Sat Mar 16 01:50:09 CET 2019 [boinc pp winclusive 7000 -,-,10 - sherpa 2.2.5 default 1000 30] 01:50:09 +0100 2019-03-16 [INFO] Condor JobID: 491691.134 in slot1 01:50:14 +0100 2019-03-16 [INFO] MCPlots JobID: 49100363 in slot1 Simple 0 events looper ===> [runRivet] Sat Mar 16 01:10:40 CET 2019 [boinc ee zhad 133 - - sherpa 2.2.5 default 2000 30] 01:10:39 +0100 2019-03-16 [INFO] Condor JobID: 491721.1 in slot1 01:10:44 +0100 2019-03-16 [INFO] MCPlots JobID: 49105182 in slot1 1,000+ days to completion Log 700k ===> [runRivet] Sat Mar 16 03:34:21 CET 2019 [boinc ee zhad 133 - - sherpa 2.2.5 default 2000 31] 03:34:20 +0100 2019-03-16 [INFO] Condor JobID: 491915.1 in slot1 03:34:26 +0100 2019-03-16 [INFO] MCPlots JobID: 49132049 in slot1 1,300 days to completion Log 900k One has just terminated itself after 22hrs and the other two are past 18hrs and 19hrs so with only an hour or so remaining, it's hardly worthwhile resetting them. Would have done so if I'd caught them sooner but wasn't around until it was too late. Got full credits for the 22hr one and expect the same for the others but I'd rather be doing useful science. |
Joined: 2 May 07 Posts: 2071 Credit: 156,128,280 RAC: 105,358 |
Yes Ray, Sherpa tasks are too complex to run on an x64 computer. Some finish fine, but most of them do not end well on our hardware. We need another concept for these tasks. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Sherpa tasks are too complex to run on an x64 computer." I don't think that our hardware is the problem. It's probably the science / software combination / limited runtime. Since beam ee seems to give the most errors with sherpa, here is a list of the last batch so far for beam ee:
run | events | attempts | success | failure | lost |
---|---|---|---|---|---|
ee zhad 133 - - sherpa 2.2.4 default | 5000 | 17 | 1 | 1 | 15 |
ee zhad 133 - - sherpa 2.2.5 default | 6000 | 33 | 3 | 3 | 27 |
ee zhad 14 - - sherpa 2.2.0 default | 2000 | 17 | 1 | 1 | 15 |
ee zhad 14 - - sherpa 2.2.1 default | 2000 | 17 | 1 | 1 | 15 |
ee zhad 14 - - sherpa 2.2.2 default | 2000 | 17 | 1 | 2 | 14 |
ee zhad 14 - - sherpa 2.2.4 default | 2000 | 17 | 1 | 1 | 15 |
ee zhad 14 - - sherpa 2.2.5 default | 1000 | 33 | 1 | 2 | 30 |
ee zhad 189 - - sherpa 2.2.4 default | 2000 | 17 | 1 | 2 | 14 |
ee zhad 189 - - sherpa 2.2.5 default | 8000 | 33 | 3 | 1 | 29 |
ee zhad 197 - - sherpa 2.2.4 default | 10000 | 17 | 1 | 4 | 12 |
ee zhad 197 - - sherpa 2.2.5 default | 24000 | 33 | 5 | 5 | 23 |
ee zhad 200 - - sherpa 2.2.5 default | 2000 | 33 | 1 | 5 | 27 |
ee zhad 206 - - sherpa 2.2.4 default | 11000 | 17 | 3 | 1 | 13 |
ee zhad 206 - - sherpa 2.2.5 default | 107000 | 33 | 4 | 3 | 26 |
ee zhad 22 - - sherpa 2.2.4 default | 124000 | 17 | 3 | 1 | 13 |
ee zhad 22 - - sherpa 2.2.5 default | 12000 | 33 | 3 | 3 | 27 |
ee zhad 29 - - sherpa 2.2.4 default | 29000 | 17 | 1 | 2 | 14 |
ee zhad 29 - - sherpa 2.2.5 default | 5000 | 33 | 1 | 4 | 28 |
ee zhad 34.8 - - sherpa 2.2.4 default | 156000 | 17 | 3 | 3 | 11 |
ee zhad 34.8 - - sherpa 2.2.5 default | 106000 | 33 | 3 | 4 | 26 |
ee zhad 43.6 - - sherpa 2.2.4 default | 2000 | 17 | 1 | 1 | 15 |
ee zhad 43.6 - - sherpa 2.2.5 default | 4000 | 33 | 2 | 6 | 25 |
ee zhad 91.2 - - sherpa 2.2.4 default | 11000 | 17 | 2 | 2 | 13 |
ee zhad 91.2 - - sherpa 2.2.5 default | 6000 | 33 | 3 | 5 | 25 |
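To compare versions, the rows of such a list can be summed per sherpa version. The snippet below is a rough sketch using four rows copied from the list above; the helper `parse_row` is made up for this example and assumes the last five fields of every row are events, attempts, success, failure and lost, in that order. It accepts rows with or without `|` separators.

```python
from collections import defaultdict

# Four rows copied from the list above (133 and 14 GeV, versions 2.2.4 and 2.2.5).
ROWS = """\
ee zhad 133 - - sherpa 2.2.4 default 5000 17 1 1 15
ee zhad 133 - - sherpa 2.2.5 default 6000 33 3 3 27
ee zhad 14 - - sherpa 2.2.4 default 2000 17 1 1 15
ee zhad 14 - - sherpa 2.2.5 default 1000 33 1 2 30
"""

def parse_row(line):
    """Split one row into (version, events, attempts, success, failure, lost)."""
    run, *numbers = line.replace("|", " ").rsplit(None, 5)
    tokens = run.split()
    version = tokens[tokens.index("sherpa") + 1]
    events, attempts, success, failure, lost = (int(n) for n in numbers)
    return version, events, attempts, success, failure, lost

totals = defaultdict(lambda: [0, 0])  # version -> [attempts, successes]
for line in ROWS.splitlines():
    version, events, attempts, success, failure, lost = parse_row(line)
    # In every row of the list, attempts = success + failure + lost.
    assert attempts == success + failure + lost
    totals[version][0] += attempts
    totals[version][1] += success

for version, (attempts, success) in sorted(totals.items()):
    print(f"sherpa {version}: {success} of {attempts} attempts succeeded")
```

Feeding in the complete list instead of the four sample rows gives the per-version totals for the whole batch.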
Joined: 29 Sep 04 Posts: 281 Credit: 11,859,285 RAC: 1 |
===> [runRivet] Tue Apr 2 11:34:20 CEST 2019 [boinc ee zhad 200 - - sherpa 2.2.2 default 2000 38] 11:34:20 +0200 2019-04-02 [INFO] Condor JobID: 493586.5 in slot1 11:34:25 +0200 2019-04-02 [INFO] MCPlots JobID: 49358220 in slot1 1.39889e+18 pb +- ( 1.09555e+18 pb = 78.3154 % ) 97080000 ( 97080940 -> 99.9 % ) integration time: ( 8h 41m 7s elapsed / 2224d 5h 6m 39s left ) [21:22:59] Only a 952k logfile but I've reset it so the next 8hrs will be more productive. "The definition of insanity is doing the same thing over and over again and expecting a different result." I think we have repeated the ee-beam/Sherpa combination enough times now to know that this combination just doesn't work. OK, it doesn't work in 3 different ways: looping, huge logfile or ever-increasing time to completion, and perhaps there should be investigation of each of these cases but surely there must be a large enough dataset of failures by now to allow that. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"I think we have repeated the ee-beam/Sherpa combination enough times now to know that this combination just doesn't work." Here are the sherpa runs with beam ee from this batch so far, some with a few successes and some with no success at all:
run | 2279 events | attempts | success | failure | lost |
---|---|---|---|---|---|
ee zhad 133 - - sherpa 1.2.2p default | 0 | 19 | 0 | 17 | 2 |
ee zhad 133 - - sherpa 1.2.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 133 - - sherpa 1.3.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 133 - - sherpa 1.3.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 133 - - sherpa 1.4.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 133 - - sherpa 1.4.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 133 - - sherpa 1.4.2 default | 0 | 19 | 0 | 5 | 14 |
ee zhad 133 - - sherpa 1.4.3 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 133 - - sherpa 1.4.5 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 133 - - sherpa 2.1.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 133 - - sherpa 2.1.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 133 - - sherpa 2.2.0 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 133 - - sherpa 2.2.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 133 - - sherpa 2.2.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 133 - - sherpa 2.2.4 default | 5000 | 19 | 1 | 2 | 16 |
ee zhad 133 - - sherpa 2.2.5 default | 8000 | 37 | 4 | 4 | 29 |
ee zhad 14 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 14 - - sherpa 1.2.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 14 - - sherpa 1.3.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 14 - - sherpa 1.3.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 14 - - sherpa 1.4.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 14 - - sherpa 1.4.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 14 - - sherpa 1.4.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 14 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 14 - - sherpa 1.4.5 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 14 - - sherpa 2.1.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 14 - - sherpa 2.1.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 14 - - sherpa 2.2.0 default | 2000 | 19 | 1 | 3 | 15 |
ee zhad 14 - - sherpa 2.2.1 default | 2000 | 19 | 1 | 1 | 17 |
ee zhad 14 - - sherpa 2.2.2 default | 2000 | 19 | 1 | 3 | 15 |
ee zhad 14 - - sherpa 2.2.4 default | 5000 | 19 | 2 | 2 | 15 |
ee zhad 14 - - sherpa 2.2.5 default | 2000 | 37 | 2 | 3 | 32 |
ee zhad 189 - - sherpa 1.2.2p default | 0 | 19 | 0 | 18 | 1 |
ee zhad 189 - - sherpa 1.2.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 189 - - sherpa 1.3.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 189 - - sherpa 1.3.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 189 - - sherpa 1.4.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 189 - - sherpa 1.4.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 189 - - sherpa 1.4.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 189 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 189 - - sherpa 1.4.5 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 189 - - sherpa 2.1.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 189 - - sherpa 2.1.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 189 - - sherpa 2.2.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 189 - - sherpa 2.2.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 189 - - sherpa 2.2.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 189 - - sherpa 2.2.4 default | 2000 | 19 | 1 | 2 | 16 |
ee zhad 189 - - sherpa 2.2.5 default | 11000 | 37 | 5 | 1 | 31 |
ee zhad 197 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 197 - - sherpa 1.2.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 197 - - sherpa 1.3.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 197 - - sherpa 1.3.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 197 - - sherpa 1.4.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 197 - - sherpa 1.4.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 197 - - sherpa 1.4.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 197 - - sherpa 1.4.3 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 197 - - sherpa 1.4.5 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 197 - - sherpa 2.1.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 197 - - sherpa 2.1.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 197 - - sherpa 2.2.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 197 - - sherpa 2.2.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 197 - - sherpa 2.2.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 197 - - sherpa 2.2.4 default | 10000 | 19 | 1 | 4 | 14 |
ee zhad 197 - - sherpa 2.2.5 default | 27000 | 37 | 6 | 6 | 25 |
ee zhad 200 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 200 - - sherpa 1.2.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 200 - - sherpa 1.3.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 200 - - sherpa 1.3.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 200 - - sherpa 1.4.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 200 - - sherpa 1.4.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 200 - - sherpa 1.4.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 200 - - sherpa 1.4.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 200 - - sherpa 1.4.5 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 200 - - sherpa 2.1.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 200 - - sherpa 2.1.1 default | 0 | 19 | 0 | 6 | 13 |
ee zhad 200 - - sherpa 2.2.0 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 200 - - sherpa 2.2.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 200 - - sherpa 2.2.2 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 200 - - sherpa 2.2.4 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 200 - - sherpa 2.2.5 default | 2000 | 37 | 1 | 6 | 30 |
ee zhad 206 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 206 - - sherpa 1.2.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 206 - - sherpa 1.3.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 206 - - sherpa 1.3.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 206 - - sherpa 1.4.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 206 - - sherpa 1.4.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 206 - - sherpa 1.4.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 206 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 206 - - sherpa 1.4.5 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 206 - - sherpa 2.1.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 206 - - sherpa 2.1.1 default | 0 | 19 | 0 | 5 | 14 |
ee zhad 206 - - sherpa 2.2.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 206 - - sherpa 2.2.1 default | 0 | 19 | 0 | 8 | 11 |
ee zhad 206 - - sherpa 2.2.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 206 - - sherpa 2.2.4 default | 15000 | 19 | 4 | 2 | 13 |
ee zhad 206 - - sherpa 2.2.5 default | 109000 | 37 | 5 | 4 | 28 |
ee zhad 22 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 22 - - sherpa 1.2.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 1.3.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 22 - - sherpa 1.3.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 1.4.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 22 - - sherpa 1.4.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 22 - - sherpa 1.4.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 1.4.3 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 22 - - sherpa 1.4.5 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 2.1.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 2.1.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 22 - - sherpa 2.2.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 22 - - sherpa 2.2.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 22 - - sherpa 2.2.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 22 - - sherpa 2.2.4 default | 129000 | 19 | 4 | 1 | 14 |
ee zhad 22 - - sherpa 2.2.5 default | 14000 | 37 | 4 | 6 | 27 |
ee zhad 29 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 29 - - sherpa 1.2.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 1.3.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 29 - - sherpa 1.3.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 1.4.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 29 - - sherpa 1.4.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 1.4.2 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 29 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 1.4.5 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 2.1.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 29 - - sherpa 2.1.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 29 - - sherpa 2.2.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 29 - - sherpa 2.2.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 29 - - sherpa 2.2.2 default | 0 | 19 | 0 | 5 | 14 |
ee zhad 29 - - sherpa 2.2.4 default | 32000 | 19 | 2 | 2 | 15 |
ee zhad 29 - - sherpa 2.2.5 default | 7000 | 37 | 2 | 5 | 30 |
ee zhad 34.8 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 34.8 - - sherpa 1.2.3 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 34.8 - - sherpa 1.3.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 34.8 - - sherpa 1.3.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 34.8 - - sherpa 1.4.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 34.8 - - sherpa 1.4.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 34.8 - - sherpa 1.4.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 34.8 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 34.8 - - sherpa 1.4.5 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 34.8 - - sherpa 2.1.0 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 34.8 - - sherpa 2.1.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 34.8 - - sherpa 2.2.0 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 34.8 - - sherpa 2.2.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 34.8 - - sherpa 2.2.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 34.8 - - sherpa 2.2.4 default | 165000 | 19 | 4 | 4 | 11 |
ee zhad 34.8 - - sherpa 2.2.5 default | 108000 | 37 | 4 | 5 | 28 |
ee zhad 43.6 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 43.6 - - sherpa 1.2.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 43.6 - - sherpa 1.3.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 43.6 - - sherpa 1.3.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 43.6 - - sherpa 1.4.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 43.6 - - sherpa 1.4.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 43.6 - - sherpa 1.4.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 43.6 - - sherpa 1.4.3 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 43.6 - - sherpa 1.4.5 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 43.6 - - sherpa 2.1.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 43.6 - - sherpa 2.1.1 default | 0 | 19 | 0 | 4 | 15 |
ee zhad 43.6 - - sherpa 2.2.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 43.6 - - sherpa 2.2.1 default | 0 | 19 | 0 | 6 | 13 |
ee zhad 43.6 - - sherpa 2.2.2 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 43.6 - - sherpa 2.2.4 default | 2000 | 19 | 1 | 2 | 16 |
ee zhad 43.6 - - sherpa 2.2.5 default | 7000 | 37 | 3 | 8 | 26 |
ee zhad 91.2 - - sherpa 1.2.2p default | 0 | 19 | 0 | 19 | 0 |
ee zhad 91.2 - - sherpa 1.2.3 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 91.2 - - sherpa 1.3.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 91.2 - - sherpa 1.3.1 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 91.2 - - sherpa 1.4.0 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 91.2 - - sherpa 1.4.1 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 91.2 - - sherpa 1.4.2 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 91.2 - - sherpa 1.4.3 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 91.2 - - sherpa 1.4.5 default | 0 | 19 | 0 | 0 | 19 |
ee zhad 91.2 - - sherpa 2.1.0 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 91.2 - - sherpa 2.1.1 default | 0 | 19 | 0 | 3 | 16 |
ee zhad 91.2 - - sherpa 2.2.0 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 91.2 - - sherpa 2.2.1 default | 0 | 19 | 0 | 1 | 18 |
ee zhad 91.2 - - sherpa 2.2.2 default | 0 | 19 | 0 | 2 | 17 |
ee zhad 91.2 - - sherpa 2.2.4 default | 21000 | 19 | 4 | 2 | 13 |
ee zhad 91.2 - - sherpa 2.2.5 default | 12000 | 37 | 5 | 6 | 26 |
Joined: 31 Jan 11 Posts: 12 Credit: 3,557,813 RAC: 0 |
Dear all, Thanks for all the help charting this issue. Since we do not have any Sherpa authors on the LHC@home team, I have written to the Sherpa authors to ask for their help in understanding the source of these problems. As mentioned in a reply on another thread, we would really like to be able to keep producing comparisons to Sherpa - as this is one of the state-of-the-art Monte Carlo event generators that is being used at LHC. But of course, at the moment that point is kind of moot, since almost all jobs (at least for ee) are failing or lost. I hope to be able to provide another update on this soon. Peter. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
Thanks, Peter, for your explanation in this thread and the more extensive one here. If I may make a recommendation: consider running the sherpa ee beams only with Sherpa version 2.2.4 or higher. I don't think reproducing losses and failures over and over is very useful. |
Joined: 2 May 07 Posts: 2071 Credit: 156,128,280 RAC: 105,358 |
Now running for 36 hours - native Theory https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=127302369 [boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189] Poincare::Poincare(): Inaccurate rotation { a = (0.101927,-0.951001,0.291903) b = (0,0,1) a' = (0.98122,-0.184403,0.0566011) -> rel. dev. (inf,-inf,-0.943399) m_ct = 0.291903 m_st = -0.956448 m_n = (-0,1.19137e-07,3.88142e-07) } 3.83645e+15 pb +- ( 1.56451e+15 pb = 40.7802 % ) 182540000 ( 182540027 -> 99.9 % ) integration time: ( 1d 4h 8m 17s elapsed / 1951d 15h 26m 21s left ) [10:26:04] Updating display... Edit: This is from the file jobdata:
runspec=boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189
run=ee zhad 43.6 - - sherpa 2.2.5 default
jobid=52333761
revision=2279
runid=752616
system=0
events=2000
seed=189
packsrc= |
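The jobdata values above show how the bracketed job description in the logs breaks down: the first word is `boinc`, the next eight words form `run`, and the last two are `events` and `seed`. A minimal sketch of that split follows. The function name and the dictionary keys `beam`, `generator` and `version` are illustrative labels chosen here, not field names from jobdata.

```python
def parse_runspec(runspec):
    """Split a runspec string into the parts listed in the jobdata file."""
    tokens = runspec.split()
    if len(tokens) != 11:
        raise ValueError(f"expected 11 fields, got {len(tokens)}")
    return {
        "run": " ".join(tokens[1:9]),  # matches the run= entry in jobdata
        "beam": tokens[1],             # e.g. ee or pp
        "generator": tokens[6],        # e.g. sherpa
        "version": tokens[7],          # e.g. 2.2.5
        "events": int(tokens[9]),      # matches events=
        "seed": int(tokens[10]),       # matches seed=
    }

# runspec copied from the jobdata excerpt above.
spec = parse_runspec("boinc ee zhad 43.6 - - sherpa 2.2.5 default 2000 189")
print(spec)
```

The same split applies to the pp job descriptions quoted in this thread, which also have eleven fields.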
Joined: 2 May 07 Posts: 2071 Credit: 156,128,280 RAC: 105,358 |
This Sherpa task from 29 Nov. seems to have the same problem. [boinc pp jets 7000 150,-,2160 - sherpa 2.2.5 default 2000 189] https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=127376239 This Sherpa 2.2.5 job needs to be investigated. |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"This Sherpa task from 29 Nov. seems to have the same problem." Nothing is wrong with sherpa version 2.2.5. That version has been used over 14000 times, of which ~10000 were successful so far. It is this job description that takes long because of its complexity, but it is not always invalid: only 1 out of 7 is a success. |
Joined: 2 May 07 Posts: 2071 Credit: 156,128,280 RAC: 105,358 |
3.83645e+15 pb +- ( 1.56451e+15 pb = 40.7802 % ) 182540000 ( 182540027 -> 99.9 % ) After 40 hours of runtime, the 1951d estimate is now back down to 1650d. Hopefully the task will not run into the disk limit. |
©2024 CERN