Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners

Erich56

Joined: 18 Dec 15
Posts: 1814
Credit: 118,529,297
RAC: 32,122
Message 41719 - Posted: 24 Feb 2020, 20:37:12 UTC - in response to Message 41650.  

Crystal Pellet wrote on Feb. 19:
The complete list of 613 failing Sherpa jobs up to now, IMO:
ee zhad 133 - - sherpa 1.2.2p default
ee zhad 133 - - sherpa 1.2.3 default
...
ppbar zinclusive 1800 -,-,50,130 - sherpa 2.1.0 default

You could add to this list:
ppbar zinclusive 1960 -,-,50,120 - sherpa 2.1.0 default

I just aborted it, since console F2 said:

Exception.Handler::Signal Handler: Signal (6) caught.
Cannot continue

and no CPU activity was shown on F3.
ID: 41719
zepingouin

Joined: 7 Jan 07
Posts: 41
Credit: 16,102,983
RAC: 26
Message 41721 - Posted: 25 Feb 2020, 6:34:29 UTC

Crystal Pellet, one task from your list, "ppbar jets 1960 37 - sherpa 2.2.6 default", succeeded for me.
ID: 41721
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41728 - Posted: 25 Feb 2020, 8:01:38 UTC - in response to Message 41719.  

Erich56 wrote:
You could add to this list:
ppbar zinclusive 1960 -,-,50,120 - sherpa 2.1.0 default

I just aborted it, since console F2 said:

Exception.Handler::Signal Handler: Signal (6) caught.
Cannot continue

and no CPU activity was shown on F3.

There is 1 success in the MC Plots database for that description.
ID: 41728
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41729 - Posted: 25 Feb 2020, 8:06:55 UTC - in response to Message 41721.  
Last modified: 25 Feb 2020, 14:50:26 UTC

zepingouin wrote:
Crystal Pellet, one task from your list, "ppbar jets 1960 37 - sherpa 2.2.6 default", succeeded for me.

Thanks. The list was based on 'only' 8 attempts for each description, but it seems more attempts can result in a success, and yours was a fast one: 3 hours and 22 minutes.
You can build your own list from http://mcplots-dev.cern.ch/production.php?view=runs&rev=2363&display=all by filtering on the keyword sherpa.

Edit: reduced the previous Sherpa list from 613 to 577 entries.
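
A minimal sketch of building such a list offline, assuming the rows of the production table have been copied into plain text lines and that each row ends in five integer columns (events, attempts, success, failure, lost), which is the column layout quoted elsewhere in this thread. The names Run, parse_row and never_succeeded are hypothetical, and the counts in the first sample row are invented for illustration.

```python
from typing import Iterable, List, NamedTuple


class Run(NamedTuple):
    description: str
    events: int
    attempts: int
    success: int
    failure: int
    lost: int


def parse_row(line: str) -> Run:
    """Split one table row into its description and five trailing counts."""
    # Assumption: the last five whitespace-separated fields are the counts.
    parts = line.rsplit(None, 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected row layout: {line!r}")
    description, *counts = parts
    return Run(description, *(int(c) for c in counts))


def never_succeeded(rows: Iterable[str], keyword: str = "sherpa") -> List[Run]:
    """Return runs that contain `keyword` as a word and have zero successes."""
    runs = (parse_row(row) for row in rows if row.strip())
    return [r for r in runs if keyword in r.description.split() and r.success == 0]


rows = [
    "ee zhad 133 - - sherpa 1.2.3 default 0 8 0 8 0",              # invented counts
    "ee zhad 206 - - sherpa 2.2.8 default 54000 18 3 5 10",        # counts reported in this thread
    "PbPb heavyion-mb 2760 - - pythia8 8.235 default 0 10 0 6 4",  # not a Sherpa run
]
for run in never_succeeded(rows):
    print(run.description)
```

Matching the keyword as a whole word keeps the filter from picking up descriptions that merely contain the letters, and requiring zero successes mirrors how the list above was described.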
ID: 41729
Erich56

Joined: 18 Dec 15
Posts: 1814
Credit: 118,529,297
RAC: 32,122
Message 41770 - Posted: 29 Feb 2020, 15:27:16 UTC
Last modified: 29 Feb 2020, 15:38:50 UTC

Hello Crystal Pellet,

There is a "boinc ee zhad 206 - - sherpa 2.2.8 default" on one of my machines, NOT (yet) contained in your list - 22 hrs 35 min elapsed, 1108 days left (and the number is increasing); I guess I can abort this task, right?

P.S. I just found it in http://mcplots-dev.cern.ch/production.php?view=runs&rev=2363&display=all
with 54000 / 18 / 3 / 5 / 10

But still, the increasing number of "days left" indicates a failure, doesn't it?
ID: 41770
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41777 - Posted: 29 Feb 2020, 20:44:30 UTC - in response to Message 41770.  

Erich56 wrote:
There is a "boinc ee zhad 206 - - sherpa 2.2.8 default" on one of my machines, NOT (yet) contained in your list - 22 hrs 35 min elapsed, 1108 days left (and the number is increasing)

P.S. I just found it in http://mcplots-dev.cern.ch/production.php?view=runs&rev=2363&display=all
with 54000 / 18 / 3 / 5 / 10

But still, the increasing number of "days left" indicates a failure, doesn't it?

If it's not in 'my' list, it will never get onto it.

3 successes. It's up to you to gamble between a great disappointment and great satisfaction ;)

An increasing time left does not always mean that a task will end in an error, but it surely means that it's a heavy task.

I myself have seen successes where such an increasing time left suddenly jumped to event processing.
ID: 41777
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 10
Message 41795 - Posted: 1 Mar 2020, 17:58:39 UTC - in response to Message 41729.  
Last modified: 1 Mar 2020, 17:59:13 UTC

Crystal Pellet wrote:
You can build your own list from http://mcplots-dev.cern.ch/production.php?view=runs&rev=2363&display=all by filtering on the keyword sherpa.

Out of interest, what does the "2363" refer to?

For "PbPb heavyion-mb 2760" I get
run                                              events  attempts  success  failure  lost
PbPb heavyion-mb 2760 - - pythia8 8.235 default       0        10        0        6     4
but 265306266 is still running and reporting "events processed" to the log (I'm estimating an 80-hour run time).
Or is "lost" a badly chosen euphemism for "still running"?
ID: 41795
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41796 - Posted: 1 Mar 2020, 21:38:05 UTC - in response to Message 41795.  

Henry Nebrensky wrote:
Out of interest, what does the "2363" refer to?

That is the revision of MC Production -> http://mcplots-dev.cern.ch/production.php?view=control

Henry Nebrensky wrote:
For "PbPb heavyion-mb 2760" I get
run                                              events  attempts  success  failure  lost
PbPb heavyion-mb 2760 - - pythia8 8.235 default       0        10        0        6     4
but 265306266 is still running and reporting "events processed" to the log (I'm estimating an 80-hour run time).
Or is "lost" a badly chosen euphemism for "still running"?

No, probably not. In theory you could be right, but then a task would have to be running for weeks and weeks, I think.
I don't know when a job is declared lost (no return, but since when?).
The last number of your job description (after the number of events) is the attempt sequence. It will be something like 40 at the moment.
In the list it's still 10.
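
As a rough illustration of the job description layout described above, here is a small sketch that pulls out the number of events and the attempt sequence. It assumes the description is a whitespace-separated line starting with "boinc", optionally wrapped in square brackets, as in the log headers posted in this thread; Job and parse_job are hypothetical names.

```python
from typing import NamedTuple


class Job(NamedTuple):
    description: str  # same text as the "run" column of the production table
    events: int       # number of events (second-to-last field)
    attempt: int      # attempt sequence (last field)


def parse_job(line: str) -> Job:
    """Parse a job description such as '[boinc ... default 100000 44]'."""
    text = line.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    tokens = text.split()
    # Assumption: the first token is always the literal word "boinc".
    if len(tokens) < 4 or tokens[0] != "boinc":
        raise ValueError(f"not a job description: {line!r}")
    return Job(" ".join(tokens[1:-2]), int(tokens[-2]), int(tokens[-1]))


job = parse_job("[boinc pp jets 7000 300 - sherpa 1.4.1 default 100000 44]")
print(job.description, job.events, job.attempt)
```

Comparing job.attempt with the attempts column of the production table gives a quick indication of how far the table lags behind the jobs currently being sent out.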
ID: 41796
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 10
Message 41801 - Posted: 2 Mar 2020, 11:40:02 UTC - in response to Message 41796.  

Thanks.

Crystal Pellet wrote:
The last number of your job description (after the number of events) is the attempt sequence. It will be something like 40 at the moment. In the list it's still 10.

The answer is that it is, of course, 42! See other thread.
ID: 41801
Erich56

Joined: 18 Dec 15
Posts: 1814
Credit: 118,529,297
RAC: 32,122
Message 41802 - Posted: 2 Mar 2020, 11:54:19 UTC - in response to Message 41777.  

Crystal Pellet wrote:
An increasing time left does not always mean that a task will end in an error, but it surely means that it's a heavy task.
I myself have seen successes where such an increasing time left suddenly jumped to event processing.

Meanwhile, the "time left" has jumped up to 4983 days. Should I get worried by now?
ID: 41802
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2534
Credit: 253,921,735
RAC: 41,404
Message 41803 - Posted: 2 Mar 2020, 12:12:33 UTC - in response to Message 41802.  

Erich56 wrote:
Should I get worried by now?

No - as long as there are no Vogon ships around.
Just prepare a towel.
ID: 41803
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 10
Message 41805 - Posted: 2 Mar 2020, 12:36:09 UTC - in response to Message 41803.  

Erich56 wrote:
Should I get worried by now?

computezrmle wrote:
No - as long as there are no Vogon ships around. Just prepare a towel.

... and for the rest of you, just keep banging the rocks/hadrons together, guys!
ID: 41805
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 10
Message 41806 - Posted: 2 Mar 2020, 12:44:11 UTC - in response to Message 41796.  
Last modified: 2 Mar 2020, 13:30:50 UTC

Henry Nebrensky wrote:
Or is "lost" a badly chosen euphemism for "still running"?

Crystal Pellet wrote:
No, probably not. In theory you could be right, but then a task would have to be running for weeks and weeks, I think.
I don't know when a job is declared lost (no return, but since when?).

I think that's my point - it looks like "lost" (which has negative connotations) is being used for everything that hasn't yet returned a success or failure, which could be for entirely reasonable reasons, such as still being queued or running.
ID: 41806
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41888 - Posted: 11 Mar 2020, 20:35:29 UTC

===> [runRivet] Thu Mar  5 14:17:32 UTC 2020 [boinc pp jets 7000 300 - sherpa 1.4.1 default 100000 44]
Run time 6 days 5 hours 31 min 35 sec
CPU time 6 days 2 hours 28 min 50 sec
Peak disk usage 3.80 GB
https://lhcathome.cern.ch/lhcathome/result.php?resultid=266301804
ID: 41888
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1418
Credit: 9,470,586
RAC: 3,147
Message 41914 - Posted: 15 Mar 2020, 18:13:10 UTC

Passed the 10-day deadline, but accepted:
===> [runRivet] Wed Mar  4 11:41:37 UTC 2020 [boinc pp jets 7000 400 - sherpa 1.4.2 default 100000 44]
Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=265985279
Run time 11 days 5 hours 38 min 47 sec
CPU time 11 days 1 hours 25 min 51 sec
Peak disk usage 4.22 GB
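
To check figures like these against the deadline, a small sketch follows. It assumes the run time is always printed in the "N days N hours N min N sec" form shown above (shorter runs may be printed differently) and uses the 10-day deadline mentioned in this post; parse_duration and DEADLINE are hypothetical names.

```python
import re
from datetime import timedelta

# Assumption: durations always include all four units, as in the lines above.
DURATION = re.compile(r"(\d+)\s+days?\s+(\d+)\s+hours?\s+(\d+)\s+min\s+(\d+)\s+sec")


def parse_duration(text: str) -> timedelta:
    """Convert 'N days N hours N min N sec' into a timedelta."""
    match = DURATION.search(text)
    if match is None:
        raise ValueError(f"no duration found in {text!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


DEADLINE = timedelta(days=10)  # the deadline mentioned in the post

run_time = parse_duration("Run time 11 days 5 hours 38 min 47 sec")
cpu_time = parse_duration("CPU time 11 days 1 hours 25 min 51 sec")

print("past deadline:", run_time > DEADLINE)
print("CPU time / run time:", round(cpu_time / run_time, 3))
```

A CPU time close to the run time suggests the task was computing for almost the whole period rather than sitting idle.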
ID: 41914
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 2,013
Message 43190 - Posted: 4 Aug 2020, 18:49:12 UTC
Last modified: 4 Aug 2020, 18:58:27 UTC

This Sherpa 1.4.2 task [boinc pp jets 8000 600 - sherpa 1.4.2 default 100000 32]
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=143167540
failed to finish for a second time (because of a restart of the Linux VM), stopping at the same output:
55900 events processed
Event 56000 ( 8h 16m 56s elapsed / 6h 30m 27s left ) -> ETA: Tue Aug 04 23:50
XS = 24.568347762363 pb +- ( 0.10380316347774 pb = 0.42 % )
56000 events processed
dumping histograms...
Event 56100 ( 8h 17m 49s elapsed / 6h 29m 33s left ) -> ETA: Tue Aug 04 23:50
56100 events processed
Error in Cluster_Formation_Handler::ClustersToHadrons :
Did not find a kinematically allowed solution for the cluster list.
Will trigger a new event.

Event 56200 ( 8h 18m 49s elapsed / 6h 28m 46s left ) -> ETA: Tue Aug 04 23:50
56200 events processed
Event 56300 ( 8h 19m 31s elapsed / 6h 27m 44s left ) -> ETA: Tue Aug 04 23:50
56300 events processed

This needs investigation, because the system has now started it for another user (a third time!!).
ID: 43190
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 2,013
Message 43193 - Posted: 6 Aug 2020, 10:55:47 UTC

Theory_2390-1148370-34 - [boinc pp winclusive 7000 10 - sherpa 1.4.5 default 3000 34]
Now at 22 hours and 40 min.
Any chance of seeing an end?
Channel_Elements::GenerateYForward(2.1912818017204e-09,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-9.24186,}): Y out of bounds !
ymin, ymax vs. y : -9.96939 9.96939 vs. -9.96939
Setting y to lower bound ymin=-9.96939
ISR_Handler::MakeISR(..): s' out of bounds.
s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
ISR_Handler::MakeISR(..): s' out of bounds.
s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
Channel_Elements::GenerateYBackward(1.4743920286663e-10,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,3.99746,}): Y out of bounds !
ymin, ymax vs. y : -10 10 vs. 10
Setting y to upper bound ymax=10
ID: 43193
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 10
Message 43194 - Posted: 6 Aug 2020, 11:18:04 UTC - in response to Message 43193.  

Try to grep the log file for events or elapsed - this can make it easier to see if there's any actual progress.

But a continuing stream of ISR_Handler errors is usually a bad sign in my experience.
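
The grep suggested above can be sketched in Python as follows. This assumes the log has already been read into a list of lines (the log file path is not given in this thread) and that progress shows up as "N events processed" lines, as in the Sherpa output posted earlier; progress_lines and last_event_count are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

PROGRESS = re.compile(r"events|elapsed")            # same pattern as grep -E 'events|elapsed'
PROCESSED = re.compile(r"^\s*(\d+) events processed")


def progress_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that mention events or elapsed time."""
    return [line for line in lines if PROGRESS.search(line)]


def last_event_count(lines: Iterable[str]) -> Optional[int]:
    """Return the highest 'N events processed' count, or None if there is none."""
    counts = [int(m.group(1)) for m in map(PROCESSED.match, lines) if m]
    return max(counts) if counts else None


# Sample lines copied from logs posted in this thread.
log = [
    "ISR_Handler::MakeISR(..): s' out of bounds.",
    "55900 events processed",
    "Event 56000 ( 8h 16m 56s elapsed / 6h 30m 27s left ) -> ETA: Tue Aug 04 23:50",
    "56000 events processed",
    "dumping histograms...",
]
for line in progress_lines(log):
    print(line)
print("last count:", last_event_count(log))
```

If last_event_count returns None after many hours, the task has not reached event processing yet, which fits the observation above about a continuing stream of ISR_Handler errors.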
ID: 43194
maeax

Joined: 2 May 07
Posts: 2243
Credit: 173,902,375
RAC: 2,013
Message 43195 - Posted: 6 Aug 2020, 11:40:25 UTC - in response to Message 43194.  

Thank you, Henry.
This is from the beginning of the log; I will cancel the task:
Comix was compiled for multithreading.
Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 232156 kB, 19s / 19s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 232552 kB, 0s / 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Soft_Photon_Handler.
Hadron_Decay_Map::Read: Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
Starting the calculation. Lean back and enjoy ... .
Channel_Elements::GenerateYBackward(2.2879576294321e-09,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,5.42984,}): Y out of bounds !
ymin, ymax vs. y : -9.9478031411026 9.9478031411026 vs. 9.9478031411026
ID: 43195
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2534
Credit: 253,921,735
RAC: 41,404
Message 43212 - Posted: 12 Aug 2020, 11:46:19 UTC

===> [runRivet] Fri Aug  7 00:05:40 UTC 2020 [boinc pp winclusive 7000 20 - sherpa 2.2.5 default 2000 34]
.
.
.
+----------------------------------+
|                                  |
|      CCC  OOO  M   M I X   X     |
|     C    O   O MM MM I  X X      |
|     C    O   O M M M I   X       |
|     C    O   O M   M I  X X      |
|      CCC  OOO  M   M I X   X     |
|                                  |
+==================================+
|  Color dressed  Matrix Elements  |
|     http://comix.freacafe.de     |
|   please cite  JHEP12(2008)039   |
+----------------------------------+
Matrix_Element_Handler::BuildProcesses(): Looking for processes .................................................................................................................................................................................... done ( 45 MB, 7s / 7s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................................................................................................................... done ( 45 MB, 0s / 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Beam_Remnant_Handler.
Hadron_Decay_Map::Read:   Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Initialized the Soft_Photon_Handler.
Variations::InitialiseParametersVector(0 variations){
  Named variations:
}
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__veb' (Comix)
Starting the calculation at 00:08:46. Lean back and enjoy ... .

No more logfile lines for >5 days.
ID: 43212