Message boards : Theory Application : Pythia8 looooooong runner!
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 41771 - Posted: 29 Feb 2020, 15:43:52 UTC

So far task 265306266 reports
===> [runRivet] Fri Feb 28 02:13:12 UTC 2020 [boinc PbPb heavyion-mb 2760 - - pythia8 8.235 default 100000 42]
…
40300 events processed
40400 events processed
so it's 40% done
13436 boinc 39 19 157236 112724 3076 R 88.2 1.4 1944:57 pythia8.exe
after 32 hours...
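
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, it assumes the `top` TIME+ column reads as `minutes:seconds`, and it assumes the event rate stays roughly constant, which may not hold for these jobs.

```python
def cpu_hours(top_time: str) -> float:
    """Convert a top TIME+ value such as '1944:57' (minutes:seconds) to hours."""
    minutes, seconds = top_time.split(":")
    return (int(minutes) + float(seconds) / 60.0) / 60.0


def projected_total_hours(done: int, total: int, hours_so_far: float) -> float:
    """Linear extrapolation; assumes a constant event rate (an assumption)."""
    return hours_so_far * total / done


elapsed = cpu_hours("1944:57")  # TIME+ value from the top line above
total = projected_total_hours(40400, 100000, elapsed)
print(f"{elapsed:.1f} h so far, about {total:.0f} h projected in total")
```

With the numbers quoted above this projects roughly 80 hours for the full 100000 events.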
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 41800 - Posted: 2 Mar 2020, 11:34:43 UTC - in response to Message 41771.  

Now 89% done, after 70 hours...
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 41810 - Posted: 3 Mar 2020, 13:35:41 UTC - in response to Message 41800.  

So task 265306266 finished yesterday after about 80 hrs and I didn't spot any error messages in the log on the occasions that I checked it, up to events processed = about 99700. BOINC reports a success and assigns me 3k credits.
On the other hand, the final log reports
02:13:14 GMT +00:00 2020-02-28: cranky-0.0.31: [INFO] ===> [runRivet] Fri Feb 28 02:13:12 UTC 2020 [boinc PbPb heavyion-mb 2760 - - pythia8 8.235 default 100000 42]
22:02:19 GMT +00:00 2020-03-02: cranky-0.0.31: [INFO] Container 'runc' finished with status code 1.
not zero as usual, and the MC Plots page (updated "2020-03-03 14:02:35") has gone from showing
run                                              events  attempts  success  failure  lost
PbPb heavyion-mb 2760 - - pythia8 8.235 default  0       10        0        6        4
to
run                                              events  attempts  success  failure  lost
PbPb heavyion-mb 2760 - - pythia8 8.235 default  0       10        0        7        3
implying that it thinks the job failed. (OK, so that might not be my specific task.)

But, did those 80 CPU hours actually achieve anything useful, or not?
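
For anyone wanting to check this on their own tasks, here is a minimal sketch that pulls the container status out of a cranky log. The regular expression is copied from the two log lines quoted above, so other cranky versions may word it differently, and `container_status` is just a name chosen for this example. It only reports the code; whether a non-zero code means the physics output was unusable is exactly the open question.

```python
import re

# Pattern taken from the log lines quoted above; other cranky versions may differ.
STATUS_RE = re.compile(r"Container 'runc' finished with status code (\d+)\.")


def container_status(log_text: str):
    """Return the last reported container status code, or None if none is found."""
    codes = STATUS_RE.findall(log_text)
    return int(codes[-1]) if codes else None


log = """\
02:13:14 GMT +00:00 2020-02-28: cranky-0.0.31: [INFO] ===> [runRivet] Fri Feb 28 02:13:12 UTC 2020 [boinc PbPb heavyion-mb 2760 - - pythia8 8.235 default 100000 42]
22:02:19 GMT +00:00 2020-03-02: cranky-0.0.31: [INFO] Container 'runc' finished with status code 1.
"""

status = container_status(log)
print("container status:", status)
```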
Profile Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1172
Credit: 54,737,480
RAC: 13,647
Message 41813 - Posted: 3 Mar 2020, 23:40:29 UTC - in response to Message 41810.  

YES, yours is Valid. Theory tasks are not a fixed length of time, so we have short ones and long ones.

I have had a few of those 3-day tasks in the past (while testing the versions). One thing about these Theory tasks is that you can get one that runs for 10 days and then ends as a Computer Error, so I just abort them if they reach 5 days and are still trying to run. BUT I always watch mine start running via the VM Console, since you can see whether it has any *Fail* problems right at the start, and then you can watch the last page to see whether it shows *Failed*. If that happens in either place, it will run for days but end up a Computer Error.

This is how you want them to be on the VM Console after running
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 41821 - Posted: 5 Mar 2020, 14:54:48 UTC - in response to Message 41813.  

YES, yours is Valid. Theory tasks are not a fixed length of time, so we have short ones and long ones.
I understand that different tasks model different physics cases with different codes and so run for different times. My point is that what looks to me and to BOINC like a task that has successfully generated 100k lead-ion events, isn't reflected on the MC Plots page even several days later:
run                                              events  attempts  success  failure  lost
PbPb heavyion-mb 2760 - - pythia8 8.235 default  0       10        0        7        3
I'm assuming that MC Plots represents the physics "customer" view, so either those 100k events aren't useful for physics, or else (N.B. attempt numbers 10 vs 42) the results take a long time to percolate through. In turn, wouldn't the latter mean that the status reported on MC Plots is meaningless for deciding whether or not to cull tasks locally, as implied in the Sherpa thread?
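
To make the bookkeeping in that row explicit, here is a small sketch that splits an MC Plots row into the run description and its counters and checks that the attempts are all accounted for. It assumes the last five tokens are the numeric columns in the header order shown above; `parse_row` and `COLUMNS` are names invented for this example, not part of MC Plots.

```python
COLUMNS = ("events", "attempts", "success", "failure", "lost")


def parse_row(row: str) -> dict:
    """Split a row into the run description and its trailing integer counters."""
    tokens = row.split()
    counts = [int(t) for t in tokens[-len(COLUMNS):]]
    record = dict(zip(COLUMNS, counts))
    record["run"] = " ".join(tokens[:-len(COLUMNS)])
    return record


row = parse_row("PbPb heavyion-mb 2760 - - pythia8 8.235 default 0 10 0 7 3")
accounted = row["success"] + row["failure"] + row["lost"]
print(row["run"], "|", accounted, "of", row["attempts"], "attempts accounted for")
```

For this row all 10 attempts are accounted for as failure or lost, with none counted as success.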
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1417
Credit: 9,441,051
RAC: 885
Message 41842 - Posted: 7 Mar 2020, 15:25:45 UTC
Last modified: 7 Mar 2020, 15:53:09 UTC

A valid long runner: https://lhcathome.cern.ch/lhcathome/result.php?resultid=265712521
===> [runRivet] Wed Mar  4 07:53:35 UTC 2020 [boinc pp jets 7000 40,-,460 - pythia8 8.240 cr1 100000 44]
CPU time 2 days 2 hours 3 min 43 sec
Peak disk usage 2.45 GB
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 42247 - Posted: 20 Apr 2020, 23:05:03 UTC - in response to Message 41842.  

Another valid long runner: 271586073

===> [runRivet] Wed Apr 15 06:58:20 UTC 2020 [boinc pp jets 7000 40,-,610 - pythia8 8.301 dire-default 100000 2]
Run time 5 days 12 hours 19 min 22 sec
CPU time 5 days 10 hours 33 min 17 sec
Peak disk usage 1.91 MB
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 42276 - Posted: 25 Apr 2020, 14:02:38 UTC

It is satisfying that the last Theory I will run for a while was also the longest I have ever had, at 3 days 17 hours 27 min 8 sec.

It was a pythia8, and I had looked for it on MCplots, but it was not there, so I just let it run.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=271822210
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2531
Credit: 253,722,201
RAC: 41,981
Message 42277 - Posted: 25 Apr 2020, 14:17:37 UTC - in response to Message 42276.  

I had looked for it on MCplots, but it was not there

It is a Theory_2378-1045515-2_2, hence can be found here:
http://mcplots-dev.cern.ch/production.php?view=runs&rev=2378&display=all

You may filter the complete list for:
pp jets 7000 25,-,480 - pythia8 8.301 dire-default

attempts: 4
success: 2
failure: 0
unknown: 2
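
The lookup can be sketched in Python as follows. It assumes the number right after `Theory_` in the task name is the revision used in the URL (as in this example), and that the filter text is the bracketed runRivet argument list minus the leading `boinc` and the last two numbers. The function names are invented for this sketch.

```python
import re


def revision_from_task(task_name: str) -> str:
    """Extract the number following 'Theory_' (assumed to be the revision)."""
    match = re.match(r"Theory_(\d+)-", task_name)
    if match is None:
        raise ValueError(f"unexpected task name: {task_name!r}")
    return match.group(1)


def production_url(task_name: str) -> str:
    """Build the production page URL for the task's revision."""
    rev = revision_from_task(task_name)
    return f"http://mcplots-dev.cern.ch/production.php?view=runs&rev={rev}&display=all"


def filter_string(run_rivet_args: str) -> str:
    """Drop the leading 'boinc' and the trailing two numbers from the arguments."""
    tokens = run_rivet_args.strip("[] ").split()
    if tokens and tokens[0] == "boinc":
        tokens = tokens[1:]
    return " ".join(tokens[:-2])


print(production_url("Theory_2378-1045515-2_2"))
print(filter_string("[boinc pp jets 7000 40,-,610 - pythia8 8.301 dire-default 100000 2]"))
```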
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 42281 - Posted: 25 Apr 2020, 18:46:51 UTC - in response to Message 42277.  

It is a Theory_2378-1045515-2_2


OK, I was searching in Run 2279, which must have been the last one I looked for.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 847
Credit: 691,175,262
RAC: 107,120
Message 42909 - Posted: 25 Jun 2020, 8:01:01 UTC

I have an 8-day one! But it's not going to make the deadline, so probably wasted effort from a BOINC perspective.

Theory_2390-1128549-16

pp jets 7000 80,-,1460 - pythia8 8.301 dire-default
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 43763 - Posted: 1 Dec 2020, 22:31:45 UTC - in response to Message 42909.  

I spotted 289594409 as it was taking so long:

===> [runRivet] Sat Nov 28 23:15:38 UTC 2020 [boinc PbPb heavyion-mb 2760 - - pythia8 8.230 default 90000 150]

Run time 1 days 23 hours 53 min 55 sec
CPU time 1 days 23 hours 50 min 55 sec
Peak working set size 186.35 MB

At least this lead-lead task might actually have succeeded:
Container 'runc' finished with status code 0.


Meanwhile, elsewhere:
2688018 boinc     39  19   53544  20764   7116 R  96.3   0.3   3561:32 pythia8.exe

has reached "34300 events processed" after more than two days... luckily it's only going for 59k events.
===> [runRivet] Sun Nov 29 10:25:39 UTC 2020 [boinc pp jets 7000 150,-,1860 - pythia8 8.301 dire-default 59000 150]

Profile Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1172
Credit: 54,737,480
RAC: 13,647
Message 43764 - Posted: 1 Dec 2020, 23:09:05 UTC - in response to Message 41771.  
Last modified: 1 Dec 2020, 23:14:11 UTC

It doesn't happen often for me, but once in a while I do get pythia8 tasks that run between 24 and 35 hours and end up Valid
(running Theory Simulation v5.21)
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2933395 one quick example

I have an 8-day one! But it's not going to make the deadline, so probably wasted effort from a BOINC perspective.

Theory_2390-1128549-16

pp jets 7000 80,-,1460 - pythia8 8.301 dire-default


pythia8 8.301 dire-default is one you should probably abort since they always run for 10 days and fail (all the ones I have checked)
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 43775 - Posted: 3 Dec 2020, 16:15:36 UTC - in response to Message 43764.  
Last modified: 3 Dec 2020, 16:16:44 UTC

pythia8 8.301 dire-default is one you should probably abort since they always run for 10 days and fail (all the ones I have checked)
Mine's been updating the log file with believable, if slow, progress:
58100 events processed
so I've let it run. Guess we find out this evening...

"dire" does indeed seem to be code for troublesome, though.
Henry Nebrensky

Joined: 13 Jul 05
Posts: 169
Credit: 15,000,737
RAC: 15
Message 43776 - Posted: 4 Dec 2020, 1:31:14 UTC - in response to Message 43775.  

pythia8 8.301 dire-default is one you should probably abort since they always run for 10 days and fail (all the ones I have checked)
Mine's been updating the log file with believable, if slow, progress...

10:25:41 GMT +00:00 2020-11-29: cranky-0.0.32: [INFO] ===> [runRivet] Sun Nov 29 10:25:39 UTC 2020 [boinc pp jets 7000 150,-,1860 - pythia8 8.301 dire-default 59000 150]
17:34:27 GMT +00:00 2020-12-03: cranky-0.0.32: [INFO] Container 'runc' finished with status code 0.

Run time 4 days 7 hours 8 min 53 sec
CPU time 4 days 6 hours 37 min 23 sec
Credit 3,085.19
Peak working set size 293.17 MB
Peak swap size 600.68 MB
Peak disk usage 1.86 MB
Profile Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1172
Credit: 54,737,480
RAC: 13,647
Message 43777 - Posted: 4 Dec 2020, 3:50:35 UTC
Last modified: 4 Dec 2020, 4:02:24 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=289569109

I would say that was pure luck, but after checking yours I see it was a Linux host running the Native version of Theory for this particular pythia8 8.301 dire-default task, so maybe they work that way. I have never seen one come back Valid on a Windows OS with the regular version of Theory.

Maybe they should make sure these are only run on Linux Native

Some people never notice this problem when they have 100+ cores running 24/7, but I happen to check other people's results when I find mine having the problem, and another member pointed that out a couple of months ago when I was testing them.

example https://lhcathome.cern.ch/lhcathome/result.php?resultid=288750991
Profile Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1172
Credit: 54,737,480
RAC: 13,647
Message 43778 - Posted: 4 Dec 2020, 3:50:58 UTC
Last modified: 4 Dec 2020, 3:52:23 UTC

either my isp is running like a snail or this server is trying to........just delete this with your magical powers



©2024 CERN