Message boards : Theory Application : New version 263.80

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 36774 - Posted: 19 Sep 2018, 9:44:23 UTC

Updated the CVMFS configuration for openhtc.io.
ID: 36774
Erich56

Joined: 18 Dec 15
Posts: 1811
Credit: 118,326,300
RAC: 26,351
Message 36781 - Posted: 20 Sep 2018, 7:30:22 UTC

On one of my PCs, where I have two new Theory v263.80 tasks running (all 1-core), I see the following for one of the two tasks:

Total runtime: 14:51 hrs; total processor time: 6:01 hrs.

For the other one, the processor time is close to the runtime.
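
To put a number on that gap, here is a quick sketch in Python that computes what share of the runtime was actually spent on the CPU, using the two figures quoted above. The helper names `hhmm_to_minutes` and `cpu_efficiency` are made up for illustration and are not part of any BOINC tool.

```python
def hhmm_to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' string such as '14:51' to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def cpu_efficiency(runtime: str, cpu_time: str) -> float:
    """Fraction of the wall-clock runtime that was spent on the CPU."""
    return hhmm_to_minutes(cpu_time) / hhmm_to_minutes(runtime)


# Figures from the task described above.
eff = cpu_efficiency("14:51", "6:01")
idle_minutes = hhmm_to_minutes("14:51") - hhmm_to_minutes("6:01")
print(f"CPU efficiency: {eff:.1%}, time not on CPU: {idle_minutes} min")
```

With these inputs the task spent well under half of its runtime doing work, which is what makes it stand out next to the other task.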
ID: 36781
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 36782 - Posted: 20 Sep 2018, 9:06:20 UTC - in response to Message 36781.  

total runtime: 14:51 hrs; total processor time: 6:01 hrs.
Maybe a dead Sherpa job?
ID: 36782
maeax

Joined: 2 May 07
Posts: 2242
Credit: 173,901,910
RAC: 2,793
Message 36784 - Posted: 20 Sep 2018, 9:58:59 UTC - in response to Message 36782.  

total runtime: 14:51 hrs; total processor time: 6:01 hrs.
Maybe a dead Sherpa job?

I have often seen this with a Sherpa as the last job (either looping or unable to finish because of the 18-hour time limit).

If the looping could be stopped from the project side, then a Sherpa could also run as the first job!
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4044#28084
ID: 36784
Erich56

Joined: 18 Dec 15
Posts: 1811
Credit: 118,326,300
RAC: 26,351
Message 36785 - Posted: 20 Sep 2018, 10:59:55 UTC - in response to Message 36784.  

Finally, the task finished after 18 hours. CPU time was 6 hours. Credit points: 132,03 :-(

For details, please see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=206972998
ID: 36785
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1417
Credit: 9,441,051
RAC: 885
Message 36792 - Posted: 20 Sep 2018, 16:16:29 UTC - in response to Message 36785.  

Finally, the task finished after 18 hours. CPU time was 6 hours. Credit points: 132,03 :-(
For 6 CPU-hours the credit seems to be OK, but it's not your fault that your machine was occupied for another 12 (idle) hours.
Normally the VM should be shut down when no new job arrives within ~10 minutes.
ID: 36792
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36796 - Posted: 20 Sep 2018, 20:31:20 UTC

For pythia jobs, the time required to process the first 10K events can be used to extrapolate the time required for the entire run with surprising accuracy, at least so far. That means a babysitter script running on the host can decide, from the task time remaining and the forecast time to completion, whether to terminate the task gracefully or let it continue.
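
A minimal sketch of that kind of extrapolation, assuming event throughput stays roughly constant after the first 10K events and that the total number of events in the job is known. The function names, the example numbers, and the 18-hour default (taken from the time limit mentioned earlier in this thread) are illustrative assumptions, not the actual babysitter script.

```python
def forecast_total_seconds(sample_events: int, sample_seconds: float,
                           total_events: int) -> float:
    """Linear extrapolation: assume every event costs as much as the sampled ones."""
    if sample_events <= 0:
        raise ValueError("need at least one processed event")
    return sample_seconds * total_events / sample_events


def should_terminate(task_elapsed_s: float, job_elapsed_s: float,
                     forecast_s: float, task_limit_s: float = 18 * 3600) -> bool:
    """True if the job is forecast to need more time than the task has left."""
    task_remaining = task_limit_s - task_elapsed_s
    job_remaining = forecast_s - job_elapsed_s
    return job_remaining > task_remaining


# Hypothetical numbers: the first 10K of 100K events took 10 minutes,
# and the task has already been running for 17 hours.
forecast = forecast_total_seconds(10_000, 600.0, 100_000)
decision = should_terminate(task_elapsed_s=17 * 3600, job_elapsed_s=600.0,
                            forecast_s=forecast)
print(f"forecast: {forecast:.0f} s, terminate: {decision}")
```

In this made-up case the job still needs about 90 minutes while the task has only 60 left, so the sketch would choose graceful termination over letting the job run into the limit.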

Jobs for the herwig, epos, sherpa and phojet generators come so infrequently, and my record keeping is so lacking, that I don't yet know whether time to completion can be extrapolated accurately from the first 10K events for them. But I'm working on it.

Looping sherpas are easy to detect and terminate gracefully. I have been doing it for months.

Other features that are working and helpful, and that apply to LHCb as well:
- if Condor doesn't start a new job within 10 minutes, gracefully terminate the task
- any job that starts after the 10-hour mark triggers graceful task termination (optional)

There are features for ATLAS too.
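
The two termination rules listed above could be expressed roughly as follows. This is only a sketch: the function and parameter names are hypothetical, and how the script actually learns job start and end times from Condor is not shown in this thread.

```python
IDLE_LIMIT_S = 10 * 60          # no new job within 10 minutes
LATE_START_LIMIT_S = 10 * 3600  # the 10-hour mark, measured from task start


def idle_too_long(now_s: float, last_job_end_s: float, job_running: bool) -> bool:
    """Rule 1: no job is running and none has started for 10 minutes."""
    return (not job_running) and (now_s - last_job_end_s) >= IDLE_LIMIT_S


def started_too_late(job_start_s: float, task_start_s: float,
                     enabled: bool = True) -> bool:
    """Rule 2 (optional): a job started after the 10-hour mark of the task."""
    return enabled and (job_start_s - task_start_s) > LATE_START_LIMIT_S


# Hypothetical timestamps, in seconds since the task started.
print(idle_too_long(now_s=1200, last_job_end_s=500, job_running=False))
print(started_too_late(job_start_s=37_000, task_start_s=0))
```

Either rule returning true would be the cue to request a graceful shutdown of the task rather than letting the VM sit idle or start a job it is unlikely to finish.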
ID: 36796
Erich56

Joined: 18 Dec 15
Posts: 1811
Credit: 118,326,300
RAC: 26,351
Message 36798 - Posted: 21 Sep 2018, 7:19:39 UTC

What caught my eye is that the credit under version 263.80 is markedly lower than under 263.70 (same PC, same settings):

263.70: total runtime 46.518 secs; CPU time 44.684 secs; points: 1.645,80
263.80: total runtime 65.219 secs; CPU time 64.027 secs; points: 153,57

Is there any logical explanation for this big discrepancy?
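
To make the comparison concrete, here is a short sketch that normalises both results to credit per CPU-hour. It assumes the figures above use European notation (dot as thousands separator, comma as decimal mark), so 44.684 secs is read as 44,684 seconds and 1.645,80 as 1645.80 points. The function name is made up for illustration.

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


old = credit_per_cpu_hour(1645.80, 44_684)  # version 263.70
new = credit_per_cpu_hour(153.57, 64_027)   # version 263.80

print(f"263.70: {old:.1f} credit per CPU-hour")
print(f"263.80: {new:.1f} credit per CPU-hour")
print(f"ratio:  {old / new:.1f}x")
```

On these two tasks the per-hour rate differs by roughly a factor of 15, so the drop is not explained by the longer runtime alone.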
ID: 36798
BITLab Argo

Joined: 16 Jul 05
Posts: 24
Credit: 35,251,537
RAC: 0
Message 36800 - Posted: 21 Sep 2018, 8:21:22 UTC - in response to Message 36798.  

No idea why, but you're lucky: mine seem to have dropped by a factor of 100!
See e.g. hostid=10414406
ID: 36800
Erich56

Joined: 18 Dec 15
Posts: 1811
Credit: 118,326,300
RAC: 26,351
Message 36801 - Posted: 21 Sep 2018, 9:11:52 UTC - in response to Message 36800.  

No idea why, but you're lucky: mine seem to have dropped by a factor of 100!
See e.g. hostid=10414406
OMG, this is really strange :-(
It seems that credit here is like a lottery!

Someone should look into this, I guess.
ID: 36801
BITLab Argo

Joined: 16 Jul 05
Posts: 24
Credit: 35,251,537
RAC: 0
Message 36803 - Posted: 21 Sep 2018, 17:25:37 UTC - in response to Message 36801.  

No idea why, but you're lucky: mine seem to have dropped by a factor of 100!

... and then this morning the credit rates came back up again, but only by a factor of ten...
ID: 36803
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 36804 - Posted: 21 Sep 2018, 17:35:51 UTC - in response to Message 36803.  
Last modified: 21 Sep 2018, 17:48:53 UTC

It is a lottery, and it is strange that these tasks hand out credit so differently.

~300 up to ~32k is a big gap.
ID: 36804
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1172
Credit: 54,738,375
RAC: 13,675
Message 36806 - Posted: 21 Sep 2018, 19:36:33 UTC
Last modified: 21 Sep 2018, 20:02:23 UTC

I have my usual 30+ Valids per day with this new version (glad the vdi was smaller than some we get)

And I also checked the tasks from another member that does lots of these tasks and it looks the same there.

The Valids do get about 60% less credit, and once in a while we still get "[ERROR] Could not connect to Condor server on port .....", "[ERROR] Condor exited after 10164s without running a job" and the same old "VM Heartbeat file specified, but missing."

So I hope there was something good about the Valids as far as actual work being done, and that they are better in some way than with the previous version. (I imagine the RACs will be dropping for everyone running this version.) I also noticed that the CMS tasks are crashing, but I haven't tried any myself yet.

(I just watched one of my Theory 2-core tasks finish, and the credit dropped by about 1600 compared with the previous version 263.70.)
ID: 36806
maeax

Joined: 2 May 07
Posts: 2242
Credit: 173,901,910
RAC: 2,793
Message 36809 - Posted: 21 Sep 2018, 23:26:38 UTC

ID: 36809

©2024 CERN