Thread 'New version 263.80'

Author	Message
Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 254,092 RAC: 102	Message 36774 - Posted: 19 Sep 2018, 9:44:23 UTC Updated the CVMFS configuration for openhtc.io.

Erich56 Joined: 18 Dec 15 Posts: 1967 Credit: 159,329,556 RAC: 45,308	Message 36781 - Posted: 20 Sep 2018, 7:30:22 UTC On one of my PCs, where I have two new Theory v263.80 tasks running (both 1-core), I see the following for one of the two tasks: total runtime: 14:51 hrs; total processor time: 6:01 hrs. For the other one, the processor time is close to the runtime.

gyllic Joined: 9 Dec 14 Posts: 202 Credit: 2,659,192 RAC: 2	Message 36782 - Posted: 20 Sep 2018, 9:06:20 UTC - in response to Message 36781. "total runtime: 14:51 hrs; total processor time: 6:01 hrs." Maybe a dead Sherpa job?

maeax Joined: 2 May 07 Posts: 2286 Credit: 178,847,236 RAC: 1,810	Message 36784 - Posted: 20 Sep 2018, 9:58:59 UTC - in response to Message 36782. "total runtime: 14:51 hrs; total processor time: 6:01 hrs." "Maybe a dead Sherpa job?" I have often seen this with a Sherpa as the last job (looping, or unable to finish because of the 18-hour time limit). If the looping could be stopped from the project side, then a Sherpa could also run as the first job! https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4044#28084

Erich56 Joined: 18 Dec 15 Posts: 1967 Credit: 159,329,556 RAC: 45,308	Message 36785 - Posted: 20 Sep 2018, 10:59:55 UTC - in response to Message 36784. The task finally finished after 18 hours. CPU time was 6 hours. Credit points: 132,03 :-( For details, please see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=206972998

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1533 Credit: 10,042,485 RAC: 1,277	Message 36792 - Posted: 20 Sep 2018, 16:16:29 UTC - in response to Message 36785. "Finally, the task got finished after 18 hours. CPU time was 6 hours. Credit points: 132,03 :-(" For 6 CPU-hours the credit seems to be OK, but it's not your fault that your machine was occupied for another 12 (idle) hours. Normally the VM should be shut down when no new job arrives within ~10 minutes.

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 36796 - Posted: 20 Sep 2018, 20:31:20 UTC For Pythia jobs, the time required to process the first 10K events can be used to extrapolate the time required for the entire run with surprising accuracy, at least so far. That means a babysitter script running on the host can decide, from the task time remaining and the forecast time to completion, whether to terminate the task gracefully or let it continue. Jobs for the Herwig, EPOS, Sherpa and Phojet generators come so infrequently, and my record keeping is so lacking, that I don't yet know whether time to completion can be extrapolated accurately from the first 10K events for them. But I'm working on it. Looping Sherpas are easy to detect and terminate gracefully. Been doing it for months. Other features that are working and helpful, and that apply to LHCb as well:
- if Condor doesn't start a new job in 10 minutes, then gracefully terminate the task
- any job that starts after the 10-hour mark triggers graceful task termination (optional)
Features for ATLAS too.
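A minimal sketch of the decision logic described in the post above, not the poster's actual script. Every function and constant name here is hypothetical; the 18-hour task limit is taken from an earlier post in this thread, and the forecast assumes the job's event rate stays constant after the first sample of events.

```python
# Hypothetical sketch of the babysitter decision described above.
# None of these names come from the poster's actual script.

TASK_LIMIT_S = 18 * 3600   # 18-hour task time limit mentioned earlier in the thread
IDLE_LIMIT_S = 10 * 60     # "no new job in 10 minutes"
LATE_START_S = 10 * 3600   # optional "10 hour mark"


def forecast_job_seconds(sample_seconds, sample_events, total_events):
    """Linearly extrapolate total job time from the first sample_events events."""
    if sample_events <= 0:
        raise ValueError("sample_events must be positive")
    return sample_seconds * total_events / sample_events


def job_will_miss_limit(task_elapsed_s, job_elapsed_s, forecast_s):
    """True if the job's forecast remaining time exceeds the task time remaining."""
    job_remaining_s = max(forecast_s - job_elapsed_s, 0.0)
    task_remaining_s = TASK_LIMIT_S - task_elapsed_s
    return job_remaining_s > task_remaining_s


def should_terminate(task_elapsed_s, job_elapsed_s=None, forecast_s=None,
                     idle_s=0.0, job_started_at_s=None, enforce_late_start=False):
    """Combine the three rules; any one of them triggers graceful termination."""
    if idle_s >= IDLE_LIMIT_S:
        return True  # no new job arrived within the idle limit
    if (enforce_late_start and job_started_at_s is not None
            and job_started_at_s > LATE_START_S):
        return True  # optional rule: job started after the 10-hour mark
    if job_elapsed_s is not None and forecast_s is not None:
        return job_will_miss_limit(task_elapsed_s, job_elapsed_s, forecast_s)
    return False


# Example: the first 10K of 100K events took 600 s, 17 hours into the task.
forecast = forecast_job_seconds(600, 10_000, 100_000)
print(should_terminate(17 * 3600, job_elapsed_s=600, forecast_s=forecast))
```

How the termination itself is carried out gracefully is outside this sketch; it only covers the yes/no decision.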

Erich56 Joined: 18 Dec 15 Posts: 1967 Credit: 159,329,556 RAC: 45,308	Message 36798 - Posted: 21 Sep 2018, 7:19:39 UTC What caught my eye is that the credit points under version 263.80 are markedly lower than under 263.70 (same PC, same settings):
263.70: total runtime 46.518 secs; CPU time 44.684 secs; points: 1.645,80
263.80: total runtime 65.219 secs; CPU time 64.027 secs; points: 153,57
Any logical explanation for this big discrepancy?
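To put the two results above on a common scale, the figures can be converted to credit per CPU hour. This is a small sketch that assumes the post uses European number notation, so that "44.684 secs" means 44,684 seconds and "1.645,80" means 1,645.80 points; the helper names are hypothetical.

```python
def parse_european(number_text):
    """Parse '1.645,80' style text ('.' thousands, ',' decimal) into a float."""
    return float(number_text.replace(".", "").replace(",", "."))


def credit_per_cpu_hour(points_text, cpu_seconds_text):
    """Credit earned per hour of CPU time, from the figures as posted."""
    points = parse_european(points_text)
    cpu_seconds = parse_european(cpu_seconds_text)
    return points / cpu_seconds * 3600.0


# Figures copied from the post above (assumed European notation).
old_rate = credit_per_cpu_hour("1.645,80", "44.684")   # version 263.70
new_rate = credit_per_cpu_hour("153,57", "64.027")     # version 263.80

print(f"263.70: {old_rate:.1f} credits per CPU hour")
print(f"263.80: {new_rate:.1f} credits per CPU hour")
print(f"ratio:  {old_rate / new_rate:.1f}x")
```

Under that reading, the credit rate for this pair of tasks dropped by roughly a factor of 15, far more than the difference in runtime alone would explain.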

BITLab Argo Joined: 16 Jul 05 Posts: 24 Credit: 35,251,537 RAC: 0	Message 36800 - Posted: 21 Sep 2018, 8:21:22 UTC - in response to Message 36798. No idea why, but you're lucky: mine seem to have dropped by a factor of 100! See e.g. hostid=10414406

Erich56 Joined: 18 Dec 15 Posts: 1967 Credit: 159,329,556 RAC: 45,308	Message 36801 - Posted: 21 Sep 2018, 9:11:52 UTC - in response to Message 36800. "No idea why, but you're lucky: mine seem to have dropped by a factor of 100! See e.g. hostid=10414406" OMG, this is really strange :-( It seems that credit points here are like a lottery! Someone should look into this, I guess.

BITLab Argo Joined: 16 Jul 05 Posts: 24 Credit: 35,251,537 RAC: 0	Message 36803 - Posted: 21 Sep 2018, 17:25:37 UTC - in response to Message 36801. "No idea why, but you're lucky: mine seem to have dropped by a factor of 100!" ... and then this morning the credit rates have come back up again, but only by a factor of ten...

Greger Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0	Message 36804 - Posted: 21 Sep 2018, 17:35:51 UTC - in response to Message 36803. Last modified: 21 Sep 2018, 17:48:53 UTC It's a lottery, and it's strange that these tasks hand out credit so differently. ~300 up to ~32k is a big gap.

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1291 Credit: 95,276,708 RAC: 34,055	Message 36806 - Posted: 21 Sep 2018, 19:36:33 UTC Last modified: 21 Sep 2018, 20:02:23 UTC I have my usual 30+ Valids per day with this new version (glad the vdi was smaller than some we get). I also checked the tasks of another member who runs lots of these, and it looks the same there. The Valids do earn about 60% less credit, and once in a while we still get "[ERROR] Could not connect to Condor server on port .....", "[ERROR] Condor exited after 10164s without running a job", and the same old "VM Heartbeat file specified, but missing." So I hope there is something good about the Valids as far as actual work being done, and that this version is better in some way than the previous one. (I imagine the RACs will be dropping for everyone running this version.) I also noticed that the CMS tasks are crashing, but I haven't tried any myself yet. (I just watched one of my Theory 2-core tasks finish, and the credit dropped by about 1600 compared with the previous version, 263.70.)

maeax Joined: 2 May 07 Posts: 2286 Credit: 178,847,236 RAC: 1,810	Message 36809 - Posted: 21 Sep 2018, 23:26:38 UTC MCProd isn't working well: lost ratio 100% http://mcplots-dev.cern.ch/production.php?view=status&plots=hourly#plots