Message boards : Theory Application : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 962
Credit: 6,351,293
RAC: 490
Message 38929 - Posted: 21 May 2019, 16:48:51 UTC

A long-running Pythia8 job was killed. It had done maybe 35000 of its 81000 events :-(

Job: ===> [runRivet] Mon May 20 16:23:17 CEST 2019 [boinc pp jets 13000 150,-,1860 - pythia8 8.235 cr1 81000 53]

The estimated <rsc_fpops_bound> was way too low for this task, resulting in:

LHC@home 21 May 18:21:21 Aborting task Theory_33081_1558334952.886734_0: exceeded elapsed time limit 69714.17 (2000000.00G/28.69G)

CPU time used 16 hours 58 min 48 sec. The Pythia8 job was the 4th job of that task.
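
For reference, the limit in that log line can be reproduced by hand. A minimal sketch in Python, assuming the client derives the limit as <rsc_fpops_bound> divided by the host's estimated speed, which is what the "(2000000.00G/28.69G)" part of the message suggests. The variable and function names are only illustrative.

```python
# Values copied from the client log line quoted above.
RSC_FPOPS_BOUND = 2_000_000.00e9   # "2000000.00G" floating-point operations
HOST_FLOPS = 28.69e9               # "28.69G" per second, as rounded in the log


def elapsed_time_limit(fpops_bound: float, flops: float) -> float:
    """Return the elapsed-time limit in seconds, assumed to be bound / speed."""
    return fpops_bound / flops


limit_s = elapsed_time_limit(RSC_FPOPS_BOUND, HOST_FLOPS)
print(f"{limit_s:.0f} s = {limit_s / 3600:.1f} h")
```

That works out to roughly 19.4 hours. The few seconds of difference from the logged 69714.17 presumably come from the speed being rounded to 28.69G in the message.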
ID: 38929
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 949
Credit: 40,383,999
RAC: 5,037
Message 38932 - Posted: 22 May 2019, 7:09:32 UTC

ID: 38932
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 949
Credit: 40,383,999
RAC: 5,037
Message 38935 - Posted: 22 May 2019, 10:57:11 UTC - in response to Message 38929.  

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4548#33456

Looks like that happened with SixTrack before, as you know.
But I did just run a benchmark on this one anyway, just in case.

(and looked around here too https://boinc.berkeley.edu/trac/search?q=rsc_fpops_bound )
ID: 38935
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38936 - Posted: 22 May 2019, 12:00:50 UTC - in response to Message 38929.  

I got a number of these too. They occurred on tasks that I had pushed beyond the 18-hour limit. It's interesting that I did NOT get the error on numerous other tasks I pushed well beyond 18 hours before the pentathlon. This started happening around the time of the "adjustments" that were made to accommodate the pentathlon. I'm guessing a config file got accidentally altered during those "adjustments" and now the <rsc_fpops_bound> is far too low.
ID: 38936
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,167,138
RAC: 2,374
Message 39364 - Posted: 16 Jul 2019, 10:43:16 UTC

With the change from v263.95 to v263.97, I am getting the 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED error again:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237173131

Why so? I had made no changes in my settings.
ID: 39364
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 252
Credit: 11,223,743
RAC: 3
Message 39365 - Posted: 16 Jul 2019, 13:59:07 UTC - in response to Message 39364.  
Last modified: 16 Jul 2019, 14:02:12 UTC

One for me too: https://lhcathome.cern.ch/lhcathome/result.php?resultid=237142844 , which I had extended to let a healthy but long Sherpa job run. As usual, it had started just before the 12hr limit, with no chance of finishing before 18hrs or even my standard 24hr extension.

Actually, that was a 263.95
ID: 39365
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 39366 - Posted: 16 Jul 2019, 14:20:56 UTC - in response to Message 39365.  

I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.
ID: 39366
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,167,138
RAC: 2,374
Message 39368 - Posted: 16 Jul 2019, 15:15:18 UTC - in response to Message 39366.  

I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.
My host is still downloading v263.97 tasks. Only those, no v263.98.
ID: 39368
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 962
Credit: 6,351,293
RAC: 490
Message 39369 - Posted: 16 Jul 2019, 17:05:32 UTC - in response to Message 39366.  
Last modified: 16 Jul 2019, 17:10:27 UTC

I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?
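
To make the point concrete, here is a rough comparison of the two limits. It assumes the elapsed-time limit is <rsc_fpops_bound> divided by the host speed, and it borrows the 28.69 GFLOPS figure from the log in the first post purely as an example; other hosts will get different numbers. All names in the sketch are illustrative.

```python
JOB_DURATION_S = 129_600               # VM lifetime from the post above (36 hours)
OLD_BOUND = 2_000_000_000_000_000.0    # current <rsc_fpops_bound>
NEW_BOUND = OLD_BOUND * 10             # the requested tenfold value
EXAMPLE_HOST_FLOPS = 28.69e9           # assumption: speed taken from the first post's log


def limit_seconds(fpops_bound: float, flops: float) -> float:
    """Elapsed-time limit in seconds, assumed to be bound / speed."""
    return fpops_bound / flops


for label, bound in (("current", OLD_BOUND), ("tenfold", NEW_BOUND)):
    limit = limit_seconds(bound, EXAMPLE_HOST_FLOPS)
    if limit < JOB_DURATION_S:
        outcome = "client aborts the task before the VM lifetime ends"
    else:
        outcome = "VM lifetime ends before the client limit"
    print(f"{label}: {limit / 3600:.1f} h vs {JOB_DURATION_S / 3600:.0f} h -> {outcome}")
```

On that example host the current bound allows about 19.4 hours, well short of the 36-hour VM lifetime, while a tenfold bound allows about 193.6 hours.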
ID: 39369
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,167,138
RAC: 2,374
Message 39370 - Posted: 17 Jul 2019, 5:46:51 UTC

A minute ago, my host downloaded another v263.97 task.
What is the situation with v263.98 now - has it been withdrawn?
ID: 39370
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 374,546,593
RAC: 32,030
Message 39371 - Posted: 17 Jul 2019, 5:52:33 UTC

I have had 100% failure with Theory on these tasks for about 1 week, even going back to the .95
ID: 39371
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 39373 - Posted: 17 Jul 2019, 7:48:27 UTC - in response to Message 39368.  

I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.
My host is still downloading v263.97 tasks. Only those, no v263.98.

I have restarted the server. Please try again.
ID: 39373
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 39374 - Posted: 17 Jul 2019, 7:48:54 UTC - in response to Message 39369.  

I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?


I have just done this. Thanks.
ID: 39374
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 39375 - Posted: 17 Jul 2019, 7:49:22 UTC - in response to Message 39371.  

I have had 100% failure with Theory on these tasks for about 1 week, even going back to the .95


Do you have any examples? I see some aborted but they do not give any details.
ID: 39375
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 374,546,593
RAC: 32,030
Message 39379 - Posted: 17 Jul 2019, 17:34:31 UTC - in response to Message 39375.  
Last modified: 18 Jul 2019, 5:37:28 UTC

The ones I didn't abort were all 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237228447
https://lhcathome.cern.ch/lhcathome/result.php?resultid=237245874
https://lhcathome.cern.ch/lhcathome/result.php?resultid=237243675
https://lhcathome.cern.ch/lhcathome/result.php?resultid=237243861

In general, things have been unstable for the last week or so.

I had 1 good one this morning.
ID: 39379
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,167,138
RAC: 2,374
Message 39381 - Posted: 18 Jul 2019, 9:40:48 UTC

I am still getting the EXIT_TIME_LIMIT_EXCEEDED error, after almost exactly 18 hrs:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237251435
https://lhcathome.cern.ch/lhcathome/result.php?resultid=237243711

how come?
ID: 39381
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 598
Credit: 374,546,593
RAC: 32,030
Message 39390 - Posted: 18 Jul 2019, 20:59:16 UTC

Looks like things are back on track for me.
ID: 39390
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 949
Credit: 40,383,999
RAC: 5,037
Message 39435 - Posted: 25 Jul 2019, 23:26:33 UTC - in response to Message 39390.  

ID: 39435
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,167,138
RAC: 2,374
Message 39436 - Posted: 26 Jul 2019, 4:46:01 UTC - in response to Message 39435.  

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=119378610

20 hours later.......
Although it says "194 (0x000000C2) EXIT_ABORTED_BY_CLIENT": error 194, NOT error 197.
Whatever that means now ???
ID: 39436
Crystal Pellet
Volunteer moderator
Volunteer tester
Send message
Joined: 14 Jan 10
Posts: 962
Credit: 6,351,293
RAC: 490
Message 39437 - Posted: 26 Jul 2019, 5:21:58 UTC - in response to Message 39435.  

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=119378610

20 hours later.......

Grrrrrrrr #801

2019-07-25 15:31:52 (9028): VM Heartbeat file specified, but missing heartbeat.

That sometimes happens :(
ID: 39437


©2020 CERN