Message boards : Number crunching : Maximum elapsed time exceeded

Harri Liljeroos
Joined: 28 Sep 04
Posts: 442
Credit: 23,557,306
RAC: 14,373
Message 21992 - Posted: 7 Mar 2010, 19:29:33 UTC

Hi,
I got two tasks which both ended with an error after about 100000 seconds.

wuid=3706346

wuid=3706345

Has anybody else seen something similar? The computer is a Q9400, so it should not be too slow.
ID: 21992
vasm

Joined: 6 Oct 08
Posts: 3
Credit: 32,107
RAC: 0
Message 21993 - Posted: 7 Mar 2010, 22:36:12 UTC

I had one as well on a Q9450.
Error after ~86,400 sec (24hrs) with Maximum elapsed time exceeded.

Looks like there is a failsafe time limit in the workunits so they don't go on forever.
But if the workspace1_slhc... type of workunits genuinely need many hours to process then the admins need to raise that limit because, as it is, there are plenty of computation hours going to waste.
ID: 21993
metalius
Joined: 3 Oct 06
Posts: 99
Credit: 8,151,748
RAC: 10
Message 22002 - Posted: 8 Mar 2010, 12:24:11 UTC
Last modified: 8 Mar 2010, 12:29:55 UTC

New tasks + old CPU time limit = new bug. Sad, but true... (c)
I have 2 candidates for wasted CPU time (2 big tasks were picked up by a P4 machine)...
ID: 22002
Profile Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 22003 - Posted: 8 Mar 2010, 13:23:48 UTC

Just discussed this in a meeting.

The problem is that the job calculates how long it will run for. A new user is using slightly different calculations and it seems to be killing that calculation. bigmac is looking into it and should be able to extend the maximum time allowed quite easily.
ID: 22003
metalius
Joined: 3 Oct 06
Posts: 99
Credit: 8,151,748
RAC: 10
Message 22004 - Posted: 8 Mar 2010, 13:34:57 UTC

The speed is 1.5% per hour for the last _ultimate_inj_ tasks on this machine...
Abort or wait?
Is it possible to extend the time limit for running tasks?
ID: 22004
Profile Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 22006 - Posted: 8 Mar 2010, 15:16:31 UTC

A solution has been implemented. Sadly this probably won't help running WUs.

Actually the "Maximum elapsed time" was originally reduced due to comments by volunteers.
ID: 22006
metalius
Joined: 3 Oct 06
Posts: 99
Credit: 8,151,748
RAC: 10
Message 22010 - Posted: 8 Mar 2010, 15:24:15 UTC

Thank You, Neasan.
ID: 22010
Matthias Lehmkuhl

Joined: 15 Jul 05
Posts: 20
Credit: 1,121,264
RAC: 776
Message 22014 - Posted: 8 Mar 2010, 17:12:54 UTC

I also got two of these long runners, if they go to 100%:
first:
2 hours runtime at 5.8%
resultid=18385236

second:
2 hours runtime at 4.6%
resultid=18384520

Any chance, that the results will finish in time?
Matthias

ID: 22014
Thomas

Joined: 26 Mar 07
Posts: 8
Credit: 508
RAC: 0
Message 22019 - Posted: 8 Mar 2010, 22:33:44 UTC

Hi,
I've just finished one of those WUs. Mine is the one which took 123,901 sec. At least I did not get it earlier, so it was not stopped by this limit.
If I find out that this is the speed I can expect from my CPU then it's probably time to buy a new PC.

wuid=3706606
ID: 22019
metalius
Joined: 3 Oct 06
Posts: 99
Credit: 8,151,748
RAC: 10
Message 22021 - Posted: 9 Mar 2010, 6:25:22 UTC - in response to Message 22014.  

Any chance, that the results will finish in time?

That depends on the new CPU time limit; your machine needs approximately 2 days.
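
For reference, the two-day figure is consistent with a simple linear extrapolation of the progress reported in Message 22014. A minimal Python sketch, assuming progress stays linear for the whole run (the function names are made up for illustration and are not part of BOINC or the LHC@home application):

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total runtime from the progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


def fits_limit(elapsed_hours: float, percent_done: float, limit_hours: float) -> bool:
    """True if the extrapolated runtime stays within the elapsed-time limit."""
    return estimate_total_hours(elapsed_hours, percent_done) <= limit_hours


# The two tasks from Message 22014: 2 hours of runtime at 5.8% and at 4.6%.
for pct in (5.8, 4.6):
    total = estimate_total_hours(2.0, pct)
    print(f"{pct}% after 2 h: about {total:.1f} h total ({total / 24:.1f} days)")
```

For the slower task (4.6% after 2 hours) this gives roughly 43 hours, which would not fit a 24-hour limit but would fit a limit of 48 hours. The estimate is only as good as the linearity assumption.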
ID: 22021
Profile Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 22025 - Posted: 9 Mar 2010, 9:50:57 UTC - in response to Message 22021.  

Any chance, that the results will finish in time?

That depends on the new CPU time limit; your machine needs approximately 2 days.

The new limit is over 2 days so you should be fine.
ID: 22025
Matthias Lehmkuhl

Joined: 15 Jul 05
Posts: 20
Credit: 1,121,264
RAC: 776
Message 22078 - Posted: 15 Mar 2010, 12:23:55 UTC - in response to Message 22069.  

I hope so; I doubled the time limit. Eric.
(Note also that if interrupted I have a checkpoint/restart, so that we do not lose everything but restart from every thousand turns, i.e. roughly every two minutes.)


I could finish both results; CPU time was less than one day:
80000 and 57000 seconds.

And yes, here checkpointing works fine.
Matthias

ID: 22078
Ver Greeneyes

Joined: 10 Sep 08
Posts: 29
Credit: 34,924
RAC: 0
Message 22081 - Posted: 15 Mar 2010, 13:18:33 UTC - in response to Message 22078.  

What is this time limit, anyway? The WU deadline? The only reason to decrease that would be to ensure that BOINC reports the results more quickly after the WUs have finished - is that really such a big issue?
ID: 22081
Profile Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 22083 - Posted: 15 Mar 2010, 15:39:10 UTC - in response to Message 22081.  

What is this time limit, anyway? The WU deadline? The only reason to decrease that would be to ensure that BOINC reports the results more quickly after the WUs have finished - is that really such a big issue?

All I know is that it was initially reduced (at the beginning of the project) after feedback from volunteers.

The time limit is how long a WU should take to process; if it hasn't finished by then, something is assumed to be broken and it is cancelled. LHC@home WUs are usually short enough, so we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first).

It is NOT the deadline (results back by a certain date).
ID: 22083
Ver Greeneyes

Joined: 10 Sep 08
Posts: 29
Credit: 34,924
RAC: 0
Message 22085 - Posted: 15 Mar 2010, 18:14:19 UTC - in response to Message 22083.  

The time limit is how long a WU should take to process; if it hasn't finished by then, something is assumed to be broken and it is cancelled. LHC@home WUs are usually short enough, so we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first).

It is NOT the deadline (results back by a certain date).

Aah, now I see. In that case, perhaps an absolute limit is simply the wrong approach; a better way might be to check on the completion percentage periodically - stop the WU if it hasn't gone up at all in a while, or if it hasn't gone up more than some reasonable limit. Of course, that only works if the application is multithreaded, if you can tell BOINC to check up on it in that kind of detail, or if you can change the internal loops to check on themselves periodically to avoid deadlock.
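
The progress-based check suggested above could look roughly like the sketch below. It is only an illustration in Python: the class name, its parameters, and the idea of feeding it timestamped progress samples are assumptions, not something BOINC or the LHC@home application is known to provide.

```python
from typing import Optional


class ProgressWatchdog:
    """Flags a task as stalled when its progress stops increasing.

    stall_seconds: how long progress may stay flat before we give up.
    min_gain: smallest increase in fraction done that counts as progress.
    """

    def __init__(self, stall_seconds: float, min_gain: float = 0.0) -> None:
        self.stall_seconds = stall_seconds
        self.min_gain = min_gain
        self._last_fraction: Optional[float] = None
        self._last_gain_time: float = 0.0

    def update(self, now: float, fraction_done: float) -> bool:
        """Record one progress sample; return True if the task looks stalled."""
        first_sample = self._last_fraction is None
        if first_sample or fraction_done - self._last_fraction > self.min_gain:
            # Progress was made (or this is the first sample): reset the timer.
            self._last_fraction = fraction_done
            self._last_gain_time = now
            return False
        # No progress since the last gain: stalled once the grace period is over.
        return now - self._last_gain_time >= self.stall_seconds
```

A caller would sample the task periodically, for example `watchdog.update(time.monotonic(), fraction_done)`, and stop the task only when `update` returns True. Unlike a fixed elapsed-time limit, a slow but steadily progressing task is never cancelled by this check.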
ID: 22085
Profile Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 22090 - Posted: 17 Mar 2010, 15:50:26 UTC - in response to Message 22085.  

The time limit is how long a WU should take to process; if it hasn't finished by then, something is assumed to be broken and it is cancelled. LHC@home WUs are usually short enough, so we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first).

It is NOT the deadline (results back by a certain date).

Aah, now I see. In that case, perhaps an absolute limit is simply the wrong approach; a better way might be to check on the completion percentage periodically - stop the WU if it hasn't gone up at all in a while, or if it hasn't gone up more than some reasonable limit. Of course, that only works if the application is multithreaded, if you can tell BOINC to check up on it in that kind of detail, or if you can change the internal loops to check on themselves periodically to avoid deadlock.


It wasn't an absolute limit; it used some kind of algorithm to set the time when the job was created, and the new jobs broke this calculation, AFAIK. 'Tis sorted now and bigmac is trying to make sure it won't happen again.
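
To illustrate how a limit derived from an estimate can go wrong, here is a hypothetical Python sketch. It assumes the limit is the estimated work divided by the host speed, multiplied by a safety factor; the actual algorithm used by the project is not described in this thread, and all names and numbers below are invented for the example.

```python
def max_elapsed_seconds(estimated_flops: float, host_flops: float,
                        safety_factor: float = 10.0) -> float:
    """Elapsed-time limit derived from a work estimate (hypothetical formula)."""
    if estimated_flops <= 0 or host_flops <= 0 or safety_factor <= 0:
        raise ValueError("all inputs must be positive")
    return safety_factor * estimated_flops / host_flops


def would_be_aborted(actual_flops: float, estimated_flops: float,
                     host_flops: float, safety_factor: float = 10.0) -> bool:
    """True if a healthy task would still run past the derived limit."""
    actual_seconds = actual_flops / host_flops
    limit = max_elapsed_seconds(estimated_flops, host_flops, safety_factor)
    return actual_seconds > limit


# Invented numbers: the job is estimated at 3.6e12 operations on a host
# that sustains 1e9 operations per second, i.e. one hour of work.
estimate = 3.6e12
host_speed = 1e9
print(max_elapsed_seconds(estimate, host_speed))         # limit in seconds
print(would_be_aborted(2 * estimate, estimate, host_speed))   # mild underestimate
print(would_be_aborted(20 * estimate, estimate, host_speed))  # severe underestimate
```

Under these assumptions a job that needs twice the estimated work still finishes, while one that needs twenty times the estimate is cancelled even though nothing is broken. The point is that the limit is only as reliable as the estimate it is computed from.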
ID: 22090
vasm

Joined: 6 Oct 08
Posts: 3
Credit: 32,107
RAC: 0
Message 22478 - Posted: 17 Aug 2010, 16:19:48 UTC

Looks like today's wbnlaug10_DA-scaling-law1... batch of workunits exhibits the same error.

I had 5 tasks (estimated to run for about 1 hour) error out with Maximum elapsed time exceeded after about 15 mins.
ID: 22478


©2020 CERN