Maximum elapsed time exceeded
Author | Message |
---|---|
Joined: 28 Sep 04 Posts: 732 Credit: 49,362,718 RAC: 18,760 |
Hi, I got two tasks that both ended with an error after about 100,000 seconds: wuid=3706346 and wuid=3706345. Has anybody else seen something similar? The computer is a Q9400, so it should not be too slow. |
Joined: 6 Oct 08 Posts: 3 Credit: 32,107 RAC: 0 |
I had one as well on a Q9450. It errored out after ~86,400 s (24 hours) with "Maximum elapsed time exceeded". It looks like there is a failsafe time limit in the workunits so they don't run forever. But if the workspace1_slhc... type of workunits genuinely need many hours to process, then the admins need to raise that limit, because as it is, plenty of computation hours are going to waste. |
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0 |
New tasks + old CPU time limit = new bug. Sad, but true... I have two candidates for wasted CPU time (two big tasks were picked up by a P4 machine)... |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
We just discussed this in a meeting. The problem is that each job calculates how long it should run for. A new user is running slightly different calculations, and these seem to break that estimate. bigmac is looking into it and should be able to extend the maximum time allowed quite easily. |
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0 |
Progress is 1.5% per hour for the latest _ultimate_inj_ tasks on this machine... Abort or wait? Is it possible to extend the time limit for tasks that are already running? |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
A fix has been implemented. Sadly, it probably won't help WUs that are already running. Actually, the "Maximum elapsed time" was originally reduced because of comments from volunteers. |
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0 |
Thank you, Neasan. |
Joined: 15 Jul 05 Posts: 26 Credit: 2,398,034 RAC: 326 |
I also got two of these long runners. Progress so far: the first is at 5.8% after 2 hours of runtime (resultid=18385236), the second at 4.6% after 2 hours (resultid=18384520). Any chance that the results will finish in time? Matthias |
Joined: 26 Mar 07 Posts: 8 Credit: 508 RAC: 0 |
Hi, I've just finished one of those WUs. Mine is the one that took 123,901 s. At least I didn't get it earlier, so it was not stopped by this limit. If this turns out to be the speed I can expect from my CPU, then it's probably time to buy a new PC. wuid=3706606 |
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0 |
"Any chance that the results will finish in time?" That depends on the new CPU time limit; your machine needs approximately 2 days. |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
"Any chance that the results will finish in time?" The new limit is over 2 days, so you should be fine. |
Joined: 15 Jul 05 Posts: 26 Credit: 2,398,034 RAC: 326 |
"I hope so; I doubled the time limit." (Eric) I was able to finish both results, each in less than one day of CPU time: 80,000 and 57,000 seconds. And yes, checkpointing works fine here. Matthias |
Joined: 10 Sep 08 Posts: 29 Credit: 34,924 RAC: 0 |
What is this time limit, anyway? The WU deadline? The only reason to decrease that would be to ensure that BOINC reports the results more quickly after the WUs have finished; is that really such a big issue? |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
"What is this time limit, anyway? The WU deadline? The only reason to decrease that would be to ensure that BOINC reports the results more quickly after the WUs have finished; is that really such a big issue?" All I know is that it was initially reduced (at the beginning of the project) after feedback from volunteers. The time limit is how long a WU should take to process, and if it hasn't finished by then, something is broken and it is cancelled. LHC@home WUs are usually short enough that we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first). It is NOT the deadline (results back by a certain date). |
Joined: 10 Sep 08 Posts: 29 Credit: 34,924 RAC: 0 |
"The time limit is how long a WU should take to process, and if it hasn't finished by then, something is broken and it is cancelled. LHC@home WUs are usually short enough that we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first)." Aah, now I see. In that case, perhaps an absolute limit is simply the wrong approach. A better way might be to check the completion percentage periodically and stop the WU if it hasn't gone up at all in a while, or hasn't gone up by more than some reasonable amount. Of course, that only works if the application is multithreaded, if you can tell BOINC to check up on it in that kind of detail, or if you can change the internal loops to check on themselves periodically to avoid deadlock. |
Joined: 30 Nov 06 Posts: 234 Credit: 11,078 RAC: 0 |
"The time limit is how long a WU should take to process, and if it hasn't finished by then, something is broken and it is cancelled. LHC@home WUs are usually short enough that we had ~24 hours as the crunching time; however, the new work takes longer than that (which we were unaware of at first)." It wasn't an absolute limit; it used some kind of algorithm to set the time when the job was created, and the new jobs broke this calculation AFAIK. 'Tis sorted now, and bigmac is trying to make sure it won't happen again. |
Joined: 6 Oct 08 Posts: 3 Credit: 32,107 RAC: 0 |
It looks like today's wbnlaug10_DA-scaling-law1... batch of workunits shows the same error. I had 5 tasks (estimated to run for about 1 hour) error out with "Maximum elapsed time exceeded" after about 15 minutes. |
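
As far as we understand the BOINC client, the "Maximum elapsed time exceeded" abort discussed in this thread is not a fixed wall-clock value: the project sets a FLOP bound on each workunit when it is created (`rsc_fpops_bound`), and the client divides that bound by its estimate of the host's speed to get an elapsed-time limit. The Python sketch below only illustrates that relationship; the function names and the host speed are hypothetical, and the real client code differs in detail.

```python
# Sketch of how a BOINC-style client could derive a per-task time limit.
# Assumption: limit = rsc_fpops_bound / estimated host FLOPS.
# Function names and numbers are hypothetical, for illustration only.


def max_elapsed_seconds(rsc_fpops_bound: float, host_flops: float) -> float:
    """Elapsed-time limit implied by a workunit's FLOP bound and the host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return rsc_fpops_bound / host_flops


def exceeds_limit(elapsed_s: float, rsc_fpops_bound: float, host_flops: float) -> bool:
    """True if the task would be aborted with 'Maximum elapsed time exceeded'."""
    return elapsed_s > max_elapsed_seconds(rsc_fpops_bound, host_flops)


if __name__ == "__main__":
    host_flops = 2.5e9                # hypothetical speed estimate (2.5 GFLOPS)
    bound = 24 * 3600 * host_flops    # a bound that works out to ~24 h on this host
    print(max_elapsed_seconds(bound, host_flops))
    # ~123,901 s is the runtime of one completed task reported in the thread:
    print(exceeds_limit(123_901, bound, host_flops))      # cut off under ~24 h
    print(exceeds_limit(123_901, 2 * bound, host_flops))  # fits once the bound is doubled
```

Because the limit scales with the bound chosen at job creation, a bound computed from a wrong runtime estimate cuts off long tasks no matter how fast the host is, which matches what was described above.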
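
The "approximately 2 days" estimate given in reply to Matthias is essentially a linear extrapolation from the progress fraction. A minimal sketch, assuming progress grows linearly with elapsed time (real tracking jobs need not behave that way); the function names are hypothetical:

```python
# Linear runtime projection from early progress (assumes constant progress rate).


def projected_total_hours(elapsed_h: float, fraction_done: float) -> float:
    """Extrapolate total runtime from elapsed hours and fraction done in (0, 1]."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


def fits_limit(elapsed_h: float, fraction_done: float, limit_h: float) -> bool:
    """True if the linear projection finishes within limit_h hours."""
    return projected_total_hours(elapsed_h, fraction_done) <= limit_h


if __name__ == "__main__":
    # Progress figures reported in the thread: 5.8 % and 4.6 % after 2 hours.
    for pct in (5.8, 4.6):
        total = projected_total_hours(2.0, pct / 100)
        print(f"{pct}% after 2 h -> ~{total:.1f} h total, "
              f"fits 24 h: {fits_limit(2.0, pct / 100, 24)}, "
              f"fits 48 h: {fits_limit(2.0, pct / 100, 48)}")
```

This projects roughly 34.5 and 43.5 hours, inside a 2-day limit but well over the old ~24 hours. Both tasks in fact finished in under a day of CPU time (80,000 and 57,000 seconds), so such projections are only a rough guide.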
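
The alternative suggested in the thread, stopping a WU only when its completion percentage stops advancing, can be sketched as a sliding-window check. This illustrates that idea only; it is not something BOINC or LHC@home is known to implement, and the class name, the one-hour window and the minimum gain are hypothetical. Timestamps are passed in explicitly rather than read from a clock, so the logic can be tested directly.

```python
from collections import deque


class StallWatchdog:
    """Report a task as stalled if its fraction done rose by less than
    min_gain over the last window_s seconds (hypothetical parameters)."""

    def __init__(self, window_s: float, min_gain: float) -> None:
        self.window_s = window_s
        self.min_gain = min_gain
        self.samples: deque = deque()  # (timestamp_s, fraction_done), oldest first

    def record(self, t: float, fraction_done: float) -> bool:
        """Add a progress sample taken at time t; return True if stalled."""
        self.samples.append((t, fraction_done))
        # Discard old samples, keeping the newest one that is at least
        # window_s old as the baseline for comparison.
        while len(self.samples) > 1 and t - self.samples[1][0] >= self.window_s:
            self.samples.popleft()
        t0, f0 = self.samples[0]
        if t - t0 < self.window_s:
            return False  # not enough history yet to judge
        return fraction_done - f0 < self.min_gain


if __name__ == "__main__":
    wd = StallWatchdog(window_s=3600, min_gain=0.001)
    print(wd.record(0, 0.10))     # no history yet
    print(wd.record(1800, 0.12))  # window not filled yet
    print(wd.record(3600, 0.14))  # gained 0.04 over the last hour
    print(wd.record(7200, 0.14))  # no gain over the last hour -> stalled
```

As the suggestion itself notes, this depends on the application reporting its progress regularly; a task that deadlocks and stops reporting produces no new samples, so a real watchdog would also need an independent timer to trigger the check.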
©2024 CERN