Message boards : Number crunching : Confused on WU error

Richard Mitnick
Joined: 20 Dec 07
Posts: 69
Credit: 599,151
RAC: 0
Message 22336 - Posted: 20 May 2010, 21:05:21 UTC

I got a work unit on the computer named desktopII.

The WU is reported as Task ID 18592738, WU ID 3825590.
The WU ended in error, but the reports are confusing and seem inconsistent: Outcome = Client error, Client state = Compute error, CPU time = 14.24 sec.

The stderr output reads:

<core_client_version>6.10.43</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
]]>

Now, how can 14.24 seconds exceed a maximum elapsed time? Right now I have a WU on a different machine, laptopII, which is almost complete with a total time of over 23 hours.

What does the 14.24 seconds refer to?

How can this discrepancy be explained?
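
One possible reading, offered as an assumption and not as a diagnosis of this task: the report gives CPU time, while the error message names elapsed (wall-clock) time, and the two can differ a lot when a process spends most of its life waiting. The sketch below is plain Python and has nothing to do with the BOINC client itself; the function name `measure` is made up for illustration.

```python
import time


def measure(sleep_seconds: float) -> tuple[float, float]:
    """Return (cpu_seconds, elapsed_seconds) for a little work plus a wait."""
    cpu_start = time.process_time()   # CPU time charged to this process
    wall_start = time.perf_counter()  # wall-clock (elapsed) time

    sum(i * i for i in range(100_000))  # a small amount of real computation
    time.sleep(sleep_seconds)           # waiting: elapsed time grows, CPU time does not

    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed


if __name__ == "__main__":
    cpu, elapsed = measure(0.5)
    print(f"CPU time: {cpu:.3f} s, elapsed time: {elapsed:.3f} s")
```

If something similar happened here, a task could show only seconds of CPU time and still run past an elapsed time limit. Whether that is what actually happened to this WU cannot be told from the report alone.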


Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings
ID: 22336
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,994,057
RAC: 701
Message 22475 - Posted: 17 Aug 2010, 15:24:11 UTC

The same "song":
17/08/2010 18:19:29 lhcathome Aborting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__44_46__6__21_1_sixvf_boinc722_0: exceeded CPU time limit 1723.796707
Very confused. :-(
ID: 22475
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,994,057
RAC: 701
Message 22476 - Posted: 17 Aug 2010, 15:42:33 UTC
Last modified: 17 Aug 2010, 16:11:35 UTC

The same situation on all hosts. All SixTrack tasks are ending with the CPU time limit error.
Dear project team!
What are you doing? Looking for supercomputers? ;o)
ID: 22476
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,994,057
RAC: 701
Message 22477 - Posted: 17 Aug 2010, 15:48:06 UTC
Last modified: 17 Aug 2010, 16:11:57 UTC

WOW!
I got work for SixTrackBNL.
And it looks like this application contains the same screen saver bug. :-(
ID: 22477
Warped
Joined: 18 Sep 04
Posts: 40
Credit: 60,176
RAC: 0
Message 22479 - Posted: 17 Aug 2010, 16:21:07 UTC - in response to Message 22477.  

WOW!
I got work for SixTrackBNL.
And it looks like - this application contains the same screen saver bug. :-(



I also have one of the BNL work units, to be specific:
SixTrackBNL 4209.00

I wonder what they are?
Warped

ID: 22479
CoM
Joined: 29 Sep 04
Posts: 42
Credit: 11,505,632
RAC: 0
Message 22480 - Posted: 17 Aug 2010, 17:26:19 UTC
Last modified: 17 Aug 2010, 17:28:19 UTC

Same here:

17/08/2010 19:18:56 lhcathome Aborting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__76_78__6__78_1_sixvf_boinc1704_3: exceeded elapsed time limit 1352.531625
17/08/2010 19:18:56 lhcathome Aborting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__48_50__6__6_1_sixvf_boinc830_4: exceeded elapsed time limit 1352.531625

Seems like they don't have much trust in the stability of their WUs.

But there is also this error:
17/08/2010 19:19:01 lhcathome [error] Error reported by file upload server: nbytes missing or negative
17/08/2010 19:19:01 lhcathome Giving up on upload of wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__48_50__6__6_1_sixvf_boinc830_4_0: permanent upload error
ID: 22480
Aleksander
Joined: 29 Mar 10
Posts: 7
Credit: 5,048,999
RAC: 0
Message 22482 - Posted: 17 Aug 2010, 20:35:38 UTC

WU ID 3962978 for all six computers:

<![CDATA[
<message>
Maximum CPU time exceeded
</message>
]]>

Even an i7 920 CPU @ 2.67GHz could not complete it.

ID: 22482
Tony DeBari
Joined: 27 Sep 04
Posts: 21
Credit: 1,707,785
RAC: 0
Message 22483 - Posted: 17 Aug 2010, 20:50:17 UTC

I have several of these waiting to crunch on a couple of my machines as well. All have been returned by my wingmen with the same error and continue to be re-issued, probably until the max error count of 10 is reached. Looks like it's a totally chowdered batch of work units.

That ::thud:: you hear is my daily WU quota dropping like a stone... :-)


-- Tony D.
ID: 22483
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,994,057
RAC: 701
Message 22487 - Posted: 18 Aug 2010, 6:31:57 UTC - in response to Message 22483.  
Last modified: 18 Aug 2010, 6:32:56 UTC

...All have been returned by my wingmen with the same error and continue to be re-issued, probably until the max error count of 10 is reached.

Confirmed! :-)
ID: 22487
Ano
Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 22488 - Posted: 18 Aug 2010, 6:42:26 UTC

18/08/2010 08:18:27 lhcathome Starting wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7
18/08/2010 08:18:27 lhcathome Starting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7 using sixtrack version 420900
18/08/2010 08:35:37 lhcathome Aborting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7: exceeded elapsed time limit 1029.291298
18/08/2010 08:35:38 lhcathome Computation for task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7 finished
18/08/2010 08:35:40 lhcathome Started upload of wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7_0
18/08/2010 08:35:41 lhcathome [error] Error reported by file upload server: nbytes missing or negative
18/08/2010 08:35:41 lhcathome Giving up on upload of wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7_0: permanent upload error
18/08/2010 08:36:37 lhcathome Sending scheduler request: To fetch work.
18/08/2010 08:36:37 lhcathome Reporting 1 completed tasks, requesting new tasks for CPU
18/08/2010 08:36:42 lhcathome Scheduler request completed: got 0 new tasks
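
As a rough cross-check of the log above, the time between the "Starting task" and "Aborting task" lines can be compared with the limit the client printed. This is only a sketch: the regular expressions are written against the message format seen in this thread, the helper name `run_vs_limit` is made up, and the timestamps have one-second resolution.

```python
import re
from datetime import datetime

STAMP = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
START_RE = re.compile(rf"^{STAMP} \S+ Starting task (\S+) using")
ABORT_RE = re.compile(
    rf"^{STAMP} \S+ Aborting task (\S+): exceeded (elapsed|CPU) time limit ([\d.]+)"
)

# Two lines copied from the log above.
LOG = """\
18/08/2010 08:18:27 lhcathome Starting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7 using sixtrack version 420900
18/08/2010 08:35:37 lhcathome Aborting task wbnlaug10_DA-scaling-law1__1__s__64.31_59.32__84_86__6__87_1_sixvf_boinc1946_7: exceeded elapsed time limit 1029.291298
"""


def parse_stamp(text: str) -> datetime:
    """Parse the day/month/year timestamp used in these log lines."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M:%S")


def run_vs_limit(log_text: str) -> list[tuple[str, float, str, float]]:
    """Return (task, seconds_between_start_and_abort, limit_kind, limit) per aborted task."""
    started: dict[str, datetime] = {}
    results = []
    for line in log_text.splitlines():
        if m := START_RE.match(line):
            started[m.group(2)] = parse_stamp(m.group(1))
        elif m := ABORT_RE.match(line):
            task = m.group(2)
            if task in started:  # skip aborts whose start line is not in the excerpt
                wall = (parse_stamp(m.group(1)) - started[task]).total_seconds()
                results.append((task, wall, m.group(3), float(m.group(4))))
    return results


if __name__ == "__main__":
    for task, wall, kind, limit in run_vs_limit(LOG):
        print(f"{task}: ran {wall:.0f} s, {kind} time limit {limit:.2f} s")
```

For this task the two timestamps are 17 minutes 10 seconds apart, about 1030 seconds, which lines up with the printed limit of 1029.291298. So the client really does seem to be cutting the task off at roughly that many seconds of run time.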


And I went to the trouble of cancelling another project's WU and disabling new work for other projects so that lhc@home would get optimal time T_T
ID: 22488
Ano
Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 22513 - Posted: 1 Sep 2010, 16:56:33 UTC
Last modified: 1 Sep 2010, 16:57:07 UTC

I just saw that the tasks I got on 27 Aug 2010 and today ended successfully.
Right now parts of the project are not running, but it looks like a fix is being applied and work units are being readied again. Good news ^_^.
ID: 22513

©2024 CERN