Message boards : Number crunching : WU errors
Digitalis
Joined: 2 Sep 04
Posts: 19
Credit: 26,799
RAC: 0
Message 7998 - Posted: 7 Jun 2005, 16:59:57 UTC
Last modified: 7 Jun 2005, 17:01:25 UTC

Two of my LHC WUs have exited with the following Windows error message (Fortran):
"The process cannot access the file because it is being used by another process (Createfile, erno=32, unit=92)."

When I accept this message, the cc produces the following messages:

07/06/2005 17:45:35|LHC@home|Unrecoverable error for result wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0 (The access code is invalid. (0xc) - exit code 12 (0xc))
07/06/2005 17:45:35||request_reschedule_cpus: process exited
07/06/2005 17:45:35|LHC@home|Deferring communication with project for 59 seconds
07/06/2005 17:45:35|LHC@home|Computation for result wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0 finished
07/06/2005 17:45:35||schedule_cpus: must schedule
07/06/2005 17:45:35|LHC@home|Resuming result wboinc7_v6s4hvnom__1__64.293_59.303__6_8__6__45_1_sixvf_boinc8769_4 using sixtrack version 4.67
07/06/2005 17:45:36|LHC@home|Started upload of wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0_0
07/06/2005 17:45:37|LHC@home|Finished upload of wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0_0
07/06/2005 17:45:37|LHC@home|Throughput 0 bytes/sec
07/06/2005 17:46:35|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
07/06/2005 17:46:36|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded

I am using XP SP2 and the 4.44 cc (optimised for P4); all other projects are proceeding normally.
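
For anyone who wants to pull the exit code out of log lines like these, here is a minimal Python sketch. It assumes every line follows the `timestamp|project|message` layout seen in the excerpt above; the helper names `parse_log_line` and `exit_code` are made up for illustration and are not part of BOINC.

```python
import re


def parse_log_line(line):
    """Split a 'timestamp|project|message' log line into its three fields."""
    # maxsplit=2 keeps any '|' inside the message text intact.
    timestamp, project, message = line.split("|", 2)
    return {"timestamp": timestamp, "project": project, "message": message}


def exit_code(message):
    """Return the decimal exit code from an 'exit code N' message, or None."""
    match = re.search(r"exit code (-?\d+)", message)
    return int(match.group(1)) if match else None


line = ("07/06/2005 17:45:35|LHC@home|Unrecoverable error for result "
        "wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0 "
        "(The access code is invalid. (0xc) - exit code 12 (0xc))")
entry = parse_log_line(line)
print(entry["project"], exit_code(entry["message"]))  # prints: LHC@home 12
```

The two numbers look like plain Windows error codes: 12 is ERROR_INVALID_ACCESS ("The access code is invalid") and the erno=32 in the Fortran dialog is ERROR_SHARING_VIOLATION, which is the "being used by another process" text.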
Get BOINC WIKIed

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 7999 - Posted: 7 Jun 2005, 17:59:21 UTC
Last modified: 7 Jun 2005, 19:17:56 UTC

One of my WUs stopped unexpectedly, but it continued to process without a Fortran error (see the 17:23:49 lines in the messages):

07.06.2005 16:13:46|LHC@home|Starting result wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952_4 using sixtrack version 4.67
07.06.2005 16:14:11|LHC@home|Requesting 11.82 seconds of work
07.06.2005 16:14:11|LHC@home|Sending request to scheduler: http://lhcathome-sched1.cern.ch/scheduler/cgi
07.06.2005 16:14:12|LHC@home|Scheduler RPC to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
07.06.2005 16:14:13|LHC@home|Started download of wboinc7_v6s4hvnom__1__64.306_59.316__8_10__6__18_1_sixvf_boinc8904.zip
07.06.2005 16:14:14|LHC@home|Finished download of wboinc7_v6s4hvnom__1__64.306_59.316__8_10__6__18_1_sixvf_boinc8904.zip
07.06.2005 16:14:14|LHC@home|Throughput 60792 bytes/sec
07.06.2005 17:23:49|LHC@home|Result wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952_4 exited with zero status but no 'finished' file
07.06.2005 17:23:49|LHC@home|If this happens repeatedly you may need to reset the project.
07.06.2005 17:23:49|LHC@home|Restarting result wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952_4 using sixtrack version 4.67


it is still crunching ...
[EDIT]
and it finished successfully (and reported):
07.06.2005 21:13:55|LHC@home|Computation for result wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952 finished
07.06.2005 21:13:55|LHC@home|Starting result wboinc7_v6s4hvnom__1__64.282_59.292__14_16__6__45_1_sixvf_boinc7791_0 using sixtrack version 4.67
07.06.2005 21:13:56|LHC@home|Started upload of wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952_4_0
07.06.2005 21:13:59|LHC@home|Finished upload of wboinc7_v6s4hvnom__1__64.298_59.308__16_18__6__9_1_sixvf_boinc7952_4_0
07.06.2005 21:13:59|LHC@home|Throughput 27373 bytes/sec
[/EDIT]

greetz littleBouncer


Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8001 - Posted: 7 Jun 2005, 20:45:32 UTC - in response to Message 7998.  

> 2 of my LHC wu's have exited with the following windows error message
> (Fortran),
> "The process cannot access the file because it is being used by another
> process (Createfile, erno=32, unit=92)."
>
> When I accept this message the cc produces the following messages,
>
> 07/06/2005 17:45:35|LHC@home|Unrecoverable error for result
> wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0 (The
> access code is invalid. (0xc) - exit code 12 (0xc))
> 07/06/2005 17:45:35||request_reschedule_cpus: process exited
> 07/06/2005 17:45:35|LHC@home|Deferring communication with project for 59
> seconds
> 07/06/2005 17:45:35|LHC@home|Computation for result
> wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0 finished
> 07/06/2005 17:45:35||schedule_cpus: must schedule
> 07/06/2005 17:45:35|LHC@home|Resuming result
> wboinc7_v6s4hvnom__1__64.293_59.303__6_8__6__45_1_sixvf_boinc8769_4 using
> sixtrack version 4.67
> 07/06/2005 17:45:36|LHC@home|Started upload of
> wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0_0
> 07/06/2005 17:45:37|LHC@home|Finished upload of
> wboinc7_v6s4hvnom__1__64.312_59.322__6_8__6__45_1_sixvf_boinc6270_0_0
> 07/06/2005 17:45:37|LHC@home|Throughput 0 bytes/sec
> 07/06/2005 17:46:35|LHC@home|Sending scheduler request to
> http://lhcathome-sched1.cern.ch/scheduler/cgi
> 07/06/2005 17:46:36|LHC@home|Scheduler request to
> http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
>
> I am using XPSP2 and the 4.44 cc (optimised for P4), all other projects are
> proceeding normally.

Digitalis ... can you send me your complete log file?

p.d.buck@comcast.net

Thanks ...

Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8002 - Posted: 7 Jun 2005, 20:48:49 UTC

LB,

You got hit by a bug they have been chasing for a long time. As near as I have been able to tell, it is a timing problem: as tasks are started and ended, the BOINC Manager/Daemon and the Science App talk past each other.

So the Daemon sees the Science App exiting (for whatever reason) and says, "Oh, you are done and have a zero status (no error), so you must be finished ... where is the Result Data File? Oops, no finished file ..."

Yet the work will either restart on its own as it needs to ... or it gets picked up later after a restart and completes ...
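
To make the sequence above concrete, here is a rough Python sketch of the decision as I understand it. This is not the actual BOINC client code: the function name `classify_exit`, the marker file name `finished`, and the three return labels are all assumptions chosen for illustration.

```python
import os
import tempfile


def classify_exit(exit_status, slot_dir, marker_name="finished"):
    """Decide what to do with a task whose process has just exited.

    marker_name is a hypothetical stand-in for the 'finished' file
    that the log message refers to.
    """
    if exit_status != 0:
        # Non-zero status: the app reported a real error.
        return "error"
    if os.path.exists(os.path.join(slot_dir, marker_name)):
        # Zero status and the marker is present: the result is complete.
        return "done"
    # Zero status but no marker: the case in the log, so restart the task.
    return "restart"


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as slot:
    print(classify_exit(0, slot))   # no marker file yet
    open(os.path.join(slot, "finished"), "w").close()
    print(classify_exit(0, slot))   # marker file now present
    print(classify_exit(12, slot))  # non-zero exit status
```

The third branch is the one that produces "exited with zero status but no 'finished' file": nothing actually failed, so restarting is the cheap and safe response.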

Digitalis
Joined: 2 Sep 04
Posts: 19
Credit: 26,799
RAC: 0
Message 8004 - Posted: 7 Jun 2005, 21:36:41 UTC - in response to Message 8001.  

> Digitalis ... can you send me your complete log file?
>
> p.d.buck@comcast.net
>
> Thanks ...
>
Be glad to, which particular file tho? (hangs head in shame)

littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 8005 - Posted: 7 Jun 2005, 22:07:42 UTC - in response to Message 8002.  
Last modified: 7 Jun 2005, 22:17:42 UTC

> LB,
>
> You got hit by a bug they have been chasing for a long time. As near as I
> have been able to tell it is a timing thing as tasks are started and ended and
> the BOINC Manager/Daemon and the Science App talk past each other.
>
> So the Daemon sees the Science App exiting (for whatever reason) and it says
> "Oh, you are done and have a zero status (no error) so it must be finished...
> where is the Result Data File? Oops, no finished file ..."
>
> Yet the work will either restart on its own as it needs to ... or it gets
> picked up later after a restart and it completes ...
>
@Paul D. Buck
THX for your informative reply
littleBouncer
BTW: it was CC 4.27

Paul D. Buck
Joined: 2 Sep 04
Posts: 545
Credit: 148,912
RAC: 0
Message 8012 - Posted: 8 Jun 2005, 12:04:39 UTC - in response to Message 8004.  

> > Digitalis ... can you send me your complete log file?
> >
> > p.d.buck@comcast.net
> >
> > Thanks ...
> >
> Be glad to, which particular file tho? (hangs head in shame)


Well, it is in your BOINC directory ... on Windows it is the c:\program files\boinc directory. The 4 files are named "something.txt" and one has "something.old"; make a copy, zip them up and send them to me ...

On the Mac they are in /library/application support/boinc and have the same names. I don't know for sure where they are in Linux.
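
If zipping by hand is a nuisance, a short Python sketch along these lines should do the copy-and-zip step. The function name `zip_logs` and the archive name are made up for illustration, and it simply assumes the logs are the `*.txt` and `*.old` files sitting directly in the BOINC directory.

```python
import glob
import os
import zipfile


def zip_logs(boinc_dir, archive_path):
    """Copy every *.txt and *.old file in boinc_dir into a zip archive.

    Returns the list of file names that were added. The original files
    are only read, never modified or removed.
    """
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for pattern in ("*.txt", "*.old"):
            for path in sorted(glob.glob(os.path.join(boinc_dir, pattern))):
                name = os.path.basename(path)
                archive.write(path, arcname=name)
                added.append(name)
    return added


# Example call, using the Windows directory mentioned above:
# zip_logs(r"c:\program files\boinc", "boinc_logs.zip")
```

Write the archive somewhere outside the BOINC directory, so nothing in the client's own folder is touched.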

I will be making a longer plea later today in that I have discovered how few messages I have data on ... I spent the better part of yesterday looking for more messages in the code (and I am not sure I got them all, even now ...)

Digitalis
Joined: 2 Sep 04
Posts: 19
Credit: 26,799
RAC: 0
Message 8016 - Posted: 8 Jun 2005, 13:31:40 UTC - in response to Message 8012.  

Done; if it's not correct, let me know.

Good luck with your project.