Message boards : Number crunching : upload problem - zip? (31 Oct 2013)
jay
Joined: 10 Aug 07
Posts: 54
Credit: 813,704
RAC: 116
Message 26042 - Posted: 31 Oct 2013, 11:45:01 UTC
Last modified: 31 Oct 2013, 11:46:16 UTC

I see this error as the WU gets reassigned many times:

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<stderr_txt>
08:45:08 (1512): called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>wzip_w30cbb3rd__1__s__64.31_59.32__12_12.5__3__54_1_sixvf_boinc12_0_6</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>



Thanks !!
Jay
ID: 26042
jay
Joined: 10 Aug 07
Posts: 54
Credit: 813,704
RAC: 116
Message 26043 - Posted: 31 Oct 2013, 12:05:59 UTC - in response to Message 26042.  

The Malaria Control project had an interesting report about the -161 error message:

http://www.malariacontrol.net/forum_thread.php?id=714&nowrap=true#6817

They said that the error happened because the program (WU) had previously crashed - and the cause of the crash was not reported.
So this may be difficult to debug.

Here are some examples:
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9982844

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9982842

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9982837

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9982836

These WUs were each reassigned to about 10 different clients and failed on every one of them.

Would adding debug flags help? (Which ones?)

Thanks!!
Jay
ID: 26043
Richard Haselgrove
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 26044 - Posted: 31 Oct 2013, 15:09:37 UTC - in response to Message 26043.  
Last modified: 31 Oct 2013, 15:09:48 UTC

These are all with the 'sixtracktest' application?

Two debugging thoughts come to mind - I might try both of them myself.

On a machine which can be monitored reasonably closely:

Set preferences to accept the test application, and try to download a task or two. If any arrive, suspend them immediately so they don't run too soon - they seem to fail very quickly.

Set <http_debug> and <file_xfer_debug> logging - heck, why not <http_xfer_debug> as well - to see what exactly the -161 error is caused by.

Disable networking.

Allow the tasks to run, and presumably fail. Look to see if an output file was created, and make a copy of it if so (it might contain clues helpful to the developers). Make a note of the file size.

Allow networking, and look at the event log to see what causes the failures. Wrong file size? Bad upload url? Could be something like that (though most of the main ones would appear in the event log anyway, without debug logging).

And report back.
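
The logging step above can be sketched roughly as follows. This is a minimal Python sketch, not an official tool: it only prints a cc_config.xml fragment that switches on the three log flags named in this post. The helper name `build_cc_config` is made up for illustration, and it assumes the client picks up cc_config.xml from its data directory.

```python
# Log flags named in the post above.
DEBUG_FLAGS = ("http_debug", "file_xfer_debug", "http_xfer_debug")


def build_cc_config(flags=DEBUG_FLAGS):
    """Return cc_config.xml text that enables each of the given log flags.

    Hypothetical helper: it only builds text and touches no files.
    """
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Printed rather than written, so an existing cc_config.xml is never
    # overwritten; merge the <log_flags> entries by hand if you have one.
    print(build_cc_config(), end="")
```

Redirect the output into cc_config.xml in the BOINC data directory, then have the client re-read its config files (or restart it) before allowing the suspended tasks to run.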
ID: 26044
Richard Haselgrove
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 26045 - Posted: 31 Oct 2013, 15:19:42 UTC - in response to Message 26044.  

Well, I tried, but all I got were mainstream sixtrack tasks - rather more than I expected from the numbers on the Server Status page. Someone give me a shout if they see any more test tasks being released ;)
ID: 26045
jay
Joined: 10 Aug 07
Posts: 54
Credit: 813,704
RAC: 116
Message 26046 - Posted: 31 Oct 2013, 18:05:45 UTC - in response to Message 26045.  

Greetings.

Yes, they were all sixtracktest.

I tried to get more, but just got the regular applications.

My apologies for not noticing the type of application.

(Have to take the cat to the vet. Will try again later to get a test WU.)

Thanks,
Jay
ID: 26046
jay
Joined: 10 Aug 07
Posts: 54
Credit: 813,704
RAC: 116
Message 26047 - Posted: 1 Nov 2013, 16:29:08 UTC - in response to Message 26046.  

OK.
Same results as Richard: it now looks like there are no sixtracktest WUs.

So I wrote a script that does updates and checks whether any sixtracktest WUs were downloaded.
If so, it suspends the LHCatHome project until I can drop by and set the logging flags.

A plan, anyway.
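
A script along these lines might look like the sketch below. It is not the actual script, only a guess at its shape. Assumptions: `boinccmd` is on the PATH, `--get_tasks` lists each task with a `name: ...` line, the project URL is the one seen in the links in this thread, and test tasks can be recognised by a name marker. The marker `wzip_` is just the prefix of the failing file name in the first post; the thread does not say how test tasks are really named.

```python
import subprocess
import time

PROJECT_URL = "http://lhcathomeclassic.cern.ch/sixtrack/"  # assumed from thread links
TEST_MARKER = "wzip_"   # assumption: prefix of the failing file in the first post
POLL_SECONDS = 600      # arbitrary polling interval


def task_names(get_tasks_output):
    """Extract task names, assuming lines of the form 'name: <task name>'."""
    names = []
    for line in get_tasks_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("name:"):
            names.append(stripped[len("name:"):].strip())
    return names


def find_test_tasks(get_tasks_output, marker=TEST_MARKER):
    """Return the task names that contain the (hypothetical) test marker."""
    return [n for n in task_names(get_tasks_output) if marker in n]


def boinccmd(*args):
    """Run boinccmd with the given arguments and return its stdout."""
    result = subprocess.run(["boinccmd", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def poll_once(run=boinccmd, settle=30):
    """Ask for a project update, then suspend the project if test tasks arrived."""
    run("--project", PROJECT_URL, "update")
    time.sleep(settle)  # give the scheduler request some time to complete
    found = find_test_tasks(run("--get_tasks"))
    if found:
        run("--project", PROJECT_URL, "suspend")
    return found


if __name__ == "__main__":
    while True:
        hits = poll_once()
        if hits:
            print("Project suspended; possible test tasks:", hits)
            break
        time.sleep(POLL_SECONDS)
```

Since the failing tasks reportedly die very quickly, a task could still start and fail between the update and the suspend, so this only narrows the window.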

have fun,
Jay
ID: 26047
alvin
Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 26048 - Posted: 1 Nov 2013, 22:32:01 UTC - in response to Message 26047.  

If the test apps failed, doesn't that mean they carry debug info inside them?
So the developers got it anyway?
ID: 26048
jay
Joined: 10 Aug 07
Posts: 54
Credit: 813,704
RAC: 116
Message 26049 - Posted: 2 Nov 2013, 16:40:46 UTC - in response to Message 26048.  
Last modified: 2 Nov 2013, 16:56:48 UTC

Hi,
I wrote the first message because all that showed up (that I could see)
was the -161 error code on the file upload (which, per the Malaria Control thread, points to the program having crashed earlier).

Apparently another round of tests was released.
So far they are passing:

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=10011467
through
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=10011484

So this is now probably a non-issue.

Jay

[Edit: improved the message and widened the range of WU IDs.]
ID: 26049
Richard Haselgrove
Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 26050 - Posted: 2 Nov 2013, 17:00:00 UTC - in response to Message 26049.  

Last time I looked at the applications page (when this issue was first raised), there was a 'test' application for Windows which looked identical to the current production application. That led me to assume that the problem was with the design of the workunits, rather than with the application.

Now, the test is running on Linux only, and the workunits have (perhaps) changed too. The first couple of tasks I looked at had Igor Zacharov as one of the quorum partners, so I think we can take it that the test is under close observation.
ID: 26050
Ano
Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 26051 - Posted: 4 Nov 2013, 10:07:28 UTC
Last modified: 4 Nov 2013, 10:07:52 UTC

Another one: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=10001197

The good thing is, with 10 people trying and 10 people failing, I know it's not my fault (I have had some crashes in the past that were partially my fault, but not this time).
ID: 26051



©2024 CERN