Message boards : Number crunching : Weird error message.

adrianxw

Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 28337 - Posted: 4 Jan 2017, 15:08:31 UTC

I re-enabled crunching here after a hiatus, and downloaded a large work unit, my first for some years. Big download, but it went okay. The work unit started running and went merrily on its way for just over a minute when it crashed out with a computation error. The reason puzzled me...

Exit status -203 (0xFFFFFF35) ERR_NO_NETWORK_CONNECTION

... about one minute after it had just downloaded the thing?
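For anyone puzzled by the two numbers in that status line: they are the same 32-bit value, printed once as a signed integer and once as its unsigned hexadecimal bit pattern. The following is a minimal sketch of the conversion; the function names are made up for illustration and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


def format_exit_status(code: int) -> str:
    """Format a code as 'signed (0xHEX)', the shape used in the status line."""
    return f"{code} (0x{to_unsigned32(code):08X})"


print(format_exit_status(-203))  # -203 (0xFFFFFF35)
```

Going the other way, `to_signed32(0xFFFFFF35)` gives back `-203`.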

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Jesse Viviano

Joined: 12 Feb 14
Posts: 72
Credit: 4,639,155
RAC: 0
Message 28338 - Posted: 4 Jan 2017, 16:13:21 UTC - in response to Message 28337.  

The only work units in LHC@home that are properly self-contained belong to the SixTrack and sixtracktest subprojects. The rest require a network connection to even run.
adrianxw

Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 28339 - Posted: 4 Jan 2017, 16:18:30 UTC

There is nothing wrong with my network connection, obviously. If it is some link to a required external resource that is missing, well, there is nothing I can do about that, is there? My hiatus may resume.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Harri Liljeroos

Joined: 28 Sep 04
Posts: 674
Credit: 43,150,245
RAC: 15,991
Message 28345 - Posted: 4 Jan 2017, 18:59:30 UTC

I also got one of those yesterday. Below is a snippet from the log:

2017-01-03 14:02:56 (1848): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2017-01-03 14:02:56 (1848): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2017-01-03 14:02:56 (1848): Guest Log: [DEBUG] nc: connect to cern.ch port 80 (tcp) timed out: Operation now in progress
2017-01-03 14:02:56 (1848): Guest Log: nc: connect to cern.ch port 80 (tcp) failed: Network is unreachable


The odd thing is that, according to the log, the connection timed out in less than a second. A delay that short can occur even when nothing is wrong with the network connection.
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 28348 - Posted: 4 Jan 2017, 20:09:47 UTC - in response to Message 28345.  

The timeout is 5 seconds, which should be sufficient, and it can easily be increased if this is an issue. If it really did return in less than one second then it suggests that there is indeed no network from the internal perspective of the VM. If temporary, then the subsequent task should work. If persistent then we have an issue that needs further investigation.
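To illustrate the distinction made above (a connect attempt that fails immediately versus one that uses up the whole timeout), here is a rough Python sketch. It is not the script that runs inside the VM, which the log shows is using `nc`. The `probe` function and its `connect` parameter are hypothetical names chosen for this example; only the host `cern.ch`, port 80 and the 5 second timeout come from the thread.

```python
import errno
import socket
import time


def probe(host, port=80, timeout=5.0, connect=socket.create_connection):
    """Try one TCP connect and return (outcome, elapsed_seconds).

    `connect` is injectable (an assumption made for this sketch) so the
    classification can be exercised without touching a real network.
    """
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.timeout:
        # The full timeout elapsed with no answer: slow or filtered network.
        outcome = "timed out"
    except OSError as exc:
        # An immediate ENETUNREACH/EHOSTUNREACH means there is no route at all.
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            outcome = "unreachable"
        else:
            outcome = "failed"
    else:
        conn.close()
        outcome = "ok"
    return outcome, time.monotonic() - start
```

Calling `probe("cern.ch")` on a machine with no route would be expected to return "unreachable" almost instantly, whereas a slow link would return "timed out" only after roughly the full 5 seconds.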
Harri Liljeroos

Joined: 28 Sep 04
Posts: 674
Credit: 43,150,245
RAC: 15,991
Message 28350 - Posted: 4 Jan 2017, 22:43:10 UTC - in response to Message 28348.  

Well, the host (a laptop) is on WLAN now, so that adds more uncertainty to the network connection. But anyhow, it managed to report the failed task less than two minutes after the log said the failure happened. Anyway, it has finished two CMS tasks since, so it was an intermittent failure.

A five-second timeout is a bit short. To make the communication as robust as possible, maybe a retry and a longer timeout are needed before giving up?
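The retry idea could look something like the sketch below. This is only an illustration of the suggestion, not how the project's check works; `check_with_retries`, its parameters, and the doubling delay are all assumptions made for the example.

```python
import time


def check_with_retries(check, attempts=3, first_delay=2.0, sleep=time.sleep):
    """Call `check()` up to `attempts` times, doubling the pause after each miss.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `sleep` is injectable so the logic can be tested
    without actually waiting.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            # Back off before trying again; no pause after the final attempt.
            sleep(delay)
            delay *= 2
    return None
```

With the defaults, a task would only be failed after three misses spread over about six seconds of waiting plus the connect timeouts themselves, which should ride out a brief WLAN dropout.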
Ross*

Joined: 12 Nov 11
Posts: 6
Credit: 188,262
RAC: 0
Message 28574 - Posted: 19 Jan 2017, 23:47:13 UTC

Hi,
What is this error code?
-1073740791 (0xC0000409) STATUS_STACK_BUFFER_OVERRUN
And how do I fix it?
Ross*
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,693,866
RAC: 235,021
Message 28578 - Posted: 20 Jan 2017, 6:52:58 UTC - in response to Message 28574.  

This error is:

VM Heartbeat file specified, but missing file system status. (errno = '2')

There was an update to fix this, plus the Windows patches.


©2024 CERN