
Error -161



Message boards : ATLAS application : Error -161

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 29918 - Posted: 12 Apr 2017, 20:06:01 UTC

I've got a few tasks with this error.

2017-04-12 21:19:33 (13792): Guest Log: Starting ATLAS job. (PandaID=3326368081 taskID=10995525)
2017-04-12 21:21:23 (13792): Guest Log: Failed! Shutting down the machine.

upload failure: <file_xfer_error>
<file_name>FZ9LDm7tXIqnDDn7oo6G73TpABFKDmABFKDmPaIKDmaFFKDmJSlUln_0_ATLAS_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
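
For anyone who wants to pull the code out of a fragment like the one above, here is a minimal Python sketch. It assumes the `<file_xfer_error>` block has already been copied out of the task's stderr as a standalone string; the function name `parse_xfer_error` is made up for illustration and is not part of BOINC.

```python
import re
import xml.etree.ElementTree as ET

# Fragment copied from the failed task's output shown above.
SAMPLE = """<file_xfer_error>
<file_name>FZ9LDm7tXIqnDDn7oo6G73TpABFKDmABFKDmPaIKDmaFFKDmJSlUln_0_ATLAS_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>"""


def parse_xfer_error(fragment):
    """Return file name, numeric code and reason from one error block.

    Hypothetical helper: it only understands the layout seen in this thread.
    """
    root = ET.fromstring(fragment)
    name = root.findtext("file_name", default="").strip()
    raw = root.findtext("error_code", default="").strip()
    # The code is an integer, optionally followed by a reason in parentheses.
    match = re.match(r"(-?\d+)\s*(?:\((.*)\))?$", raw)
    code = int(match.group(1)) if match else None
    reason = match.group(2) if match and match.group(2) else ""
    return {"file_name": name, "code": code, "reason": reason}


result = parse_xfer_error(SAMPLE)
print(result)
```

The "(not found)" text suggests the client could not find the result file it was asked to upload, which would fit a job that shut down before writing any output.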

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 29920 - Posted: 12 Apr 2017, 20:45:29 UTC - in response to Message 29918.

Most likely a faulty batch.
I also have one of them, and that WU failed on all of my wingmen's hosts.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=64572665

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29921 - Posted: 13 Apr 2017, 5:23:19 UTC
Last modified: 13 Apr 2017, 5:24:35 UTC

I have had quite a number of such cases since yesterday morning.

For example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=134348645

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 29922 - Posted: 13 Apr 2017, 6:55:15 UTC

I got another one:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=134353416

- My wingman's WU failed
- They fail on Windows as well as on Linux
- All of them seem to be from taskID=10995525

On the other hand, I have a 10995525 WU that has now been running for more than 2.5 h, so it may be that the server simply prefers that ID at the moment.
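
One way to check whether the failures really cluster on a single taskID is to tally them from the guest log. The sketch below is only an illustration: it assumes the log text has been saved to a string and that every failed job appears as a "Starting ATLAS job" line followed by a "Failed!" line, as in the excerpt in the first post. The helper name `failed_task_ids` is hypothetical.

```python
import re
from collections import Counter

# Matches the "Starting ATLAS job" guest log line and captures both IDs.
START = re.compile(r"Starting ATLAS job\. \(PandaID=(\d+) taskID=(\d+)\)")
FAILED = "Failed! Shutting down the machine."


def failed_task_ids(log_text):
    """Count failures per taskID (hypothetical helper for this thread)."""
    counts = Counter()
    current = None  # taskID of the most recently started job
    for line in log_text.splitlines():
        match = START.search(line)
        if match:
            current = match.group(2)
        elif FAILED in line and current is not None:
            counts[current] += 1
            current = None
    return counts


# Log lines copied from the first post in this thread.
SAMPLE_LOG = """\
2017-04-12 21:19:33 (13792): Guest Log: Starting ATLAS job. (PandaID=3326368081 taskID=10995525)
2017-04-12 21:21:23 (13792): Guest Log: Failed! Shutting down the machine."""

counts = failed_task_ids(SAMPLE_LOG)
print(counts)
```

A tally like this only shows which IDs fail. To tell a faulty batch from a server that mostly hands out one ID, the successful jobs would have to be counted per taskID as well.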

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 103
Credit: 793,847
RAC: 5,012
Message 29923 - Posted: 13 Apr 2017, 7:37:08 UTC

In the file server logs, we see a couple of ATLAS result files with 0 length, possibly related.

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,399,908
RAC: 3,711
Message 29924 - Posted: 13 Apr 2017, 8:42:18 UTC

Just got another 2 in a row that failed after a few seconds.
One of them had David Cameron's host as one of the wingmen.

A 3rd WU started successfully.

maeax
Joined: 2 May 07
Posts: 182
Credit: 11,301,914
RAC: 11,411
Message 29925 - Posted: 13 Apr 2017, 9:37:52 UTC
Last modified: 13 Apr 2017, 9:41:23 UTC

Some tasks finished after 3 or 4 min with a computation error.
Tasks that end this quickly show an estimated duration of 2 or 3 days at the start and
1,814,400 GigaFLOPs of work (longrunner).
https://lhcathome.cern.ch/lhcathome/result.php?resultid=134341923

Tasks with an estimated duration of 1 or 2 hours and 43,200 GigaFLOPs of work are OK.

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29927 - Posted: 13 Apr 2017, 14:54:31 UTC

I have had quite a number of these failed tasks this afternoon (after having had some others yesterday and last night).
Most of them finish after 3-4 minutes, some after 10-11 minutes.

What's going wrong?

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29928 - Posted: 14 Apr 2017, 3:29:16 UTC

Last night, again I had several such failing WUs.

By now, it is becoming quite annoying.

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29929 - Posted: 14 Apr 2017, 11:53:18 UTC

About 2 hours ago, I got the next 3 faulty WUs.
What is going on there?

Dave Peachey
Joined: 9 May 09
Posts: 17
Credit: 752,075
RAC: 14
Message 29930 - Posted: 14 Apr 2017, 18:57:43 UTC

I've had more than a dozen of them in the last 24 hours. I've worked on the assumption that anything with a very long predicted duration (i.e. significantly longer than what I experience for 'normal' WUs) is likely to be faulty - notwithstanding any legitimate 'ultra-long' WUs which are in circulation.

On which basis, I bump them to the top of the queue by temporarily suspending any other reasonable-looking WUs just to clear them out as quickly as possible. It's a laborious, manual intervention which I do once every eight hours or so but, thus far, I've been proven correct; anything showing a very long predicted duration has errored out within a few minutes.

Whilst that hasn't been too much of a waste of processing time overall (just over an hour all told), I have found that having several of these predicted-long WUs in the queue at any one time clogs up my BOINC queue to the extent that I can't download WUs for other projects, because BOINC Manager thinks the queue is full (I keep a queue of at most 0.75 days to avoid too many problems if something drastic happens).

On which basis, yes, it's annoying and the sooner they are all errored out (assuming they won't/can't be removed manually), the better.

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29931 - Posted: 14 Apr 2017, 19:36:23 UTC - in response to Message 29930.

... assuming they won't/can't be removed manually...

Why not?

Dave Peachey
Joined: 9 May 09
Posts: 17
Credit: 752,075
RAC: 14
Message 29932 - Posted: 14 Apr 2017, 19:45:20 UTC - in response to Message 29931.
Last modified: 14 Apr 2017, 19:54:36 UTC

... assuming they won't/can't be removed manually...

Why not?

No reason that I know of other than the fact that, in spite of this issue having been extant for a couple of days, nothing has been done about it.

That's not to say it wouldn't be possible if there were the resources available and/or a willingness to do this ... albeit, in this respect, I actually know very little about the potential complexities of the matter so I may be completely wrong!

maeax
Joined: 2 May 07
Posts: 182
Credit: 11,301,914
RAC: 11,411
Message 29933 - Posted: 14 Apr 2017, 19:57:13 UTC - in response to Message 29925.

Some tasks finished after 3 or 4 min with a computation error.
Tasks that end this quickly show an estimated duration of 2 or 3 days at the start and
1,814,400 GigaFLOPs of work (longrunner).


The BOINC download page, https://boinc.berkeley.edu/download_all.php,
has a pre-release BOINC 7.7.2 that includes VirtualBox 5.1.18.
This works for me now with these longrunners.

After installation, don't forget to reboot your computer.

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29934 - Posted: 16 Apr 2017, 5:46:52 UTC

The error now comes up with almost every work unit :-(

Why is no one at CERN taking care of this problem?

gyllic
Joined: 9 Dec 14
Posts: 71
Credit: 818,188
RAC: 5,260
Message 29935 - Posted: 16 Apr 2017, 7:56:11 UTC - in response to Message 29934.

Why is no one at CERN taking care of this problem?

Maybe the Easter holidays.

PhilTheNet
Joined: 21 Sep 14
Posts: 12
Credit: 105,691
RAC: 237
Message 29936 - Posted: 16 Apr 2017, 8:31:30 UTC - in response to Message 29918.

Same :

<error_code>-161 (not found)</error_code>

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 494
Credit: 14,295,892
RAC: 11,839
Message 29938 - Posted: 16 Apr 2017, 9:02:10 UTC - in response to Message 29936.

Same :

<error_code>-161 (not found)</error_code>


I'm surprised you got those 2 valid tasks with this:


Required extension pack not installed, remote desktop not enabled.
____________
Volunteer Mad Scientist For Life

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 29939 - Posted: 16 Apr 2017, 9:42:47 UTC

The extension pack isn't needed for a WU to succeed; it's just very helpful for troubleshooting.

Erich56
Joined: 18 Dec 15
Posts: 304
Credit: 3,437,579
RAC: 8,426
Message 29943 - Posted: 16 Apr 2017, 14:59:09 UTC - in response to Message 29935.

Why is no one at CERN taking care of this problem?

Maybe the Easter holidays.

Well, you are probably right :-)

Meanwhile, I have noticed that all "long runners" (which show a remaining time of a few days when they are downloaded) error out shortly after starting. The other tasks (showing a remaining time of a few hours) all run fine.

So what I am doing now is: once such a "Long runner" is downloaded, I abort it immediately.
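
The rule of thumb used in this thread (an estimate of days rather than hours marks a suspect task) can be written down as a small filter. This is a sketch under stated assumptions: the `Task` record, the task names and the one-day cut-off are all made up for illustration, and the estimates would have to be read off BOINC Manager by hand. A legitimate ultra-long WU would be flagged too, so the list is a hint for manual review and nothing more.

```python
from dataclasses import dataclass

ONE_DAY_S = 24 * 60 * 60  # assumed cut-off: estimates of a day or more are suspect


@dataclass
class Task:
    """Hypothetical record: a task name plus its estimated remaining time."""
    name: str
    est_remaining_s: float


def suspect_long_runners(tasks, threshold_s=ONE_DAY_S):
    """Return the names of tasks whose estimate reaches the threshold."""
    return [t.name for t in tasks if t.est_remaining_s >= threshold_s]


# Made-up queue: two tasks estimated at a few hours, one at 2.5 days.
queue = [
    Task("wu_a", 2 * 3600),
    Task("wu_b", 2.5 * ONE_DAY_S),
    Task("wu_c", 5 * 3600),
]

flagged = suspect_long_runners(queue)
print(flagged)
```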
