Message boards : ATLAS application : Error -161
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 425
Credit: 118,004,153
RAC: 163,481
Message 29918 - Posted: 12 Apr 2017, 20:06:01 UTC

I've got a few tasks with this error.

2017-04-12 21:19:33 (13792): Guest Log: Starting ATLAS job. (PandaID=3326368081 taskID=10995525)
2017-04-12 21:21:23 (13792): Guest Log: Failed! Shutting down the machine.

upload failure: <file_xfer_error>
<file_name>FZ9LDm7tXIqnDDn7oo6G73TpABFKDmABFKDmPaIKDmaFFKDmJSlUln_0_ATLAS_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
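
For anyone who wants to pull these entries out of a task's stderr text automatically, here is a minimal sketch. It assumes the error block always has the shape shown above; the function name `parse_xfer_errors` is made up for illustration and is not part of BOINC.

```python
import re

# Sample text copied from the log excerpt above.
LOG = """upload failure: <file_xfer_error>
<file_name>FZ9LDm7tXIqnDDn7oo6G73TpABFKDmABFKDmPaIKDmaFFKDmJSlUln_0_ATLAS_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>"""

# Assumption: every block contains exactly one file_name and one error_code,
# in that order, as in the excerpt.
_XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def parse_xfer_errors(text):
    """Return a list of (file_name, error_code, reason) tuples found in text."""
    return [
        (m["name"], int(m["code"]), m["reason"])
        for m in _XFER_ERROR.finditer(text)
    ]


errors = parse_xfer_errors(LOG)
for name, code, reason in errors:
    print(f"{name}: error {code} ({reason})")
```

A regular expression is used rather than an XML parser because the block is embedded in ordinary log text ("upload failure: ...") and is not a standalone XML document.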
ID: 29918
computezrmle

Joined: 15 Jun 08
Posts: 601
Credit: 6,439,033
RAC: 16,042
Message 29920 - Posted: 12 Apr 2017, 20:45:29 UTC - in response to Message 29918.  

Most likely a faulty batch.
I also have one of them, and that WU failed on all of my wingmen's hosts.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=64572665
ID: 29920
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29921 - Posted: 13 Apr 2017, 5:23:19 UTC
Last modified: 13 Apr 2017, 5:24:35 UTC

I have had quite a number of such cases since yesterday morning.

For example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=134348645
ID: 29921
computezrmle

Joined: 15 Jun 08
Posts: 601
Credit: 6,439,033
RAC: 16,042
Message 29922 - Posted: 13 Apr 2017, 6:55:15 UTC

I got another one:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=134353416

- My wingman's WU failed
- They fail on Windows as well as on Linux
- All of them seem to be from taskID=10995525

On the other hand, I have a 10995525 WU that has been running for more than 2.5 h, so it may be that the server simply prefers that ID.
ID: 29922
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 166
Credit: 2,037,172
RAC: 4,046
Message 29923 - Posted: 13 Apr 2017, 7:37:08 UTC

In the file server logs, we see a couple of ATLAS result files with 0 length, possibly related.
ID: 29923
computezrmle

Joined: 15 Jun 08
Posts: 601
Credit: 6,439,033
RAC: 16,042
Message 29924 - Posted: 13 Apr 2017, 8:42:18 UTC

Just got another 2 in a row that failed after a few seconds.
This one had David Cameron's host as one of the wingmen.

A 3rd WU started successfully.
ID: 29924
maeax

Joined: 2 May 07
Posts: 401
Credit: 13,651,559
RAC: 21,077
Message 29925 - Posted: 13 Apr 2017, 9:37:52 UTC
Last modified: 13 Apr 2017, 9:41:23 UTC

Some tasks finished after 3 or 4 minutes with a computation error.
The tasks that end this quickly show an estimated duration of 2 or 3 days at the start and
1,814,400 GigaFlops of work (longrunners).
https://lhcathome.cern.ch/lhcathome/result.php?resultid=134341923

Tasks with an estimated duration of 1 or 2 hours and 43,200 GigaFlops of work are OK.
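
To make the difference between the two kinds of task concrete, here is a rough sketch using only the two work sizes quoted above. The 500,000 GigaFlops cut-off and the function name `looks_like_longrunner` are assumptions picked for illustration; they are not values used by the project.

```python
# Estimated work sizes quoted in this post, in GigaFlops.
LONGRUNNER_GFLOPS = 1_814_400
NORMAL_GFLOPS = 43_200

# Hypothetical cut-off, chosen only because it lies between the two
# reported sizes. It is not a project setting.
ASSUMED_THRESHOLD_GFLOPS = 500_000


def looks_like_longrunner(estimated_gflops, threshold=ASSUMED_THRESHOLD_GFLOPS):
    """Return True if a task's estimated work is at or above the cut-off."""
    return estimated_gflops >= threshold


# How much larger the longrunner estimate is than the normal one.
ratio = LONGRUNNER_GFLOPS / NORMAL_GFLOPS

print(f"longrunner estimate is {ratio:.0f}x the normal estimate")
print(looks_like_longrunner(LONGRUNNER_GFLOPS), looks_like_longrunner(NORMAL_GFLOPS))
```

Whether a large estimate really means a faulty task is only what has been observed in this thread so far, so treat the flag as a hint rather than a verdict.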
ID: 29925
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29927 - Posted: 13 Apr 2017, 14:54:31 UTC

I have had quite a number of these failed tasks this afternoon (after having had some others yesterday and last night).
Most of them finish after 3-4 minutes, some after 10-11 minutes.

What's going wrong?
ID: 29927
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29928 - Posted: 14 Apr 2017, 3:29:16 UTC

Last night I again had several such failing WUs.

By now, it is becoming quite annoying.
ID: 29928
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29929 - Posted: 14 Apr 2017, 11:53:18 UTC

About 2 hours ago, I got the next 3 faulty WUs.
What is going on there?
ID: 29929
Dave Peachey

Joined: 9 May 09
Posts: 17
Credit: 752,075
RAC: 0
Message 29930 - Posted: 14 Apr 2017, 18:57:43 UTC

I've had more than a dozen of them in the last 24 hours. I've worked on the assumption that anything with a very long predicted duration (i.e. significantly longer than what I experience for 'normal' WUs) is likely to be faulty - notwithstanding any legitimate 'ultra-long' WUs which are in circulation.

On which basis, I bump them to the top of the queue by temporarily suspending any other reasonable-looking WUs just to clear them out as quickly as possible. It's a laborious, manual intervention which I do once every eight hours or so but, thus far, I've been proven correct; anything showing a very long predicted duration has errored out within a few minutes.

Whilst that's not been too much of a waste of processing time overall (just over an hour all told), I have found that having several of these predicted long WUs in the queue at any one time clogs up my BOINC queue to the extent that I can't download other WUs for other projects, because BOINC Manager thinks the WU queue is full (I keep a max. 0.75-day long queue for the sake of avoiding too many problems if something drastic happens).

On which basis, yes, it's annoying and the sooner they are all errored out (assuming they won't/can't be removed manually), the better.
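
The queue-clogging effect is easy to see with a bit of arithmetic. The sketch below is a deliberately simplified model, not how the BOINC client actually schedules work fetch: it assumes the client only asks for more work while the summed duration estimates stay below the buffer. The 0.75-day buffer is from this post; the per-task estimates and the name `queue_has_room` are hypothetical.

```python
BUFFER_DAYS = 0.75  # queue length mentioned in this post


def queue_has_room(estimated_days_per_task, buffer_days=BUFFER_DAYS):
    """Simplified assumption: new work is requested only while the
    queued duration estimates sum to less than the buffer."""
    return sum(estimated_days_per_task) < buffer_days


# Hypothetical queues, with estimates in days.
normal_queue = [0.1, 0.2]    # a couple of tasks of a few hours each
clogged_queue = [2.0, 3.0]   # two tasks with multi-day estimates

print(queue_has_room(normal_queue))   # buffer not yet filled
print(queue_has_room(clogged_queue))  # estimates alone exceed the buffer
```

Under this model a single task with a multi-day estimate is already enough to exceed a 0.75-day buffer, even though it errors out within minutes once it actually starts.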
ID: 29930
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29931 - Posted: 14 Apr 2017, 19:36:23 UTC - in response to Message 29930.  

... assuming they won't/can't be removed manually...

why not?
ID: 29931
Dave Peachey

Joined: 9 May 09
Posts: 17
Credit: 752,075
RAC: 0
Message 29932 - Posted: 14 Apr 2017, 19:45:20 UTC - in response to Message 29931.  
Last modified: 14 Apr 2017, 19:54:36 UTC

... assuming they won't/can't be removed manually...

why not?

No reason that I know of other than the fact that, in spite of this issue having been extant for a couple of days, nothing has been done about it.

That's not to say it wouldn't be possible if there were the resources available and/or a willingness to do this ... albeit, in this respect, I actually know very little about the potential complexities of the matter so I may be completely wrong!
ID: 29932
maeax

Joined: 2 May 07
Posts: 401
Credit: 13,651,559
RAC: 21,077
Message 29933 - Posted: 14 Apr 2017, 19:57:13 UTC - in response to Message 29925.  

Some task finished after 3 or 4 min with computation error.
Tasks with this quick end have 2 or 3 days duration time at the beginning and
1.814.400 GigaFlops work (longrunner).


The BOINC homepage (https://boinc.berkeley.edu/download_all.php)
has a pre-release BOINC 7.7.2 that includes VirtualBox 5.1.18.
This works for me now with these longrunners.

After installation, don't forget to reboot your computer.
ID: 29933
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29934 - Posted: 16 Apr 2017, 5:46:52 UTC

The error now comes up with almost every work unit :-(

Why is no one at CERN taking care of this problem?
ID: 29934
gyllic

Joined: 9 Dec 14
Posts: 135
Credit: 1,629,422
RAC: 2,978
Message 29935 - Posted: 16 Apr 2017, 7:56:11 UTC - in response to Message 29934.  

Why is no-one at CERN taking care of this Problem?

Maybe the Easter holidays.
ID: 29935
PhilTheNet

Joined: 21 Sep 14
Posts: 12
Credit: 110,163
RAC: 0
Message 29936 - Posted: 16 Apr 2017, 8:31:30 UTC - in response to Message 29918.  

Same :

<error_code>-161 (not found)</error_code>
ID: 29936
MAGIC Quantum Mechanic

Joined: 24 Oct 04
Posts: 628
Credit: 17,768,595
RAC: 18,259
Message 29938 - Posted: 16 Apr 2017, 9:02:10 UTC - in response to Message 29936.  

Same :

<error_code>-161 (not found)</error_code>


I'm surprised you got those 2 Valid tasks with this......


Required extension pack not installed, remote desktop not enabled.
Volunteer Mad Scientist For Life
ID: 29938
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 425
Credit: 118,004,153
RAC: 163,481
Message 29939 - Posted: 16 Apr 2017, 9:42:47 UTC

The extension pack isn't needed for a successful WU; it's just very helpful for troubleshooting.
ID: 29939
Erich56

Joined: 18 Dec 15
Posts: 754
Credit: 5,254,722
RAC: 8,126
Message 29943 - Posted: 16 Apr 2017, 14:59:09 UTC - in response to Message 29935.  

Why is no-one at CERN taking care of this Problem?

maybe easter holidays

Well, you are probably right :-)

Meanwhile, I have noticed that all "long runners" (which, when downloaded, show a remaining time of a few days) error out shortly after starting. The other tasks (showing a remaining time of a few hours) are all going well.

So what I am doing now is: once such a "Long runner" is downloaded, I abort it immediately.
ID: 29943


©2018 CERN