Error -161
Author | Message |
---|---|
Joined: 27 Sep 08 Posts: 807 Credit: 652,011,186 RAC: 291,809 |
I've got a few tasks with this error. 2017-04-12 21:19:33 (13792): Guest Log: Starting ATLAS job. (PandaID=3326368081 taskID=10995525) 2017-04-12 21:21:23 (13792): Guest Log: Failed! Shutting down the machine. upload failure: <file_xfer_error> <file_name>FZ9LDm7tXIqnDDn7oo6G73TpABFKDmABFKDmPaIKDmaFFKDmJSlUln_0_ATLAS_result</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> |
Joined: 15 Jun 08 Posts: 2411 Credit: 226,160,751 RAC: 128,400 |
Most likely a faulty batch. I also have one of them, and that WU failed on all of my wingmen's hosts. https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=64572665 |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
I have had quite a number of such cases since yesterday morning. For example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=134348645 |
Joined: 15 Jun 08 Posts: 2411 Credit: 226,160,751 RAC: 128,400 |
I got another one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=134353416 - My wingman's WU failed. - They fail on Windows as well as on Linux. - All of them seem to be from taskID=10995525. On the other hand, I have a 10995525 WU that has been running for more than 2.5 h, so it may be that the server simply prefers that ID. |
Joined: 15 Jul 05 Posts: 242 Credit: 5,800,306 RAC: 0 |
In the file server logs, we see a couple of ATLAS result files with 0 length, possibly related. |
Joined: 15 Jun 08 Posts: 2411 Credit: 226,160,751 RAC: 128,400 |
Just got another two in a row that failed after a few seconds. This one had David Cameron's host as one of the wingmen. A third WU started successfully. |
Joined: 2 May 07 Posts: 2097 Credit: 159,690,891 RAC: 144,248 |
Some tasks finish after 3 or 4 minutes with a computation error. Tasks that end this quickly show an estimated duration of 2 or 3 days at the start and 1,814,400 GigaFLOPs of estimated work (long runners). https://lhcathome.cern.ch/lhcathome/result.php?resultid=134341923 Tasks showing an estimated duration of 1 or 2 hours and 43,200 GigaFLOPs of work are OK. |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
I have had quite a number of these failed tasks this afternoon (after having had some others yesterday and last night). Most of them finish after 3-4 minutes, some after 10-11 minutes. What's going wrong? |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
Last night I again had several such failing WUs. By now it is becoming quite annoying. |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
About 2 hours ago I got another 3 faulty WUs. What is going on there? |
Joined: 9 May 09 Posts: 17 Credit: 772,975 RAC: 0 |
I've had more than a dozen of them in the last 24 hours. I've been working on the assumption that anything with a very long predicted duration (i.e. significantly longer than what I see for 'normal' WUs) is likely to be faulty, notwithstanding any legitimate 'ultra-long' WUs that may be in circulation. On that basis, I bump them to the top of the queue by temporarily suspending all the reasonable-looking WUs, just to clear them out as quickly as possible. It's a laborious manual intervention which I do every eight hours or so, but so far I've been proven right: anything showing a very long predicted duration has errored out within a few minutes. Whilst that hasn't wasted much processing time overall (just over an hour all told), having several of these supposedly long WUs in the queue at any one time clogs up my BOINC queue to the extent that I can't download WUs for other projects, because BOINC Manager thinks the queue is full (I keep a queue of at most 0.75 days to avoid too many problems if something drastic happens). So yes, it's annoying, and the sooner they have all errored out (assuming they won't/can't be removed manually), the better. |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
... assuming they won't/can't be removed manually... why not? |
Joined: 9 May 09 Posts: 17 Credit: 772,975 RAC: 0 |
... assuming they won't/can't be removed manually... No reason that I know of other than the fact that, in spite of this issue having been extant for a couple of days, nothing has been done about it. That's not to say it wouldn't be possible if there were the resources available and/or a willingness to do this ... albeit, in this respect, I actually know very little about the potential complexities of the matter so I may be completely wrong! |
Joined: 2 May 07 Posts: 2097 Credit: 159,690,891 RAC: 144,248 |
Some tasks finish after 3 or 4 minutes with a computation error. The BOINC homepage https://boinc.berkeley.edu/download_all.php has a pre-release, BOINC 7.7.2, that includes VirtualBox 5.1.18. With it, these long runners now work for me. After installation, don't forget to reboot your computer. |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
The error now comes up with almost every work unit :-( Why is no one at CERN taking care of this problem? |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Why is no one at CERN taking care of this problem? Maybe the Easter holidays. |
Joined: 21 Sep 14 Posts: 25 Credit: 723,818 RAC: 0 |
Same: <error_code>-161 (not found)</error_code> |
Joined: 24 Oct 04 Posts: 1127 Credit: 49,748,124 RAC: 10,395 |
Same: I'm surprised you got those two valid tasks with this: Required extension pack not installed, remote desktop not enabled. |
Joined: 27 Sep 08 Posts: 807 Credit: 652,011,186 RAC: 291,809 |
The extension pack isn't needed for a successful WU; it's just very helpful for troubleshooting. |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,612,462 RAC: 120,497 |
Why is no one at CERN taking care of this problem? Well, you are probably right :-) Meanwhile, I have noticed that all "long runners" (which, when downloaded, show a remaining time of a few days) error out shortly after starting. The other tasks (showing a remaining time of a few hours) are all going well. So what I am doing now is aborting any such "long runner" immediately after it is downloaded. |
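The rule of thumb reached in this thread (treat anything whose estimate is far above a normal ATLAS task as a suspected faulty long runner, then fast-track or abort it) can be sketched as a small filter. This is a minimal, hypothetical Python sketch, not something BOINC or LHC@home provides: the `Task` class, the 100,000 GigaFLOPs cutoff and the 6 GFLOPS host speed are assumptions chosen for illustration, and the code does not talk to the BOINC client, so the actual suspending or aborting is still done by hand in BOINC Manager. The two work sizes are the ones reported above (43,200 vs 1,814,400 GigaFLOPs, a factor of 42).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff: normal ATLAS tasks in this thread were ~43,200 GFLOPs,
# the failing "long runners" ~1,814,400 GFLOPs. Anything far above normal is suspect.
SUSPECT_GFLOPS = 100_000.0


@dataclass
class Task:
    name: str          # hypothetical task name
    est_gflops: float  # estimated work shown when the task was downloaded


def suspected_long_runners(tasks: List[Task], threshold: float = SUSPECT_GFLOPS) -> List[str]:
    """Return the names of tasks whose estimated work exceeds the threshold."""
    return [t.name for t in tasks if t.est_gflops > threshold]


def estimated_hours(est_gflops: float, host_gflops_per_sec: float) -> float:
    """Naive duration estimate: total work divided by host speed, in hours."""
    return est_gflops / host_gflops_per_sec / 3600.0


if __name__ == "__main__":
    queue = [
        Task("atlas_normal_example", 43_200.0),
        Task("atlas_long_runner_example", 1_814_400.0),
    ]
    print(suspected_long_runners(queue))       # ['atlas_long_runner_example']
    print(estimated_hours(43_200.0, 6.0))      # 2.0 hours on an assumed 6 GFLOPS host
    print(estimated_hours(1_814_400.0, 6.0))   # 84.0 hours, i.e. 3.5 days
```

On an assumed 6 GFLOPS host the naive estimate gives 2 hours versus 3.5 days, which is roughly the "few hours" versus "few days" split described above.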
©2024 CERN