Uploads of finished tasks not possible since last night
Author | Message |
---|---|
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
... Last retry went to 100% but still failed with transient HTTP error...this is the same sort of problem we experienced last week. So maybe the root cause for the problem is back :-( |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I had one that was stuck since yesterday, but just uploaded successfully after a manual retry. I am setting no new ATLAS work until next year. |
Send message Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
You are probably right, Erich; the solution found (deleting partial uploads with a script every 6 hours) is temporary, until the new file systems for the NFS server are in use. But "maybe" there is another way to bridge the time until this update. Processes and daemons inside the BOINC server have different priorities for their execution. Under heavy load, partial uploads occur when the upload handler stops an upload because another process with a higher or equal priority is running, creating a conflict that disturbs the upload and finally stops it before its normal end. (I am not talking about ISP failures or client computer crashes, which are external causes.) "Maybe", to attenuate the problem, it would be worth giving
a higher priority to the deleter than to the transitioner (the most CPU-intensive), in order to clean up and free more space, and a lower priority to the feeder and, why not, also to the scheduler.
|
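For readers wondering what a periodic cleanup of partial uploads might look like, here is a minimal Python sketch. It is not the script actually used on the LHC@home server, which is not shown in this thread. It assumes that in-progress uploads sit in a dedicated directory and that any file untouched for more than 6 hours can be treated as an abandoned partial upload; the function names, the directory and the `dry_run` flag are all hypothetical.

```python
import time
from pathlib import Path


def find_stale_partials(upload_dir, max_age_hours=6.0, now=None):
    """Return files under upload_dir not modified for more than max_age_hours.

    Assumption: a file that has not been written to for that long is an
    abandoned partial upload. A real server would need a stricter check.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for path in Path(upload_dir).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


def delete_stale_partials(upload_dir, max_age_hours=6.0, now=None, dry_run=True):
    """Delete stale files and return the list of affected paths.

    With dry_run=True (the default) nothing is deleted, only reported.
    """
    stale = find_stale_partials(upload_dir, max_age_hours, now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Such a function could be run from a scheduler every 6 hours, first with `dry_run=True` to check what it would remove.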
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
Same issue again, but not because the server disk is full. Meanwhile, 6 upload retries of 132 MB, each loading up to 100%: LHC@home 5PMNDmzJJornDDn7oo6G73TpABFKDmABFKDmSWJKDmABFKDmPiD4km_0_r672436546_ATLAS_result Progress 76.347% Size 135214,91 K Speed 1485,77 Kbps Uploading and then: LHC@home 27 Dec 15:49:01 Temporarily failed upload of 5PMNDmzJJornDDn7oo6G73TpABFKDmABFKDmSWJKDmABFKDmPiD4km_0_r672436546_ATLAS_result: transient upload error |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
this is the message BOINC gives me when I (re)try to upload finished ATLAS tasks: 27/12/2017 16:40:57 | LHC@home | [error] Error reported by file upload server: Server is out of disk space |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
and now got the following error message: 27/12/2017 18:38:52 | LHC@home | [error] Error reported by file upload server: [eo6KDm3Q2nrnSu7Ccp2YYBZmABFKDmABFKDmWWIKDmABFKDmIIGYKo_0_r2063197282_ATLAS_result] locked by file_upload_handler PID=-1 seems like the server can't decide what its problem is :-) |
Send message Joined: 27 Aug 17 Posts: 1 Credit: 156,031 RAC: 0 |
I think I have the same problem. Several times my ATLAS task tried to upload, got to 100% ... and restarted. On a manual start I have seen a slow start at 350kbs and then an immediate jump to 20% done; it did reach 100% (161MB) at the end, but also ended in a restart in n hours. That sounds strange in my opinion. I think it's not a network problem, more a matter of accepting and acknowledging the completed task. Best regards |
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
After several more retries (not manual, but let BOINC do what it should), the upload succeeded. Before the success I meanwhile also got the message: LHC@home 27 Dec 17:04:55 [error] Error reported by file upload server: Server is out of disk space |
Send message Joined: 15 Jul 05 Posts: 248 Credit: 5,974,599 RAC: 0 |
Our storage space for uploads has been increased, but as there are many tasks queued, there might be temporary issues again. Sorry for this, and thanks for your contributions! |
Send message Joined: 28 Dec 08 Posts: 339 Credit: 4,863,589 RAC: 282 |
Still jammed up..I got an upload to 100% and it stalled and then restarted and can't upload now. Shutting down for the night, see what changes in 7 hrs. |
Send message Joined: 1 May 07 Posts: 27 Credit: 2,336,992 RAC: 1 |
Seems to be stuck again.. 28/12/2017 11:42:11 | LHC@home | [error] Error reported by file upload server: [0ZbMDmof9nrnDDn7oo6G73TpABFKDmABFKDmxLFKDmABFKDmtodCCn_0_r1908771280_ATLAS_result] locked by file_upload_handler PID=-1 |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
Seems to be stuck again.. same thing here - a task which got finished several days ago can't upload: " locked by file_upload_handler PID=-1" another task which got finished during last night was uploaded right away. About 2-3 weeks ago, when there were these big problems caused by too many ATLAS tasks in the mills (thus putting too much strain on the infrastructure there), David Cameron put into effect a tool which was intended to clean up partial uploads every 6 hours; hence, I am surprised that now, with a considerably lower number of tasks in the mills (only about one third compared to before), there is still the "locked by file_upload_handler" problem. I am wondering if there is another problem now :-( |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
same thing here - a task which got finished several days ago can't upload: " locked by file_upload_handler PID=-1" just would like to report that this task was finally uploaded :-) |
Send message Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
I cancelled two hung uploads yesterday-ish. Very short run time, not much lost. I'd like to think it helped you return your results. :) |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
I'd like to think it helped you return your results. :) haha, many thanks :-) |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Again server problems? Yesterday I had the " locked by file_upload_handler PID=-1" error (the results are uploaded by now) and today I have the "transient HTTP error": 14.01.2018 11:41:42 | LHC@home | Starting task AIKLDm7iuurnDDn7oo6G73TpABFKDmABFKDmZPHKDmABFKDm6jOAWn_0 14.01.2018 14:13:45 | LHC@home | Computation for task AIKLDm7iuurnDDn7oo6G73TpABFKDmABFKDmZPHKDmABFKDm6jOAWn_0 finished 14.01.2018 14:13:48 | LHC@home | Started upload of AIKLDm7iuurnDDn7oo6G73TpABFKDmABFKDmZPHKDmABFKDm6jOAWn_0_r1909380871_ATLAS_result 14.01.2018 14:15:04 | | Project communication failed: attempting access to reference site 14.01.2018 14:15:04 | LHC@home | Temporarily failed upload of AIKLDm7iuurnDDn7oo6G73TpABFKDmABFKDmZPHKDmABFKDm6jOAWn_0_r1909380871_ATLAS_result: transient HTTP error 14.01.2018 14:15:04 | LHC@home | Backing off 00:02:34 on upload of AIKLDm7iuurnDDn7oo6G73TpABFKDmABFKDmZPHKDmABFKDm6jOAWn_0_r1909380871_ATLAS_result 14.01.2018 14:15:08 | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 18 Sep 04 Posts: 30 Credit: 5,100,929 RAC: 0 |
I have an ATLAS task that has not uploaded for many, many days: 16.01.2018 08:55:48 | LHC@home | Started upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result 16.01.2018 09:00:55 | LHC@home | Temporarily failed upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result: transient HTTP error 16.01.2018 09:00:55 | LHC@home | Backing off 04:15:14 on upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result 16.01.2018 10:12:25 | LHC@home | Started upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result 16.01.2018 10:12:47 | LHC@home | Temporarily failed upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result: connect() failed 16.01.2018 10:12:47 | LHC@home | Backing off 03:54:55 on upload of WqtNDme5DvrnDDn7oo6G73TpABFKDmABFKDm8sHKDmABFKDmT56S3n_0_r205511374_ATLAS_result Strangely, the ATLAS task data listed in my account is not consistent with the data displayed in my client: while the task date is identical, the download and due dates differ. The task was neither delivered by the server on 15th of January (but many days earlier), nor does it have to be complete on 23rd of January (but on 22nd). Do you have a database problem? Michael. |
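For anyone wanting to summarise event-log lines like the ones quoted in this thread, a small Python sketch follows. It assumes the client log has been saved with one entry per line and that the messages read exactly as quoted here ("Temporarily failed upload of ...: reason" and "Backing off HH:MM:SS on upload of ..."); the function names are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
FAILED = re.compile(r"Temporarily failed upload of (?P<name>\S+): (?P<reason>.+?)\s*$")
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def count_failure_reasons(lines):
    """Count how often each upload failure reason appears in the log lines."""
    reasons = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            reasons[match.group("reason")] += 1
    return reasons


def backoff_seconds(line):
    """Return the back-off delay of a 'Backing off' line in seconds, else None."""
    match = BACKOFF.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups()[:3])
    return hours * 3600 + minutes * 60 + seconds
```

Calling `count_failure_reasons(open("log.txt"))` on a saved log (the file name is just an example) would show at a glance whether the failures are mostly "transient HTTP error", "connect() failed" or something else.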
Send message Joined: 28 Sep 04 Posts: 728 Credit: 49,050,975 RAC: 27,146 |
The due dates (deadline) differ by one day for LHC tasks. Boinc manager says that due date is one day earlier than server. I have never seen an explanation why, but it has been like this for years. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
For quite a while now, ATLAS uploads have been failing with "server out of disk space". |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
the error notices seem to change from time to time: since last night, it always says "locked by upload handler" and "transient upload error" - the same as we had most of the time from mid-December on. Meanwhile, the number of "unsent" ATLAS tasks on the Project Status Page is "0" - which is the best they can do, anyway. I think it does not make any sense to send out ATLAS tasks for crunching as long as all these severe file transfer (and other) problems persist. |
©2024 CERN