File upload issues
Author | Message |
---|---|
Joined: 15 Jul 05 Posts: 249 Credit: 5,974,599 RAC: 0 |
Our NFS storage backend is saturated, so uploads are failing intermittently. The underlying cause is an issue with file deletion, which we are trying to resolve. Sorry for the trouble, and thanks for your patience with transfers to LHC@home. |
Joined: 22 Sep 04 Posts: 6 Credit: 842,556 RAC: 0 |
There also seem to be problems (possibly closely related) with downloads. I have some downloads that have made 28 attempts without success, and two uploads that have been retrying for approximately 10 days without success. Thanks for looking into these issues - a nice thing to come back to after your break! |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,921,996 RAC: 33,985 |
What has changed since around noon is that the finished but still not uploaded ATLAS tasks do upload once in a while, with the progress bar going from 0% to 100%, then everything stops for a while, and finally the 100% value reverts to 0%. This, in fact, is exactly what we had some 3 weeks ago, when there was this big trouble with the ATLAS tasks. So whatever you were trying to fix so far - it didn't work (yet). |
Joined: 27 Sep 08 Posts: 850 Credit: 692,696,200 RAC: 99,193 |
Things seem better for me; the only tasks still stuck are the ones waiting for locks to be released. |
Joined: 28 Jul 05 Posts: 24 Credit: 6,603,623 RAC: 0 |
Some WUs are being uploaded/downloaded over here, but I still have one stuck on Windows. It's one of the first ones that got stuck: 9-1-2018 0:27:09 | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_260_BOINC_errors__59__s__62.31_60.32__5.5_5.6__5__84_1_sixvf_boinc207441_0_r348705869_0] locked by file_upload_handler PID=-1 Deadline: 9-1-2018 9:40:54. Linux is still showing some issues as well: locked by file_upload_handler PID=-1 |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,921,996 RAC: 33,985 |
My finished ATLAS tasks still show "transient HTTP error" when trying to upload. So whatever the CERN people tried to fix yesterday obviously did not succeed :-( |
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
A first action (cleanup of upload/download files) worked and unblocked the situation yesterday around noon. There were still hiccups but, as a volunteer, I managed to upload my results and download new WUs until ~23:00 GVA local time. I guess that the IT guys are planning a deeper intervention on the NFS storage backend (see post by Nils - https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4567&postid=33713) - let's wait for more news from their side. |
Joined: 24 Jun 14 Posts: 1 Credit: 1,260,244 RAC: 0 |
One of my servers is slowly but surely getting clogged by LHC tasks it can’t finish uploading, with 60+ uploads and just as many downloads pending at the moment. It seems that LHC_2015 tasks, while tedious, are slowly, very slowly, getting uploaded and downloaded; workspace1_hl13 tasks, however, are a dead end. They simply will not upload. I have tasks finished in December that are still not uploaded. |
Joined: 15 Jul 05 Posts: 249 Credit: 5,974,599 RAC: 0 |
Thanks Lasse, that is useful information. These files were probably half-uploaded earlier, and should under normal circumstances be deleted on the server to allow a fresh upload. We also ran out of space again, and will stop uploads/downloads for a while today to add more disk space on the old NFS backend server. Thanks again to you all for your patience. |
Joined: 28 Sep 04 Posts: 732 Credit: 49,336,437 RAC: 25,953 |
All servers are offline at the moment, see: https://lhcathome.cern.ch/lhcathome/server_status.php |
Joined: 15 Jul 05 Posts: 249 Credit: 5,974,599 RAC: 0 |
The NFS server upgrade is taking longer than expected because the server is trying to clear pending file deletes. Sorry for this; uploads should resume later, once the server status is back to green. |
Joined: 15 Jul 05 Posts: 249 Credit: 5,974,599 RAC: 0 |
Our NFS server is finally up again, hopefully with better performance. Transfers should resume now; at least my Sixtrack tasks uploaded correctly. Sorry again for the trouble with uploads caused by this. |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,921,996 RAC: 33,985 |
My 44kb Sixtrack file also got uploaded okay. However, none of the finished ATLAS files succeed: the progress bar goes from 0% to 100% (with a few interruptions in between), sits at 100% for a short while, and later reverts to 0% and jumps to "retry". The BOINC event log always shows "transient upload error" :-( This is the same as what we had around mid-December. |
Joined: 28 Sep 04 Posts: 732 Credit: 49,336,437 RAC: 25,953 |
My sixtrack and Atlas tasks have all been uploaded. Those that were over deadline (sixtrack) are now pending and waiting for wingmates. [edit]New tasks have also been downloaded and crunched.[/edit] |
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
My host managed to upload half of its tasks yesterday but is now stuck again. Once some tasks were reported back, it started downloading new ones, but downloads now go pending and it takes 2 minutes to download a task that has a 30-second runtime. I'll just wait it out; hopefully the server will catch up. Edit: Most tasks got uploaded and downloads no longer have pending time. |
Joined: 24 Sep 08 Posts: 4 Credit: 397,080 RAC: 0 |
I don't do ATLAS tasks, but my most recently completed Sixtrack task is stuck like this: 10.01.2018 02:31:18 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0 10.01.2018 02:31:21 | LHC@home | [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0] locked by file_upload_handler PID=-1 10.01.2018 02:31:21 | LHC@home | Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0: transient upload error Is this file locking issue caused by the NFS storage issues, or is it something on my end? |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,921,996 RAC: 33,985 |
"transient upload error" - most probably nothing on your end. I, too, got this "transient upload error" for several days, and finally last night my remaining ATLAS tasks were uploaded :-) |
Joined: 24 Sep 08 Posts: 4 Credit: 397,080 RAC: 0 |
The strange thing is that this file lock issue affects only one of my tasks so far. After my previous post BOINC started another Sixtrack task and successfully uploaded it, while the one mentioned above still has problems. 10.01.2018 13:26:58 | LHC@home | Starting task workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1 10.01.2018 14:17:34 | LHC@home | Computation for task workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1 finished 10.01.2018 14:17:36 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1_r821681728_0 10.01.2018 14:17:56 | LHC@home | Finished upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1_r821681728_0 10.01.2018 14:17:57 | LHC@home | Sending scheduler request: To report completed tasks. 10.01.2018 14:17:57 | LHC@home | Reporting 1 completed tasks 10.01.2018 18:02:55 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0 10.01.2018 18:02:58 | LHC@home | [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0] locked by file_upload_handler PID=-1 10.01.2018 18:02:58 | LHC@home | Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0: transient upload error |
Joined: 28 Apr 07 Posts: 1 Credit: 371,611 RAC: 0 |
"Strange thing is that this file lock issue only affect one of my tasks so far. After my previous post BOINC started another Sixtrack task and successfully uploaded it, while the one mentioned above still has problems." I am experiencing the same issue. |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,921,996 RAC: 33,985 |
From what I gather, "locked by file_upload_handler PID=-1" means that on the first upload attempt the file was only partly uploaded, and no further upload attempt will succeed until the partly uploaded file is removed. I remember reading somewhere here that this "removal" tool was recently set to do its job every 6 hours or so. |
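
For anyone with many pending transfers who wants to see which files are hitting the lock described above, here is a minimal Python sketch. It only assumes the event log line format quoted in this thread; the wording may differ between BOINC client versions, and the names `find_locked_uploads`, `STUCK_FILE` and `SAMPLE_LOG` are made up for illustration. It does not talk to BOINC or the server, it just counts the "locked by file_upload_handler" errors per file in text copied from the event log.

```python
import re
from collections import Counter

# Pattern derived from the error lines quoted in this thread (an assumption:
# other BOINC client versions may word the message differently).
LOCKED_RE = re.compile(
    r"Error reported by file upload server: "
    r"\[(?P<name>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>-?\d+)"
)


def find_locked_uploads(lines):
    """Count 'locked by file_upload_handler' errors per result file name."""
    counts = Counter()
    for line in lines:
        match = LOCKED_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# File name and log lines taken from the posts above.
STUCK_FILE = (
    "workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1"
    "__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0"
)
SAMPLE_LOG = [
    f"10.01.2018 02:31:18 | LHC@home | Started upload of {STUCK_FILE}",
    "10.01.2018 02:31:21 | LHC@home | [error] Error reported by file upload "
    f"server: [{STUCK_FILE}] locked by file_upload_handler PID=-1",
    f"10.01.2018 02:31:21 | LHC@home | Temporarily failed upload of "
    f"{STUCK_FILE}: transient upload error",
    "10.01.2018 14:17:57 | LHC@home | Reporting 1 completed tasks",
    "10.01.2018 18:02:58 | LHC@home | [error] Error reported by file upload "
    f"server: [{STUCK_FILE}] locked by file_upload_handler PID=-1",
]

if __name__ == "__main__":
    # Print each locked file with the number of failed attempts seen.
    for name, attempts in find_locked_uploads(SAMPLE_LOG).items():
        print(f"{attempts}x {name}")
```

To use it on your own host, paste the event log into a text file and pass its lines to `find_locked_uploads`. A file that keeps reappearing is most likely waiting for the server-side cleanup rather than anything on your end.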