Message boards : News : File upload issues

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 171
Credit: 2,493,558
RAC: 6,420
Message 33713 - Posted: 8 Jan 2018, 9:25:05 UTC

Our NFS storage backend got saturated and hence uploads are failing intermittently.

The underlying cause is an issue with file deletion, which we are trying to resolve.

Sorry for the trouble and thanks for your patience with transfers to LHC@home.
ID: 33713

Peter Ingham
Joined: 22 Sep 04
Posts: 6
Credit: 599,080
RAC: 2
Message 33720 - Posted: 8 Jan 2018, 11:29:30 UTC - in response to Message 33713.  

There also seem to be problems (possibly closely related) with downloads.

I have some downloads that have made 28 attempts without success.

I have two uploads that have been retrying for approx 10 days without success.

Thanks for looking into these issues - a nice thing to come back to after your break!
ID: 33720

Erich56
Joined: 18 Dec 15
Posts: 829
Credit: 6,276,310
RAC: 17,770
Message 33727 - Posted: 8 Jan 2018, 16:47:50 UTC

What has changed since around noon is that the finished but not yet uploaded ATLAS tasks now start uploading once in a while: the progress bar goes from 0% to 100%, then everything stops for a while, and finally the 100% value reverts to 0%.

This, in fact, is exactly what we had some 3 weeks ago, when there was this big trouble with the ATLAS tasks.

So whatever you were trying to fix so far - it didn't work (yet).
ID: 33727

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 465
Credit: 135,266,850
RAC: 293,752
Message 33729 - Posted: 8 Jan 2018, 21:29:48 UTC

Things seem better for me; the only tasks still stuck are the ones waiting for locks to be released.
ID: 33729

Empie
Joined: 28 Jul 05
Posts: 24
Credit: 2,365,595
RAC: 0
Message 33732 - Posted: 8 Jan 2018, 23:30:50 UTC
Last modified: 8 Jan 2018, 23:32:45 UTC

Some WUs are uploading/downloading over here, but I still have one stuck on Windows. It's one of the first ones that got stuck:
9-1-2018 0:27:09 | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_260_BOINC_errors__59__s__62.31_60.32__5.5_5.6__5__84_1_sixvf_boinc207441_0_r348705869_0] locked by file_upload_handler PID=-1

Deadline 9-1-2018 9:40:54

Linux is still showing some issues: locked by file_upload_handler PID=-1
ID: 33732

Erich56
Joined: 18 Dec 15
Posts: 829
Credit: 6,276,310
RAC: 17,770
Message 33735 - Posted: 9 Jan 2018, 5:58:07 UTC - in response to Message 33732.  

My finished ATLAS tasks still show "transient HTTP error" when trying to upload. So whatever the CERN people tried to fix yesterday obviously did not succeed :-(
ID: 33735

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 87
Credit: 1,144,167
RAC: 225
Message 33736 - Posted: 9 Jan 2018, 6:46:01 UTC - in response to Message 33735.  

A first action (cleanup of upload/download files) worked and unblocked the situation yesterday around noon. There were still hiccups but, as a volunteer, I managed to upload my results and download new WUs until ~23:00 GVA local time.

I guess that the IT guys are planning a deeper intervention on the NFS storage backend (see the post by Nils: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4567&postid=33713 ). Let's wait for more news from their side.
ID: 33736

Lasse Hintze Tøndering-Jensen
Joined: 24 Jun 14
Posts: 1
Credit: 1,260,239
RAC: 43
Message 33739 - Posted: 9 Jan 2018, 7:17:52 UTC

One of my servers is slowly but surely getting clogged by LHC tasks it can't finish uploading, with 60+ uploads and just as many downloads pending at the moment. It seems that LHC_2015 tasks, while tedious, are slowly, very slowly, getting uploaded and downloaded; workspace1_hl13 tasks, however, are a dead end. They simply will not upload. I have tasks finished in December that are still not uploaded.
ID: 33739

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 171
Credit: 2,493,558
RAC: 6,420
Message 33740 - Posted: 9 Jan 2018, 7:41:19 UTC - in response to Message 33739.  

Thanks Lasse, that is useful information. These files were probably half-uploaded earlier, and should under normal circumstances be deleted on the server to allow a fresh upload.

We also ran out of space again, and will stop uploads/downloads for a while today to add more disk space on the old NFS backend server. Thanks again to you all for your patience.
ID: 33740

Harri Liljeroos
Joined: 28 Sep 04
Posts: 296
Credit: 10,335,418
RAC: 17,496
Message 33742 - Posted: 9 Jan 2018, 8:15:14 UTC

All servers are offline at the moment, see: https://lhcathome.cern.ch/lhcathome/server_status.php
ID: 33742

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 171
Credit: 2,493,558
RAC: 6,420
Message 33747 - Posted: 9 Jan 2018, 12:55:45 UTC

The NFS server upgrade is taking longer than expected, as the server is trying to clear pending file deletes. Sorry for this; uploads should resume later, once the server status is back to green.
ID: 33747

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 171
Credit: 2,493,558
RAC: 6,420
Message 33752 - Posted: 9 Jan 2018, 17:13:15 UTC - in response to Message 33747.  

Our NFS server is finally up again, hopefully with better performance. Transfers should resume now; at least my SixTrack tasks uploaded correctly.

Sorry again for the trouble with uploads caused by this.
ID: 33752

Erich56
Joined: 18 Dec 15
Posts: 829
Credit: 6,276,310
RAC: 17,770
Message 33753 - Posted: 9 Jan 2018, 17:36:20 UTC - in response to Message 33752.  
Last modified: 9 Jan 2018, 17:36:39 UTC

My 44 kB SixTrack file also uploaded okay.

However, none of the finished ATLAS files succeed: the progress bar goes from 0% to 100% (with a few interruptions in between), sits at 100% for a short while, and then reverts to 0% and jumps to "retry".
The BOINC event log always shows "transient upload error" :-(
The same as what we had around mid-December.
ID: 33753

Harri Liljeroos
Joined: 28 Sep 04
Posts: 296
Credit: 10,335,418
RAC: 17,496
Message 33754 - Posted: 9 Jan 2018, 18:47:10 UTC
Last modified: 9 Jan 2018, 18:48:55 UTC

My SixTrack and ATLAS tasks have all been uploaded. Those that were over deadline (SixTrack) are now pending and waiting for wingmates.
Edit: New tasks have also been downloaded and crunched.
ID: 33754

Gunde
Joined: 9 Jan 15
Posts: 8
Credit: 129,871,746
RAC: 946,685
Message 33755 - Posted: 9 Jan 2018, 19:33:40 UTC
Last modified: 9 Jan 2018, 20:27:37 UTC

My host managed to upload half of its tasks yesterday but is now stuck again. Once some tasks had been reported it started to download new ones, but downloads now go pending, and it takes 2 min to download a task that has a 30 sec runtime.

I'll just wait it out; hopefully the server will catch up.

Edit: Most tasks have now uploaded, and downloads no longer go pending.
ID: 33755

Lars Vindal
Joined: 24 Sep 08
Posts: 4
Credit: 372,147
RAC: 9
Message 33759 - Posted: 10 Jan 2018, 1:44:45 UTC

I don't do ATLAS tasks, but my most recently completed SixTrack task is stuck like this:

10.01.2018 02:31:18 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0
10.01.2018 02:31:21 | LHC@home | [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0] locked by file_upload_handler PID=-1
10.01.2018 02:31:21 | LHC@home | Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0: transient upload error

Is this file-locking issue caused by the NFS storage problems, or is it something on my end?
ID: 33759

Erich56
Joined: 18 Dec 15
Posts: 829
Credit: 6,276,310
RAC: 17,770
Message 33760 - Posted: 10 Jan 2018, 6:00:45 UTC - in response to Message 33759.  

transient upload error

Is this file locking issue depending on NFS storage issues, or is it something on my end?
Most probably nothing on your end.
I, too, got this "transient upload error" for several days, and finally last night my remaining ATLAS tasks were uploaded :-)
ID: 33760

Lars Vindal
Joined: 24 Sep 08
Posts: 4
Credit: 372,147
RAC: 9
Message 33773 - Posted: 10 Jan 2018, 17:17:48 UTC - in response to Message 33760.  

Strange thing is that this file lock issue only affects one of my tasks so far. After my previous post BOINC started another SixTrack task and successfully uploaded it, while the one mentioned above still has problems.

10.01.2018 13:26:58 | LHC@home | Starting task workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1
10.01.2018 14:17:34 | LHC@home | Computation for task workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1 finished
10.01.2018 14:17:36 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1_r821681728_0
10.01.2018 14:17:56 | LHC@home | Finished upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__6_8__5__15_1_sixvf_boinc1883_1_r821681728_0
10.01.2018 14:17:57 | LHC@home | Sending scheduler request: To report completed tasks.
10.01.2018 14:17:57 | LHC@home | Reporting 1 completed tasks


10.01.2018 18:02:55 | LHC@home | Started upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0
10.01.2018 18:02:58 | LHC@home | [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0] locked by file_upload_handler PID=-1
10.01.2018 18:02:58 | LHC@home | Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B1__22__s__62.31_60.32__4_6__5__45_1_sixvf_boinc1876_1_r612308196_0: transient upload error
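
For anyone who wants to check which result files are affected, here is a minimal sketch that scans event-log lines like the ones quoted above and counts the "locked by file_upload_handler" errors per file. It is not part of BOINC: the function name `locked_uploads` and the regular expression are assumptions derived only from the log lines shown in this thread, so they may not match other client versions or locales.

```python
import re
from collections import Counter

# Pattern assumed from the log lines quoted in this thread;
# this is not an official description of the BOINC log format.
LOCK_RE = re.compile(
    r"\[error\] Error reported by file upload server: "
    r"\[(?P<name>[^\]]+)\] locked by file_upload_handler"
)


def locked_uploads(lines):
    """Count 'locked by file_upload_handler' errors per file name."""
    counts = Counter()
    for line in lines:
        # Event-log lines look like "<timestamp> | <project> | <message>".
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue  # not an event-log line, skip it
        match = LOCK_RE.search(parts[2])
        if match:
            counts[match.group("name")] += 1
    return counts
```

Pasting the event log into a text file and passing its lines to `locked_uploads` would give one counter entry per stuck file. The timestamp is deliberately not parsed, since the posts in this thread show at least two different date formats.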
ID: 33773

Saharak
Joined: 28 Apr 07
Posts: 1
Credit: 103,843
RAC: 0
Message 33788 - Posted: 12 Jan 2018, 5:00:58 UTC - in response to Message 33773.  

I am experiencing the same issue.
ID: 33788

Erich56
Joined: 18 Dec 15
Posts: 829
Credit: 6,276,310
RAC: 17,770
Message 33789 - Posted: 12 Jan 2018, 5:55:50 UTC

From what I gather, "locked by file_upload_handler PID=-1" means that the file was only partly uploaded at the first attempt, and no further upload attempt will succeed until the partly uploaded file has been removed.
I remember reading somewhere here that this "removal" tool was recently set to do its job every 6 hours or so.
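
As an illustration only, a periodic cleanup of that kind could look roughly like the sketch below. This is not the actual LHC@home or BOINC server code: the function name `remove_stale_partials`, the idea that a single directory holds only incomplete uploads, and the 6-hour threshold are all assumptions based on the description above.

```python
import os
import time


def remove_stale_partials(upload_dir, max_age_hours=6.0, now=None):
    """Delete regular files in upload_dir that have not been modified
    for more than max_age_hours, and return their names (sorted).

    Assumption: upload_dir contains only incomplete uploads, so
    anything old in it is safe to delete.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600

    # Collect the entries first so nothing is deleted while scanning.
    with os.scandir(upload_dir) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue  # leave subdirectories and symlinks alone
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from a scheduler every few hours, something like this would clear out the half-uploaded files so that the client's next retry can start a fresh upload. How the real server tells a partial file from a complete one is not described in this thread.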
ID: 33789

©2018 CERN