File upload issues
Author | Message |
---|---|
Send message Joined: 28 Sep 04 Posts: 707 Credit: 47,271,317 RAC: 28,705 |
I have two of these. Both have now been in upload status for 67 hours. The removal of partly uploaded files does not seem to work. Both have uploaded about 0.5 % and never proceed beyond that. These two are the only ones I currently have with upload or download problems. Here's the log:
6547 LHC@home 12.1.2018 10:04:09 Started upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0
6548 LHC@home 12.1.2018 10:04:09 Started upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0
6549 LHC@home 12.1.2018 10:04:15 [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0] locked by file_upload_handler PID=-1
6550 LHC@home 12.1.2018 10:04:15 [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0] locked by file_upload_handler PID=-1
6551 LHC@home 12.1.2018 10:04:15 Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0: transient upload error
6552 LHC@home 12.1.2018 10:04:15 Backing off 04:23:48 on upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0
6553 LHC@home 12.1.2018 10:04:15 Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0: transient upload error
6554 LHC@home 12.1.2018 10:04:15 Backing off 04:33:03 on upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0 |
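For anyone who wants to list the affected files from a saved copy of a log like the one above, here is a minimal sketch. It assumes the error text appears exactly as quoted in this post; the function name `locked_uploads` is made up for illustration and is not part of BOINC or BoincTasks.

```python
import re
from collections import Counter

# Pattern for the server error as quoted in the post above:
#   Error reported by file upload server: [<file>] locked by file_upload_handler PID=<n>
LOCKED_RE = re.compile(
    r"Error reported by file upload server: \[(?P<file>[^\]]+)\] "
    r"locked by file_upload_handler PID=(?P<pid>-?\d+)"
)


def locked_uploads(log_text: str) -> Counter:
    """Count how often each file name shows up in a 'locked by' error."""
    return Counter(m.group("file") for m in LOCKED_RE.finditer(log_text))
```

Calling `locked_uploads(text).most_common()` on the saved log text gives the files sorted by how many times the server reported them as locked, which may help when reporting the stuck ones to the project.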
Send message Joined: 28 Sep 04 Posts: 707 Credit: 47,271,317 RAC: 28,705 |
I now have two new tasks (now Atlas tasks) locked by file_upload_handler PID=-1. They are on another host than the two I reported on Friday. Tasks are here: https://www.cpdn.org/cpdnboinc/result.php?resultid=20921445 and here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=173295916 |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
LHC would probably not be able to help your cpdn task issue. |
Send message Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
... and the servers are overwhelmed again. Both uploads and downloads fail with transient http errors. |
Send message Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
So it seems. I have problems downloading a couple of SixTrack tasks: transient HTTP errors, as you report. I am contacting the IT guys. |
Send message Joined: 24 Oct 04 Posts: 1155 Credit: 52,305,052 RAC: 57,440 |
"so it seems - I have problems downloading a couple of SixTrack tasks... transient HTTP errors, as you"
THANKS. I have 32 cores running these and the problem just started for me 2 hours ago. I sure hope that problem doesn't happen again where lots of complete tasks turn into nothing and just get aborted.
1/14/2018 5:02:18 AM | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__8__s__62.31_60.32__7.9_8.0__5__28.5_1_sixvf_boinc28280_0_r1268262920_0: transient HTTP error
Volunteer Mad Scientist For Life |
Send message Joined: 28 Sep 04 Posts: 707 Credit: 47,271,317 RAC: 28,705 |
"LHC would probably not be able to help your cpdn task issue."
DUH! I don't know how I managed to f*ck that up. Anyway, more LHC tasks are in the upload queue. Downloads seem to be coming through better, but not all in one go. |
Send message Joined: 24 Sep 08 Posts: 4 Credit: 397,080 RAC: 0 |
Now the one I reported as stuck earlier has timed out and shows up as an error in my account because the server locked it with just over half a percent uploaded... :-( Please fix this locking issue on the server side before lots more tasks error out because of this and have to be sent out again! |
Send message Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
ok upload/download seems to be back to functional - at least, my problems got automatically solved without any particular action from me |
Send message Joined: 18 Dec 15 Posts: 1742 Credit: 114,934,486 RAC: 93,948 |
"ok upload/download seems to be back to functional"
I can confirm :-) |
Send message Joined: 28 Sep 04 Posts: 707 Credit: 47,271,317 RAC: 28,705 |
My uploads are still stuck:
25615 LHC@home 15.1.2018 11:59:25 Started upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0
25616 LHC@home 15.1.2018 11:59:25 Started upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0
25617 LHC@home 15.1.2018 11:59:51 [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0] locked by file_upload_handler PID=-1
25618 LHC@home 15.1.2018 11:59:51 [error] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0] locked by file_upload_handler PID=-1
25619 LHC@home 15.1.2018 11:59:51 Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0: transient upload error
25620 LHC@home 15.1.2018 11:59:51 Backing off 04:26:11 on upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__6_8__5__45_1_sixvf_boinc4967_0_r902298660_0
25621 LHC@home 15.1.2018 11:59:51 Temporarily failed upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0: transient upload error
25622 LHC@home 15.1.2018 11:59:51 Backing off 03:24:11 on upload of workspace1_hl13_collision_scan_62.3250_60.3300_chrom_15_oct_-300_B4__57__s__62.31_60.32__10_12__5__82.5_1_sixvf_boinc4994_0_r1263781623_0
Upload is being retried automatically every five minutes by BoincTasks. Both tasks will expire later today. The crunch time for both was below 10 seconds, so aborting the uploads would not be any major loss. |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 198,139,853 RAC: 86,453 |
Same here, several WUs are stuck in upload:
LHC_2015_LHC_2015_234_BOINC_errors__20__s__62.31_60.32__5.5_5.6__5__55.5_1_sixvf_boinc69362_0_r960578163_0 0,517 44,00 K 00:07:22 - 25:55:03 0,00 Kbps Upload pending (Retry in: 03:41:54), retried: 11 AHuW74
workspace1_hl13_collision_scan_62.3250_60.3100_chrom_15_oct_-300_B1__59__s__62.31_60.32__6_8__5__45_1_sixvf_boinc5143_1_r726992513_0 0,575 44,00 K 00:17:39 - 196:19:28 0,00 Kbps Upload pending (Project backoff: 00:09:31) DEV21
workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B4__24__s__62.31_60.32__8_10__5__30_1_sixvf_boinc2072_1_r145347505_0 0,577 44,00 K 00:11:00 - 179:08:13 0,00 Kbps Upload pending (Project backoff: 00:09:31) DEV21
workspace1_hl13_collision_scan_62.3250_60.3125_chrom_15_oct_-300_B4__24__s__62.31_60.32__10_12__5__15_1_sixvf_boinc2081_1_r1521857171_0 0,581 44,00 K 00:18:17 - 179:49:21 0,00 Kbps Upload pending (Project backoff: 00:09:31) DEV21
LHC_2015_LHC_2015_234_BOINC_errors__20__s__62.31_60.32__7.0_7.1__5__10.5_1_sixvf_boinc70217_1_r1739503837_0 0,519 44,00 K 00:14:48 - 27:02:47 0,00 Kbps Upload pending (Retry in: 05:05:20), retried: 12 PHuW72
Supporting BOINC, a great concept ! |
Send message Joined: 18 Dec 15 Posts: 1742 Credit: 114,934,486 RAC: 93,948 |
My assumption would be that the upload problems with Sixtrack have to do with the large number of tasks currently in the mill. From what I just saw, there are about 1,700 unsent ATLAS tasks - so we might end up with the same problems which came up a month ago. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Every time I have tried ATLAS recently, I have gotten burned. But since they now have a full crew back at CERN, I am trying again. I have one VirtualBox machine that does only CMS, LHCb and Theory. They work fine, with no upload issues. And any problems on ATLAS or Theory do not affect them. Yesterday I started a second machine without VirtualBox, set to receive only ATLAS (native) and Sixtrack. So far, so good. But we will see how long LHC can keep the servers running properly. EDIT: One ATLAS is stuck in download, and two SixTrack are stuck in upload. But I have enough work to keep going. |
Send message Joined: 15 Jul 05 Posts: 246 Credit: 5,974,599 RAC: 0 |
We obviously hit the limit of the current infrastructure with 10k ATLAS and 200k Sixtrack tasks. The NFS server is ok, but the upload-download servers are struggling. Sorry about this, please be patient (again). :-( |
Send message Joined: 18 Dec 15 Posts: 1742 Credit: 114,934,486 RAC: 93,948 |
"We obviously hit the limit of the current infrastructure with 10k ATLAS and 200k Sixtrack tasks. The NFS server is ok, but the upload-download servers are struggling"
One of my finished ATLAS tasks has been trying unsuccessfully to upload since yesterday. It always shows "locked by upload handler ... transient upload error", regardless of how often (for sure several hundred times) I push the "retry now" button. Obviously the tool that was said to run every 6 hours in order to delete partly uploaded files is not doing its job. I question what sense it makes to keep filling the "unsent" queue with new ATLAS tasks as long as these severe transfer problems exist. |
Send message Joined: 1 May 07 Posts: 27 Credit: 2,336,954 RAC: 335 |
Same here with:
17/01/2018 18:33:32 | LHC@home | Temporarily failed upload of Au0KDmLGRurnDDn7oo6G73TpABFKDmABFKDmQrFKDmABFKDmRzxOQn_0_r70336218_ATLAS_result: transient upload error
I presume it will get uploaded at some stage... before its deadline. |
Send message Joined: 14 Jul 17 Posts: 7 Credit: 260,936 RAC: 0 |
Hi! I'm not sure if I have understood this issue correctly: Those "partly uploaded files", are they on my machine or on the server? Do I need to take any action, or is the problem going to solve itself when the servers are less busy? I currently have half a dozen or so tasks that are stuck in uploading state, and together they represent several days of hard computing, so I'd hate to have to abort them! :-( As I can see on the server stat page, there are several thousand items in the tasks and WUs "waiting for deletion" queues, and a whopping 768973 tasks to send!! :-O Will this issue solve itself once they are crunched and validated? (hopefully before the deadlines expire) Kindest regards, Gunnar |
Send message Joined: 18 Dec 15 Posts: 1742 Credit: 114,934,486 RAC: 93,948 |
"Do I need to take any actions, or is the problem going to solve itself when the servers are less busy?"
There is nothing you can do other than wait and hope that the tasks will be uploaded before the expiration date. |
Send message Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
This answer needs clarification. Most tasks have a chance of being returned and validated with credit even after the deadline has passed. The first (or first two, depending on the quorum) results to be *returned* will receive credit, regardless of deadline. Therefore the answer should be that nothing can be done other than hoping they will upload before the minimum quorum has been reached. |
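The quorum rule described in the post above can be illustrated with a toy model. This is only a sketch of the rule as stated there, not BOINC's actual validator: the names `Result` and `credited_hosts` and the `quorum` parameter are made up for illustration, and checking whether results agree with each other is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host: str
    returned_at: float  # hours after the work unit was first sent out
    deadline: float     # hours after send-out; deliberately unused below


def credited_hosts(results, quorum):
    """Return the hosts of the first `quorum` results in order of return.

    The deadline is never consulted: under the rule described above, a late
    result still counts as long as it arrives before the quorum is filled.
    """
    ordered = sorted(results, key=lambda r: r.returned_at)
    return [r.host for r in ordered[:quorum]]
```

In this model, a result returned at hour 200 against a 168-hour deadline is still credited when `quorum` is 1, provided the replacement copy sent to another host comes back later than hour 200.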