Message boards : Sixtrack Application : Transfer issues
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
I think that the IT guys have run some cleaning scripts - I will trigger them again. Thanks for monitoring! Cheers, A. |
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
Not sure if this is related, but I was able to upload two stuck results: one yesterday and one today. Thank you! |
Joined: 24 Oct 04 Posts: 1172 Credit: 54,685,889 RAC: 15,649 |
I am still having this problem on several of my 8-core machines. The one I am on now has 3 tasks stuck on the return page for several days; another task just finished and was sent right up, but these other 3 won't leave. One has been there so long it is now past due, and the other two are due on the 25th. I have been trying to send them manually over and over every time I check each computer. Volunteer Mad Scientist For Life |
Joined: 14 Jul 17 Posts: 7 Credit: 260,936 RAC: 0 |
Hi! I currently have 6 or 7 tasks stuck in uploading on some of my computers. On the one I'm using right now there are the following three:

Sat 20 Jan 2018 06:43:13 PM CET | LHC@home | Started upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | [error] Error reported by file upload server: [w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Temporarily failed upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0: transient upload error
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Backing off 04:49:27 on upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:29 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:43:46 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0: transient upload error
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Backing off 03:09:52 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0: transient upload error
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Backing off 05:53:38 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0

I've heard something about file fragments left on the server, and if that is the case, maybe they can be removed by a command like:

> find /correct/path/ -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -exec rm -f {} \;

(assuming the files on the server get the same filenames as on my client computer; otherwise the -name parameter needs to be changed). This command would clean out files between 220 and 250 bytes, which at least seems to be their actual size, judging from how much is reported to have been uploaded before it stopped (between 0.46% and 0.51% of files of approx. 44 kB). Hope that you can resolve the issue before the tasks run out of time. Have a nice day!!! Kindest regards, Gunnar |
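A safer way to try that cleanup is to list the matches first and delete only after checking; this sketch assumes the same placeholder path and name pattern as above (both would need adjusting to the server's real layout):

> find /correct/path/ -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -print
> find /correct/path/ -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -delete

The first command only prints the candidate fragments; the second (-delete is a GNU find feature) removes them once the list looks right. The -exec rm -f {} \; form in the original works equally well on any POSIX find.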
Joined: 24 Oct 04 Posts: 1172 Credit: 54,685,889 RAC: 15,649 |
I have had 3 stuck on one of mine for days, so I decided to check this PC's stats page under the Error list, and one of them was there saying Timed out - no response. That means a finished task that was ready to send a couple of days before the due date is now Invalid, because the server refused to take it back (while it was accepting other finished tasks). So I am just going to abort this one and keep trying to send in the other 2 finished tasks when I am on this machine, since they still have 4 days left on their due date. With 50 cores running these, I sure spend a lot of time checking them all and trying to make sure most of them get returned. https://lhcathome.cern.ch/lhcathome/results.php?hostid=10451775

Task 172348631 (workunit 83568818), sent 12 Jan 2018, 2:43:21 UTC, deadline 19 Jan 2018, 18:15:35 UTC: Timed out - no response

After aborting it just now, it disappeared from the Error list and is now gone completely from my account under ALL settings.

Edit: OK, now checking the next 8-core here, I see I have 6 of those AVX tasks that refuse to hit the road, and they are those 10-second to 1-minute versions. One thing that would help with this is if someone at the server would come here and say that we should just abort them and send them back that way, so we don't end up with an even longer list of unreturned tasks (at least the past-due ones that in reality were finished on time). (OK, time to check the rest of them) |
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
One thing that would help with this is if someone at the server would come here and say that we should just abort them and send them back that way, so we don't end up with an even longer list of unreturned tasks (at least the past-due ones that in reality were finished on time).

+1 |
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 62,755 |
Fri 26 Jan 2018 15:17:40 CET | LHC@home | Temporarily failed download of w-c3_job.B1inj_c3.2158__52__s__64.28_59.31__5.1_6.1__6__78_1_sixvf_boinc54488.zip: transient HTTP error

I see a couple of those messages in the log this afternoon. Maybe the servers need some friendly words. |
Joined: 18 Dec 15 Posts: 1810 Credit: 118,212,240 RAC: 27,001 |
Although about 770,000 "unsent" tasks are shown on the Project Status Page, I have been unsuccessfully trying to download tasks for hours now. It always says "No tasks are available for Sixtrack" :-( Will this problem ever be solved? |
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
I've run out of work this morning. My system has been assigned new tasks, but the download is delayed again and again. Not worth it, shutting down for the weekend. |
Joined: 18 Dec 15 Posts: 1810 Credit: 118,212,240 RAC: 27,001 |
I've run out of work this morning. My system has been assigned new tasks, but the download is delayed again and again.

And again, as already with ATLAS many times, I am questioning what sense it makes to pump several hundred thousand new tasks into the mills if not even their downloads work, not to mention the upload problems. |
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
Although about 770,000 "unsent" tasks are shown on the Project Status Page, I have been unsuccessfully trying to download tasks for hours now.

OK, so it's not just me. Good to know. All the Theory WUs use so much internet bandwidth that I can't even watch a video. I knew there were no Sixtrack tasks coming down because I couldn't watch Democracy Now! this morning. |
Joined: 17 Feb 07 Posts: 86 Credit: 968,855 RAC: 0 |
The uploading and downloading of WUs is still not going smoothly. Sometimes it takes a few hours for just one to upload while the others go quickly. But with patience all is going well. Greetings from, TJ |
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
While there is still, clearly, some issue with the servers, would it perhaps be sensible to (briefly) suspend the issue of NEW work to allow the servers to catch up with resends? I have 5 that have been Validation Pending for over a week but have yet to be sent to a second or third wingman; as these seem to go to the back of the queue, they are taking up vital space on the server for longer than necessary, rather than being sent out quickly and allowed to clear. Surely allowing the resend backlog to clear would free up space and then allow a clearer run for the new work. [I've said "clearly", "clear" and "clearer" a few more times than I had intended, but hopefully I've made my point clear 8¬) ] |
Joined: 24 Oct 04 Posts: 1172 Credit: 54,685,889 RAC: 15,649 |
Clearest? It has been a bit slow, but at least this time if I tell it to retry the transfer several times and do a BOINC update, it happens instead of staying there beyond the due date... of course I could also just be getting lucky. |
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
Long validation times don't bother me personally. But it would be prudent to control job creation by staggering it; AFAIK over-filling the queue with large batches does not speed up production. In fact, at work we try to achieve the opposite, in accordance with best-practice lean-management principles. Can job creation be initiated automatically and staggered in smaller batches, along the lines of the sketch below? Could job creation be rescheduled so staff is available to deal with hiccups? |
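A minimal sketch of how staggered job creation could be automated, assuming the standard BOINC result table (server_state = 2 means "unsent") and a hypothetical submit_batch helper standing in for whatever SixTrack actually uses to create work:

#!/bin/sh
# Top up the queue only when the number of unsent results drops below a threshold.
# boinc_db and submit_batch are placeholders, not the project's real names.
UNSENT=$(mysql -N -B -e "SELECT COUNT(*) FROM result WHERE server_state = 2" boinc_db)
if [ "$UNSENT" -lt 100000 ]; then
    ./submit_batch --size 50000   # hypothetical submission helper
fi

Run from cron every few minutes, something like this would keep a steady buffer of work without dumping a million tasks into the queue at once.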
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0 |
Hi,

Regarding the upload/download issues, we are in the hands of the IT experts. I fear that the large result files of ATLAS tasks play a role in that... Concerning the big delays in validation, this is due to the fact that there are a lot of queued tasks - I am wondering how complicated it would be to re-issue validation-failing tasks at higher priority. Concerning managing the load of SixTrack tasks, I agree that having large batches does not speed up production. The only real drawbacks are:

- long times in validating results, in case of need of re-issuing;
- long times in releasing new versions of exes.

Thanks for the feedback - I will discuss this with IT. |
Joined: 18 Dec 15 Posts: 1810 Credit: 118,212,240 RAC: 27,001 |
While there is still, clearly, some issue with the servers, would it perhaps be sensible to (briefly) suspend the issue of NEW work to allow the servers to catch up with resends? I have 5 that have been Validation Pending for over a week but have yet to be sent to a second or third wingman; as these seem to go to the back of the queue, they are taking up vital space on the server for longer than necessary, rather than being sent out quickly and allowed to clear. Surely allowing the resend backlog to clear would free up space and then allow a clearer run for the new work.

This is exactly what I have been saying in several postings here on the various message boards, after the whole mess started with far too many ATLAS tasks in mid-December. I simply don't understand what sense it makes to pump even more tasks into the mills while, on the other hand, neither the servers nor the network can handle these huge bulks of data. So I'm once more rather surprised to see that this morning the number of unsent Sixtrack tasks has passed the million mark (and is still growing). But perhaps my thinking is totally wrong, and one of the mods or someone else in charge could enlighten me. |
Joined: 27 Sep 08 Posts: 844 Credit: 690,660,338 RAC: 105,142 |
Hi Alessio, there is an option in BOINC to prioritise re-issues, called "Accelerating retries": https://boinc.berkeley.edu/trac/wiki/ProjectOptions#Acceleratingretries Maybe you can check the settings with IT? |
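For reference, a sketch of what those options look like in the BOINC server's config.xml; the option names come from the wiki page above, but the values here are illustrative guesses that the project admins would have to tune:

<config>
  <!-- Treat hosts with fast turnaround and low error rates as "reliable". -->
  <reliable_max_avg_turnaround>86400</reliable_max_avg_turnaround>
  <reliable_max_error_rate>0.001</reliable_max_error_rate>
  <!-- Give retries a tighter deadline and a scheduling priority boost. -->
  <reliable_reduced_delay_bound>0.5</reliable_reduced_delay_bound>
  <reliable_priority_on_over>5</reliable_priority_on_over>
</config>

With something like this, timed-out or failed results would be resent quickly to fast hosts instead of joining the back of the queue, which is exactly the week-long Validation Pending problem described above.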