Message boards : Sixtrack Application : Transfer issues

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 120
Credit: 1,424,992
RAC: 511
Message 33963 - Posted: 20 Jan 2018, 14:40:21 UTC - in response to Message 33926.  

I think that the IT guys have run some cleaning scripts - I will trigger them again.
Thanks for monitoring!
Cheers,
A.
ID: 33963
AuxRx
Joined: 16 Sep 17
Posts: 90
Credit: 1,056,395
RAC: 1,115
Message 33965 - Posted: 20 Jan 2018, 14:45:10 UTC - in response to Message 33963.  

Not sure if this is related, but I was able to upload two stuck results: one yesterday and one today.

Thank you!
ID: 33965
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 812
Credit: 34,864,689
RAC: 22,988
Message 33974 - Posted: 20 Jan 2018, 15:58:54 UTC - in response to Message 33965.  

I am still having this problem on several of my 8-core machines.

The one I am on has had 3 tasks stuck on the return page for several days. Another task just finished and was sent right up, but these other 3 won't leave. One has been there so long that it is now past due, and the other two are due on the 25th. I have been trying to send them manually over and over, every time I check each computer.
Volunteer Mad Scientist For Life
ID: 33974
Gunnar Hjern
Joined: 14 Jul 17
Posts: 7
Credit: 258,928
RAC: 0
Message 33977 - Posted: 20 Jan 2018, 18:01:44 UTC - in response to Message 33974.  

Hi!

I currently have 6 or 7 tasks stuck uploading on some of my computers. On the one I'm using right now, there are the following three:

Sat 20 Jan 2018 06:43:13 PM CET | LHC@home | Started upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | [error] Error reported by file upload server: [w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Temporarily failed upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0: transient upload error
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Backing off 04:49:27 on upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:29 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:43:46 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0: transient upload error
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Backing off 03:09:52 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0: transient upload error
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Backing off 05:53:38 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0
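
A log like the one above can be condensed to just the names of the affected files. This is only a minimal sketch, assuming the client's event log has been saved to a text file. The function name `locked_uploads` and the file name in the usage note are hypothetical, not part of BOINC.

```shell
#!/bin/sh
# Print the unique names of uploads that the server reported as
# "locked by file_upload_handler".
# Usage: locked_uploads LOGFILE   (LOGFILE is a saved copy of the event log)
locked_uploads() {
    # Keep only the server-side lock errors, pull out the bracketed file
    # name that precedes "locked by", and list each name once.
    grep -F 'locked by file_upload_handler' "$1" |
        sed -n 's/.*\[\([^]]*\)\] locked by file_upload_handler.*/\1/p' |
        sort -u
}
```

For example, `locked_uploads boinc_event.log` would print one line per stuck file, which is easier to paste into a report than the full log.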


I've heard something about some file fragments (on the server) and if that is the case, maybe they can be removed by a command like:

> find /correct/path/ -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -exec rm -f {} \;

(This assumes the files on the server get the same filenames as on my client computer; otherwise the -name parameter needs to be changed.)
This command would clean out files that are between 220 and 250 bytes, which at least seems to be their actual size, judging from how much is reported to have been uploaded before it stopped (between 0.46 % and 0.51 % of files of approx. 44 kB).
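
Before deleting anything, the same search can be run as a dry run that only lists the matches. The sketch below reuses the tests from the command above unchanged and swaps `-exec rm -f {} \;` for `-print`. The function name `list_fragments` is hypothetical, and the directory is still a placeholder that only the server admins would know.

```shell
#!/bin/sh
# Dry run: list candidate upload fragments instead of deleting them.
# Usage: list_fragments DIR [NAME_PATTERN]
list_fragments() {
    dir=$1
    pattern=${2:-*sixvf*}
    # -size +220c -size -250c matches files strictly larger than 220 bytes
    # and strictly smaller than 250 bytes, i.e. 221 to 249 bytes.
    # -mtime -20 keeps only files modified within the last 20 days.
    find "$dir" -type f -name "$pattern" -size +220c -size -250c -mtime -20 -print
}
```

Calling `list_fragments /correct/path/` shows what would be removed. As a rough check on the size window: 0.46 % to 0.51 % of about 44 kB is roughly 200 to 225 bytes, so the lower bound may need to be relaxed if the dry run finds fewer files than expected.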

Hope that you can resolve the issue before the tasks run out of time.

Have a nice day!!!

Kindest regards,
Gunnar
ID: 33977
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 812
Credit: 34,864,689
RAC: 22,988
Message 34002 - Posted: 21 Jan 2018, 9:32:47 UTC
Last modified: 21 Jan 2018, 9:51:08 UTC

I have had 3 stuck on one of mine for days, so I decided to check this PC's stats page under the Error list. One of them was there, marked "Timed out - no response". That means a finished task that was ready to send a couple of days before the due date is now invalid, since the server refused to take it back (while it was taking other finished tasks). So I am just going to abort this one and keep trying to send in the other 2 finished tasks when I am on this machine, since they still have 4 days left before their due date.

With 50 cores running these I sure spend a lot of time checking them all and trying to make sure most of them get returned.

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10451775

Task
172348631 83568818 12 Jan 2018, 2:43:21 UTC 19 Jan 2018, 18:15:35 UTC Timed out - no response
After aborting it just now it disappeared from the ERROR list and is now gone completely from my account on ALL settings

Edit: ok now checking the next 8-core here I see I have 6 of those AVX tasks that refuse to hit the road and they are those 10 second to 1 minute versions.

One thing that would help with this is if someone at the server would come here and say that we should just abort them and send them back that way, so we don't end up with an even longer list of unreturned tasks (at least for the past-due ones that in reality were finished on time).

(ok time to check the rest of them)
ID: 34002
AuxRx
Joined: 16 Sep 17
Posts: 90
Credit: 1,056,395
RAC: 1,115
Message 34003 - Posted: 21 Jan 2018, 10:26:34 UTC - in response to Message 34002.  

One thing that would help with this is if someone at the server would come here and say that we should just abort them and send them back that way, so we don't end up with an even longer list of unreturned tasks (at least for the past-due ones that in reality were finished on time).


+1
ID: 34003
computezrmle
Joined: 15 Jun 08
Posts: 1055
Credit: 45,281,013
RAC: 147,477
Message 34099 - Posted: 26 Jan 2018, 14:24:41 UTC

Fr 26 Jan 2018 15:17:40 CET | LHC@home | Temporarily failed download of w-c3_job.B1inj_c3.2158__52__s__64.28_59.31__5.1_6.1__6__78_1_sixvf_boinc54488.zip: transient HTTP error


I see a couple of those messages in the log this afternoon.
Maybe the servers need some friendly words.
ID: 34099
Erich56
Joined: 18 Dec 15
Posts: 1074
Credit: 17,790,294
RAC: 35,232
Message 34111 - Posted: 27 Jan 2018, 9:18:46 UTC

Although about 770.000 "unsent" tasks are shown on the Project Status Page, I have been unsuccessfully trying to download tasks for hours now.
It always says "No tasks are available for Sixtrack" :-(

Will this problem ever be solved?
ID: 34111
AuxRx
Joined: 16 Sep 17
Posts: 90
Credit: 1,056,395
RAC: 1,115
Message 34112 - Posted: 27 Jan 2018, 9:44:41 UTC

I ran out of work this morning. My system has been assigned new tasks, but the download is delayed again and again.

Not worth it, shutting down for the weekend.
ID: 34112
Erich56
Joined: 18 Dec 15
Posts: 1074
Credit: 17,790,294
RAC: 35,232
Message 34113 - Posted: 27 Jan 2018, 9:51:12 UTC - in response to Message 34112.  

I ran out of work this morning. My system has been assigned new tasks, but the download is delayed again and again.
And again, as many times before with ATLAS, I question what sense it makes to pump several hundred thousand new tasks into the mills if not even their downloads work, not to mention the upload problems.
ID: 34113
marmot
Joined: 5 Nov 15
Posts: 127
Credit: 5,952,203
RAC: 740
Message 34114 - Posted: 27 Jan 2018, 14:12:42 UTC - in response to Message 34111.  

Although about 770.000 "unsent" tasks are shown on the Project Status Page, I have been unsuccessfully trying to download tasks for hours now.
It always says "No tasks are available for Sixtrack" :-(

Will this problem ever be solved?

OK, so it's not just me. Good to know.


All the Theory WU's use so much internet bandwidth that I can't even watch a video.

I knew there were no Sixtrack coming down because I couldn't watch Democracy Now! this morning.
ID: 34114
T.J.
Joined: 17 Feb 07
Posts: 86
Credit: 968,855
RAC: 1
Message 34116 - Posted: 27 Jan 2018, 16:59:47 UTC

The uploading and downloading of WU's is still not going smoothly. Sometimes it takes a few hours for just one to upload, while the others go quickly.
But with patience all is going well.
Greetings from,
TJ
ID: 34116
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 226
Credit: 10,369,862
RAC: 3,490
Message 34117 - Posted: 27 Jan 2018, 18:55:58 UTC
Last modified: 27 Jan 2018, 19:04:38 UTC

While there is still, clearly, some issue with the servers, would it perhaps be sensible to (briefly) suspend the issue of NEW work to allow the servers to catch up with resends? I have 5 that have been Validation Pending for over a week but have yet to be sent to a second or third wingman. As these seem to go to the back of the queue, they are taking up vital space on the server for longer than necessary, rather than being sent out quickly and allowed to clear. Surely allowing the resend backlog to clear would free up space and then allow a clearer run for the new work.
[I've said "clearly", "clear" and "clearer" a few more times than I had intended but hopefully I've made my point clear 8¬) ]
ID: 34117
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 812
Credit: 34,864,689
RAC: 22,988
Message 34118 - Posted: 27 Jan 2018, 19:40:56 UTC

Clearest?


It has been a bit slow, but at least this time, if I tell it to transfer several times and do a BOINC update, it happens instead of staying there beyond the due date. Of course, I could also just be getting lucky.
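
For anyone who would rather script the retry than click through the Manager on every machine, here is a rough sketch using `boinccmd`. It assumes `boinccmd` is on the PATH and allowed to talk to the local client, and that the project URL is the one from the stats links in this thread. The function name `retry_transfers` and the `BOINCCMD` override are hypothetical additions for illustration.

```shell
#!/bin/sh
# Ask the BOINC client to retry deferred transfers, then contact the project.
# Set BOINCCMD=echo to preview the commands without touching the client.
retry_transfers() {
    project_url=${1:-https://lhcathome.cern.ch/lhcathome/}
    cmd=${BOINCCMD:-boinccmd}
    # Retry network communication that the client has deferred (backed off).
    "$cmd" --network_available
    # Trigger a scheduler request so finished tasks can be reported.
    "$cmd" --project "$project_url" update
}
```

Running `retry_transfers` does roughly what retrying the transfers and updating the project by hand would do. It cannot help while the server still reports the file as locked.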
ID: 34118
AuxRx
Joined: 16 Sep 17
Posts: 90
Credit: 1,056,395
RAC: 1,115
Message 34119 - Posted: 27 Jan 2018, 21:23:46 UTC - in response to Message 34117.  

Long validation times don't bother me personally. But it would be prudent to control job creation by staggering it. AFAIK, over-filling the queue with large batches does not speed up production. In fact, at work we try to achieve the opposite, in accordance with Best Practice and Lean Management principles.

Can job creation be initiated automatically and staggered in smaller batches? Could job creation be rescheduled so staff is available to deal with hiccups?
ID: 34119
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 120
Credit: 1,424,992
RAC: 511
Message 34124 - Posted: 28 Jan 2018, 10:54:21 UTC - in response to Message 34119.  

Hi,
As for the upload/download issues, we are in the hands of the IT experts. I fear that the large result files of ATLAS tasks play a role in that...
The big delays in validation are due to the large number of queued tasks - I am wondering how complicated it is to re-issue validation-failing tasks at higher priority.

Concerning managing the load of SixTrack tasks, I agree that having large batches does not speed up production. The only real drawbacks are:
- long times in validating results, when re-issuing is needed;
- long times in releasing new versions of exes.

Thanks for the feedback - I will discuss this with IT.
ID: 34124
Erich56
Joined: 18 Dec 15
Posts: 1074
Credit: 17,790,294
RAC: 35,232
Message 34125 - Posted: 28 Jan 2018, 11:38:32 UTC - in response to Message 34117.  

While there is still, clearly, some issue with the servers, would it perhaps be sensible to (briefly) suspend the issue of NEW work to allow the servers to catch up with resends? I have 5 that have been Validation Pending for over a week but have yet to be sent to a second or third wingman. As these seem to go to the back of the queue, they are taking up vital space on the server for longer than necessary, rather than being sent out quickly and allowed to clear. Surely allowing the resend backlog to clear would free up space and then allow a clearer run for the new work.
This is exactly what I have been saying in several postings here in the various message boards, ever since the whole mess started with far too many ATLAS tasks in mid-December.
Because I simply don't understand what sense it makes to pump even more tasks into the mills while, on the other hand, neither the servers nor the network can handle these huge bulks of data.

So, I'm once more rather surprised to see that this morning the number of unsent Sixtrack tasks has passed one million (and is still growing).

But, perhaps my thinking is totally wrong and one of the mods or someone else in charge could enlighten me.
ID: 34125
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 528
Credit: 292,495,949
RAC: 901,257
Message 34128 - Posted: 28 Jan 2018, 14:34:27 UTC

Hi Alessio,

There is an option in BOINC to prioritise re-issues, called "Accelerating retries": https://boinc.berkeley.edu/trac/wiki/ProjectOptions#Acceleratingretries

Maybe you can check the settings with IT?
ID: 34128

©2019 CERN