1) Message boards : News : File upload issues (Message 34025)
Posted 22 Jan 2018 by Gunnar Hjern
Post:
It seems that the servers are down for the moment.
I got the following in my log:

Mon 22 Jan 2018 11:09:33 AM CET | LHC@home | Started upload of w-c9_job.B1inj_c9.2158__26__s__64.28_59.31__16.1_17.1__6__60_1_sixvf_boinc27416_0_r905081043_0
Mon 22 Jan 2018 11:09:33 AM CET | LHC@home | Started upload of w-c1_job.B1inj_c1.2158__42__s__64.28_59.31__13.1_14.1__6__82.5_1_sixvf_boinc44247_0_r387987546_0
Mon 22 Jan 2018 11:11:34 AM CET | | Project communication failed: attempting access to reference site
Mon 22 Jan 2018 11:11:34 AM CET | LHC@home | Temporarily failed upload of w-c9_job.B1inj_c9.2158__26__s__64.28_59.31__16.1_17.1__6__60_1_sixvf_boinc27416_0_r905081043_0: transient HTTP error
Mon 22 Jan 2018 11:11:34 AM CET | LHC@home | Backing off 03:27:00 on upload of w-c9_job.B1inj_c9.2158__26__s__64.28_59.31__16.1_17.1__6__60_1_sixvf_boinc27416_0_r905081043_0
Mon 22 Jan 2018 11:11:36 AM CET | | Internet access OK - project servers may be temporarily down.
Mon 22 Jan 2018 11:11:50 AM CET | | Project communication failed: attempting access to reference site
Mon 22 Jan 2018 11:11:50 AM CET | LHC@home | Temporarily failed upload of w-c1_job.B1inj_c1.2158__42__s__64.28_59.31__13.1_14.1__6__82.5_1_sixvf_boinc44247_0_r387987546_0: transient HTTP error
Mon 22 Jan 2018 11:11:50 AM CET | LHC@home | Backing off 03:47:25 on upload of w-c1_job.B1inj_c1.2158__42__s__64.28_59.31__13.1_14.1__6__82.5_1_sixvf_boinc44247_0_r387987546_0
Mon 22 Jan 2018 11:11:52 AM CET | | Internet access OK - project servers may be temporarily down.
Mon 22 Jan 2018 11:30:47 AM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__2.2_2.3__5__55.5_1_sixvf_boinc78035_1_r2013299009_0
Mon 22 Jan 2018 11:30:47 AM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__5.7_5.8__5__42_1_sixvf_boinc80091_0_r419032409_0
Mon 22 Jan 2018 11:32:47 AM CET | | Project communication failed: attempting access to reference site
Mon 22 Jan 2018 11:32:47 AM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__2.2_2.3__5__55.5_1_sixvf_boinc78035_1_r2013299009_0: transient HTTP error
Mon 22 Jan 2018 11:32:47 AM CET | LHC@home | Backing off 05:40:47 on upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__2.2_2.3__5__55.5_1_sixvf_boinc78035_1_r2013299009_0
Mon 22 Jan 2018 11:32:47 AM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__5.7_5.8__5__42_1_sixvf_boinc80091_0_r419032409_0: transient HTTP error
Mon 22 Jan 2018 11:32:47 AM CET | LHC@home | Backing off 03:51:20 on upload of LHC_2015_LHC_2015_234_BOINC_errors__23__s__62.31_60.32__5.7_5.8__5__42_1_sixvf_boinc80091_0_r419032409_0
Mon 22 Jan 2018 11:32:49 AM CET | | Internet access OK - project servers may be temporarily down.

//Gunnar
2) Message boards : Sixtrack Application : Peak of "Validation inconclusive" tasks (Message 34012)
Posted 21 Jan 2018 by Gunnar Hjern
Post:
Hi!

Here are my invalids (grown to 20 in the last hour):

https://lhcathome.cern.ch/lhcathome/result.php?resultid=173348129
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173348131
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173348504
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173348539
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173345228
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173288516
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173286246
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173272250
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173259482
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173256426
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173240166
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173223712
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173213720
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173203471
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173203481
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173199914
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173178095
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173166842
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173145455
https://lhcathome.cern.ch/lhcathome/result.php?resultid=173115913

I think my computers should be visible, otherwise pls notify me.
Hope you can get something out of it! :-)

Have a nice day!!!

//Gunnar
3) Message boards : Sixtrack Application : Peak of "Validation inconclusive" tasks (Message 34010)
Posted 21 Jan 2018 by Gunnar Hjern
Post:
Yes, I sure have seen that too!

Until yesterday I had crunched well over 1000 tasks, and this without one single "invalid" or "Validation inconclusive"! :-)
And now, when I woke up this morning, I found that I've reached 17 Invalids, and have 6 more tasks in "Validation inconclusive"!!! :-(

These faults seem to be rather evenly spread between nearly all of my 14 computers, so this is NOT one single machine that got weird.
I've also noticed a new peak in problems with uploading tasks - sigh.

Maybe the databases need a thorough clean-up, and the servers a good old-fashioned reboot! ;-)

//Gunnar
4) Message boards : Sixtrack Application : Transfer issues (Message 33977)
Posted 20 Jan 2018 by Gunnar Hjern
Post:
Hi!

I currently have 6 or 7 tasks stuck in uploading on some of my computers. On the one I'm using right now there are the following three:

Sat 20 Jan 2018 06:43:13 PM CET | LHC@home | Started upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | [error] Error reported by file upload server: [w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Temporarily failed upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0: transient upload error
Sat 20 Jan 2018 06:43:28 PM CET | LHC@home | Backing off 04:49:27 on upload of w-c1_job.B1inj_c1.2158__50__s__64.28_59.31__17.1_18.1__6__10.5_1_sixvf_boinc52931_0_r399750021_0
Sat 20 Jan 2018 06:43:29 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:43:46 PM CET | LHC@home | Started upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0: transient upload error
Sat 20 Jan 2018 06:43:53 PM CET | LHC@home | Backing off 03:09:52 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__3_1_sixvf_boinc50211_0_r725332866_0
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | [error] Error reported by file upload server: [LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0] locked by file_upload_handler PID=-1
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Temporarily failed upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0: transient upload error
Sat 20 Jan 2018 06:44:04 PM CET | LHC@home | Backing off 05:53:38 on upload of LHC_2015_LHC_2015_234_BOINC_errors__15__s__62.31_60.32__3.1_3.2__5__15_1_sixvf_boinc50219_0_r867841013_0


I've heard something about leftover file fragments on the server. If that is the case, maybe they can be removed by a command like:

> find /correct/path/ -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -exec rm -f {} \;

(This assumes the files on the server get the same filenames as on my client computer; otherwise the -name parameter needs to be changed.)
This command would clean out files that are between 220 and 250 bytes, which at least seems to be their actual size, judging from how much is reported to have been uploaded before it stopped (between 0.46 % and 0.51 % of files of approx. 44 kB).
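
Before deleting anything, the same filter could first be run as a dry run that only lists the matches. This is just a minimal sketch, assuming a POSIX shell and find: the function name list_fragments is made up for illustration, and the directory argument stands in for the real upload directory on the server, which I don't know. Note that with find, -size +220c means strictly more than 220 bytes and -size -250c strictly less than 250 bytes.

```shell
#!/bin/sh
# Dry run: list (but do not delete) files that match the same filter
# as the command above. Nothing is removed by this function.
list_fragments() {
    # $1 = directory to search (placeholder; the real server path is unknown to me)
    find "$1" -type f -name "*sixvf*" -size +220c -size -250c -mtime -20 -print
}
```

Calling list_fragments /correct/path/ prints one matching path per line. Only when that list looks right would it make sense to swap -print for the -exec rm -f {} \; of the original command.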

Hope that you can resolve the issue before the tasks run out of time.

Have a nice day!!!

Kindest regards,
Gunnar
5) Message boards : News : File upload issues (Message 33975)
Posted 20 Jan 2018 by Gunnar Hjern
Post:
Hi!

A lot of upload-stuck tasks will soon hit their deadlines, and many hours of computer work will be wasted! :-(

Would it be possible for some sys-admin to manually erase those faulty file fragments on the server?
For example with some command like:
> find /correct/path/ -size +220c -size -250c -mtime -20 -exec rm -f {} \;

(Afaik they are about 220 to 250 bytes, and they should be younger than 20 days.
If some common substring "sub" of the file names is known, you can of course add -name "*sub*" to the params for find.)
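
To get a feeling for how many files such a clean-up would touch, the matches could be counted first, with and without the name filter. Again only a sketch under assumptions: count_candidates is a hypothetical helper name, the directory is a placeholder for the real server path, and I have added -type f (not in the command above) so that directories are never matched. File names containing newlines would be miscounted.

```shell
#!/bin/sh
# Count files of 221..249 bytes modified within the last 20 days.
# Usage: count_candidates DIR [SUB]
#   DIR = directory to search (placeholder for the real server path)
#   SUB = optional substring that the file names must contain
count_candidates() {
    dir=$1
    sub=${2:-}
    if [ -n "$sub" ]; then
        # With a substring: restrict the match to names containing it
        find "$dir" -type f -name "*${sub}*" -size +220c -size -250c -mtime -20 -print
    else
        # Without a substring: size and age are the only filters
        find "$dir" -type f -size +220c -size -250c -mtime -20 -print
    fi | wc -l | tr -d ' '
}
```

If the count without the name filter is much larger than the count with it, the size window alone is probably catching files that are not fragments, and the -name filter should be kept.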

Not only would it save the work done by us clients, but I think it would lessen the workload of the servers too, as far fewer client computers would then keep retrying to upload the stuck files.

Have a nice day!!!

Kindest regards,
Gunnar Hjern
6) Message boards : News : File upload issues (Message 33902)
Posted 17 Jan 2018 by Gunnar Hjern
Post:
Thanks for your explanations!
Have a nice day!!
/Gunnar
7) Message boards : News : File upload issues (Message 33895)
Posted 17 Jan 2018 by Gunnar Hjern
Post:
Hi!

I'm not sure if I have understood this issue correctly:

Those "partly uploaded files", are they on my machine or on the server?

Do I need to take any action, or is the problem going to solve itself when the servers are less busy?

I currently have half a dozen or so tasks that are stuck in the uploading state, and together they represent
several days of hard computing, so I'd hate to have to abort them! :-(

As I can see on the server status page, there are several thousand items in the task and WU
"waiting for deletion" queues, and a whopping 768973 tasks to send!! :-O

Will this issue resolve itself once they are crunched and validated?
(hopefully before the deadlines expire)

Kindest regards,
Gunnar


