Message boards : Sixtrack Application : Transfer issues

AuxRx
Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33508 - Posted: 26 Dec 2017, 9:39:30 UTC

Two Sixtrack tasks for me. Deadline is the 27th according to BOINC or 28th according to LHC. This time I'll wait to see when it will actually happen.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 732
Credit: 49,336,437
RAC: 25,953
Message 33509 - Posted: 26 Dec 2017, 10:53:11 UTC

I have one Sixtrack task that is about to expire in 6 hours according to BOINC. The task is here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=169614100
It has been uploading for 5 days now. Interesting to see what will happen.
AuxRx
Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33511 - Posted: 26 Dec 2017, 11:49:37 UTC - in response to Message 33509.  

I noticed you have several tasks waiting for validation just like I do, too.

What happens if results cannot be validated, because the second volunteer hasn't been able to deliver on time i.e. the necessary quorum is not met?

Many tasks have been aborted, failed or cancelled, but with more than a million tasks in queue, those retries probably won't be completed in time. Even more file fragments clogging up the server?
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,923,727
RAC: 31,866
Message 33512 - Posted: 26 Dec 2017, 12:02:58 UTC - in response to Message 33511.  

...Many tasks have been aborted, failed or cancelled, but with more than a million tasks in queue, those retries probably won't be completed in time. Even more file fragments clogging up the server?

I am afraid your assumption is correct :-(
Harri Liljeroos
Joined: 28 Sep 04
Posts: 732
Credit: 49,336,437
RAC: 25,953
Message 33515 - Posted: 26 Dec 2017, 12:55:01 UTC - in response to Message 33511.  

I noticed you have several tasks waiting for validation just like I do, too.

What happens if results cannot be validated, because the second volunteer hasn't been able to deliver on time i.e. the necessary quorum is not met?

I think a task is sent out at most 5 times (5 copies). If a copy goes past its deadline, the next copy is sent out. But as these resends go to the end of the ready-to-send queue, it will take some time before they actually reach a host for crunching. If all 5 copies have been sent out and have gone past their deadlines, I think the WU is marked as an error (I'm not sure this is how it works; correct me if I'm wrong).
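As a rough illustration of the resend behaviour guessed at above, here is a minimal Python sketch. The quorum of 2 valid results and the cap of 5 copies are assumptions taken from this thread, not confirmed project settings, and the function name `final_state` is hypothetical. Real BOINC scheduling (deadlines, reports arriving out of order, separate error limits) is more involved than this.

```python
# Hypothetical, simplified model of the resend behaviour described above.
# Assumptions (not confirmed by the project): a quorum of 2 valid results
# validates the workunit, and at most 5 copies are ever created.
MIN_QUORUM = 2
MAX_TOTAL_RESULTS = 5


def final_state(outcomes: list[str]) -> str:
    """Replay copy outcomes ("valid", "timeout", "aborted") in the order
    they resolve and return "validated", "error" or "in progress"."""
    issued = MIN_QUORUM  # initial replication: one copy per quorum member
    valid = 0
    for resolved, outcome in enumerate(outcomes, start=1):
        if outcome == "valid":
            valid += 1
            if valid >= MIN_QUORUM:
                return "validated"
        elif issued < MAX_TOTAL_RESULTS:
            # A failed or expired copy triggers a resend, which joins the
            # end of the ready-to-send queue.
            issued += 1
        if resolved == issued:
            # Every copy has resolved and no further resend is allowed.
            return "error"
    return "in progress"


print(final_state(["valid", "timeout"]))         # in progress: a third copy is still queued
print(final_state(["valid"] + ["timeout"] * 4))  # error: all 5 copies used up
```

The long ready-to-send queue matters because every "timeout" above only adds a copy to the back of that queue; until that copy resolves, the workunit stays "in progress" and a late upload can still count.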

Anyway, with the ready-to-send (RTS) queue being so long at the moment, there will be plenty of time to upload your results before final failure. That is, if the servers can hold the fortress against the constant pounding. The RTS queue shows some signs of slowly draining, although the number of in-progress tasks is also dropping. So let's keep the holiday spirit up and not sink into despair. ;-)
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,923,727
RAC: 31,866
Message 33516 - Posted: 26 Dec 2017, 13:08:34 UTC - in response to Message 33515.  

... So let's keep the holiday spirit up and not sink in despair. ;-)

Harri, I think that's all we can do anyway :-)
BetelgeuseFive
Joined: 22 Sep 13
Posts: 11
Credit: 660,161
RAC: 0
Message 33517 - Posted: 26 Dec 2017, 14:06:56 UTC

Unable to upload at the moment; I get an error message I have not seen before:

26/12/2017 15:00:35 | LHC@home | [error] Error reported by file upload server: [w5_hllhc10_sqz1500_Qcol_chr20_w5__6__s__62.31_60.32__18_20__5__22.5_1_sixvf_boinc806_1_r2118365745_0] locked by file_upload_handler PID=-1

Tom
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,923,727
RAC: 31,866
Message 33519 - Posted: 26 Dec 2017, 14:38:16 UTC - in response to Message 33517.  

Unable to upload at the moment; I get an error message I have not seen before:

26/12/2017 15:00:35 | LHC@home | [error] Error reported by file upload server: [w5_hllhc10_sqz1500_Qcol_chr20_w5__6__s__62.31_60.32__18_20__5__22.5_1_sixvf_boinc806_1_r2118365745_0] locked by file_upload_handler PID=-1

This is exactly the kind of error message we got for ATLAS 2 weeks ago, when the connections and servers were totally overloaded.
Obviously, when uploads get stuck and only fragments of them arrive at the server, retries of the same upload won't succeed until some cleanup tool is run there (no idea whether this tool is in operation over the holidays or not; as far as I remember, it had to be started manually).

The same thing now seems to be true for Sixtrack, probably due to too high a number of tasks in the mill (as seen on the Project Status page).
AuxRx
Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33528 - Posted: 27 Dec 2017, 12:48:48 UTC

Error reported by file upload server: Server is out of disk space

Uh-oh.
nairb
Joined: 1 May 07
Posts: 27
Credit: 2,339,393
RAC: 140
Message 33533 - Posted: 27 Dec 2017, 16:16:07 UTC
Last modified: 27 Dec 2017, 16:43:45 UTC

Yup, same thing here...

27/12/2017 16:10:33 | LHC@home | Started upload of BT2KDmof9nrnDDn7oo6G73TpABFKDmABFKDmRLFKDmABFKDmtodCCn_0_r1075799161_ATLAS_result

27/12/2017 16:10:35 | LHC@home | [error] Error reported by file upload server: Server is out of disk space

Oops... it's an ATLAS issue, not Sixtrack. Ignore.
mmonnin
Joined: 22 Mar 17
Posts: 63
Credit: 14,576,403
RAC: 1,896
Message 33540 - Posted: 28 Dec 2017, 3:08:38 UTC

Still have one locked :( Others have completed and sent.

LHC@home 12/27/2017 10:07:38 PM [error] Error reported by file upload server: [LHC_2015_LHC_2015_260_BOINC_errors__19__s__62.31_60.32__5.6_5.7__5__39_1_sixvf_boinc65870_0_r1527001410_0] locked by file_upload_handler PID=-1
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,923,727
RAC: 31,866
Message 33541 - Posted: 28 Dec 2017, 8:47:52 UTC

My two remaining ones (which finished about 2 days ago) were finally uploaded an hour ago :-)
AuxRx
Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33542 - Posted: 28 Dec 2017, 10:15:30 UTC - in response to Message 33541.  

I have lost my first Sixtrack result of 180 GFLOPs. The next task expires this afternoon. In addition, two more tasks have been stuck since last night.

Wouldn't it be wiser to cancel now and open the blocked slots for other volunteers? Chances of recovery are pretty slim as far as I can tell and my account -- not the project -- is credited with the failure. What would be the harm if I cancelled now?
Harri Liljeroos
Joined: 28 Sep 04
Posts: 732
Credit: 49,336,437
RAC: 25,953
Message 33547 - Posted: 28 Dec 2017, 11:44:01 UTC - in response to Message 33542.  

I think you still have a chance of getting credit for expired tasks as long as no credit has yet been granted to anybody for them. The first two hosts to return valid results will get the credit.

I have one task whose second copy was aborted by its host a couple of hours after it was issued. That was 9 days ago, but the third copy, created soon after the abort, has not yet reached a new host. It is probably still in the ready-to-send queue waiting to be downloaded (as is the fourth copy, which was created yesterday after my copy expired too). My copy has been stuck in the Transfers tab for 164 hours now.
AuxRx
Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 33596 - Posted: 31 Dec 2017, 11:59:23 UTC - in response to Message 33547.  

I think that you still have a chance of getting your credit for expired tasks if they have not yet been granted any credit to anybody. The first two who will return valid results will get the credit.


This seems to be accurate. Although the project shows the WUs in question as errors, the results have not been cancelled. Even after several days, the results are still being uploaded, and the uploads continue to fail. So far only one result has been returned per WU.
T.J.
Joined: 17 Feb 07
Posts: 86
Credit: 968,855
RAC: 0
Message 33643 - Posted: 4 Jan 2018, 7:40:03 UTC

Happy New Year.

I would think all the staff are back from the holidays and have started looking at the servers.
Maybe they have, maybe they are still on holiday, but I still have two WUs that will not upload; one has already been trying for 14 days...
It is also shown as an error in my results list, which is to be expected since the return deadline has already passed.
So the issue "not uploading" is still ongoing.

Thanks.
Greetings from,
TJ
JugNut
Joined: 26 Dec 17
Posts: 2
Credit: 1,205,590
RAC: 0
Message 33664 - Posted: 5 Jan 2018, 3:41:49 UTC - in response to Message 33643.  
Last modified: 5 Jan 2018, 3:44:07 UTC

Yep, same here. Yesterday three of my WUs passed their deadline and now have the same status as yours (Timed out - no response). I also have another six that will presumably end the same way. So far I haven't seen any WU upload successfully after getting the dreaded "locked by file_upload_handler PID=-1" message.

What a waste.

If the partially cached uploads are the fly in the ointment, does anyone know how to edit the client state so that the WU is seen as a new upload again? I tried a couple of things myself, but my edits didn't work as planned and were ignored. :(
superempie
Joined: 28 Jul 05
Posts: 24
Credit: 6,603,623
RAC: 0
Message 33675 - Posted: 5 Jan 2018, 18:56:46 UTC

Same here. One task won't upload (Windows) and has already been stuck for a couple of days. Other Sixtrack workunits went through fine before and after this one completed.
I am now also seeing the message "05-Jan-2018 19:37:30 [LHC@home] Error reported by file upload server: [workspace1_hl13_collision_scan_62.3275_60.3000_chrom_15_oct_-300_B4__46__s__62.31_60.32__6_8__5__60_1_sixvf_boinc4001_1_r1970795041_0] locked by file_upload_handler PID=-1" on Linux too.
Erich56

Posts: 1821
Credit: 118,923,727
RAC: 31,866
Message 33677 - Posted: 6 Jan 2018, 5:48:07 UTC

What is likewise frustrating: there are 1,027,147 "unsent" Sixtrack tasks waiting for download, but the download doesn't work :-(

I guess the best thing for us crunchers would be to switch to other projects in the meantime, until LHC gets its infrastructure straightened out sometime this year, as promised last month.
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 33681 - Posted: 6 Jan 2018, 9:36:08 UTC - in response to Message 33677.  

Yes, it is frustrating; I am experiencing similar issues, see https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4539&postid=33680

But it works in fits and starts.


©2024 CERN