Uploads of finished tasks not possible since last night
Author | Message |
---|---|
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
Quote: "While I am griping around here, I would like to add/point out that I have plenty of my own work to accomplish without having to check" BOINC is a hands-off approach to contributing CPU time. Once the issue is fixed server-side, your client will catch up and resume normal operation. Spewing hyperbole at other volunteers will not help. Check back in a week. Quote: "What the people at CERN should be aware of is that the longer it takes them to solve the problem, the more finished tasks will reach their deadline, making them invalid." Remember that the amount of work our individual machines contribute is statistically insignificant in the greater context. It's sobering to lose some work now and then. I'm struggling myself, because I actually wanted to purge the project routinely but couldn't for some time. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,275,098 RAC: 71,439 |
On the Project Status page I just noticed that there are no new ATLAS tasks available. It was definitely a wise idea to halt distribution of new tasks until the current upload problem is solved. |
Joined: 27 Aug 16 Posts: 8 Credit: 808,544 RAC: 0 |
I also have an upload problem, but I think it is different from the one mentioned here. I have an ATLAS task that is stuck at 100% uploaded. Nothing I do gets the upload to finish. It doesn't say that it is unable to connect to the server, as others mention here. I tried different things, for example pressing Update and restarting my computer, but nothing makes it finish the upload. The event log just says: "LHC@home | Not requesting tasks: don't need (CPU: not highest priority project; Intel GPU: )". Sjmielh |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Quote: "I also have an upload problem, but I think it is different from the one mentioned here." No, it is the same problem. I don't get the "can't connect to server" message either. In fact, the server is apparently not the problem; the database is. No one knows when it will be fixed. |
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
Sounds like the issue we all share. You can check the BOINC log ("Messages" in the Advanced view) to verify. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Quote: "On the Project Status page I just noticed that there are no new ATLAS tasks available." This was not intentional. We have so many "running" WUs that we hit an internal limit in our submission system, which was there as a safety valve and which we never thought we'd reach... I've increased the limit to allow more WUs. As Nils mentioned in the other thread, the server performance has been tweaked to handle the increased traffic better. This probably does not mean that everything will now work perfectly, but it increases your chances of an upload succeeding, so please have a little more patience if things still don't work. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,275,098 RAC: 71,439 |
Quote: "As Nils mentioned in the other thread, the server performance has been tweaked to handle the increased traffic better. This probably does not mean that everything will now work perfectly, but it increases your chances of an upload succeeding, so please have a little more patience if things still don't work." David, I am coming back to what Crystal Pellet wrote a few hours ago: "So what does the last sentence mean exactly? How or who will manually free the occupied upload slots? And even more important: WHEN?" Quote: "It seems like new tasks are able to upload, but those that finished on Tuesday night when we had the broken server are still stuck. The admins are looking into it." As I said before, many of these tasks will reach their deadline very soon. So time is of the essence, and if there is no quick solution to this problem, all these tasks will be lost. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,275,098 RAC: 71,439 |
Unfortunately, the situation got worse again overnight: whereas yesterday at least the newer tasks (i.e. the ones that were started AFTER the server crash on Tuesday) were uploaded properly, this morning I noticed that more "new" tasks which finished during the night did NOT upload. So at this point, the problem still exists for the "old" tasks (i.e. the ones that were started BEFORE the server crash) AND for the "new" tasks as well. In other words: it all ended up in a real mess :-((( Perhaps it would be best not to make any more tasks available for download until this serious problem is solved (the Server Status page shows more than 22,000 tasks being processed; I wonder how they can all be uploaded in a timely manner). Otherwise, many thousands of volunteer computation hours will be for nothing. All the more so as the deadline for the "older" tasks is approaching rapidly, and most of them will become invalid unless a solution can be found today. |
Joined: 2 May 07 Posts: 2228 Credit: 173,797,371 RAC: 18,407 |
This is a message from the upload server: with the native app on SL69, new upload files show the status "download" (?). The others show "retry in..... hours". |
Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
The error message from BOINC looks similar to the one in this thread, so maybe the problem and the solution are also similar: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4162 |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
I have the same message for a couple of tasks that are awaiting upload. Some tasks finally uploaded during the night. The file servers are now slightly less loaded, and we have upgraded the NFS volume to improve I/O. Still, we will get these errors for a while until the half-uploaded results have been cleaned up. (There is a script that does this for ATLAS jobs, as pointed out in the thread Gyllic refers to.) We are looking at ways to accelerate this cleaning process, but we need to be careful not to remove uploaded files that have not yet been validated and assimilated. Thanks for your crunching and contributions, and please continue to be patient with this upload batch. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,275,098 RAC: 71,439 |
Quote: "Thanks for your crunching and contributions, and please continue to be patient with this upload batch." Nils, two things: I am coming back once more to what Crystal Pellet wrote a few hours ago: "It looks like an attempted result upload occupies a slot on the server that is not freed when the upload fails." Has this been investigated further? The other thing: with BOINC retry intervals of 5+ hours, it will be very difficult to get the waiting tasks uploaded before their deadline :-( |
Joined: 26 Mar 16 Posts: 30 Credit: 1,258,609 RAC: 0 |
Still having upload problems... I can't check back in a week, because the deadlines will have passed... Running WUs under BOINC "control" is not "most advanced physics"; it is just plain IT. ATLAS is messing up my rigs... As it says: "we use your spare time when your PC is idling"... I just want to contribute, not do personal research... Excuse me for griping, I'm just having a couple of bad days. |
Joined: 27 Aug 16 Posts: 8 Credit: 808,544 RAC: 0 |
Quote: "I also have an upload problem, but I think it is different from the one mentioned here." Thanks for the confirmation, Jim. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,275,098 RAC: 71,439 |
Quote: "No one knows when it will be fixed." That's what I am afraid of, too :-( |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Sorry. I can assure you that we're still working on it. AFAIK the problem is not the DB, but partially uploaded files that block the file_upload process. Trying to find one of them manually takes ages with the current load. :-( |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Why are you deleting the partial uploads manually? I remember that David Cameron wrote a script to fix this issue in the past. His script just has to be adapted to the new upload server... |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,911,354 RAC: 128,284 |
Quote: "... partially uploaded files that block the file_upload process. Trying to find one of them manually takes ages ..." Do you need some input from our side? Task IDs or anything else? |
Joined: 27 Sep 08 Posts: 831 Credit: 688,448,880 RAC: 143,314 |
I aborted the 15-20 tasks in my queues, and they work well now. I assume the work will be re-created if needed, or someone else's work unit will re-validate. |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
The cleanup script is stuck due to the load, so we will temporarily stop the upload servers for a while to clear this backlog and the half-uploaded entries. You will therefore see a different message when your BOINC clients try to upload. Please simply let them back off; we'll enable the file servers again later. Thanks for your patience. No task IDs etc. should be needed, thanks; we can get them from the DB, and we have plenty of samples from our own BOINC clients. |
©2024 CERN