Message boards : ATLAS application : New specific file upload error
Joined: 28 Dec 08 | Posts: 298 | Credit: 3,135,072 | RAC: 1,631
1/2/2018 1:23:14 AM | LHC@home | [error] Error reported by file upload server: [rf0MDm29VprnSu7Ccp2YYBZmABFKDmABFKDm6aFKDmABFKDm1AEbGn_0_r1764236154_ATLAS_result] locked by file_upload_handler PID=-1

What's this all about?
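As a rough way to gauge how often a host hits this, here is a minimal sketch that scans a BOINC client message log for exactly this upload-server lock error and counts the affected result files. The log path is only an assumption (on many installations the client messages end up in stdoutdae.txt in the BOINC data directory), and the pattern simply mirrors the line quoted above.

```python
import re
import sys
from collections import Counter

# Matches lines like the one quoted above, e.g.
# [error] Error reported by file upload server: [<result_file>] locked by file_upload_handler PID=-1
LOCK_ERROR = re.compile(
    r"Error reported by file upload server: \[(?P<result>\S+)\] "
    r"locked by file_upload_handler PID=(?P<pid>-?\d+)"
)

def count_lock_errors(log_path):
    """Return a Counter mapping result-file names to how often they hit the lock error."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LOCK_ERROR.search(line)
            if match:
                hits[match.group("result")] += 1
    return hits

if __name__ == "__main__":
    # Default path is hypothetical; point it at wherever your client log lives.
    path = sys.argv[1] if len(sys.argv) > 1 else "stdoutdae.txt"
    for result, count in count_lock_errors(path).most_common():
        print(f"{count:4d}  {result}")
```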
Joined: 30 May 08 | Posts: 93 | Credit: 5,160,246 | RAC: 0
"What's this all about?"

There are several mentions of this kind of error in the "Uploads of finished tasks not possible since last night" thread. Of particular interest might be Message 33420.
Joined: 18 Dec 15 | Posts: 1558 | Credit: 57,701,075 | RAC: 42,854
"Of particular interest might be Message 33420."

Well, repeating part of the above-mentioned Message 33420 from David Cameron:

- the upload server is overloaded, so many uploads fail, leaving a half-complete file
- the retries fail because the half-complete file is still there (the "locked by file_upload_handler PID=-1" error)
- our cleaning of incomplete files runs only once per day, so there is no possibility of retries succeeding until one day has passed
- the server getting full last night was yet another problem, but this is now fixed

"I have changed the cleaning to run once every 6 hours and delete files older than 6 hours to make it more aggressive. But if you have a failed upload you'll still have to wait some time before it will work, so clicking retry every few minutes won't help."

As much as I understood all these problems when there were almost 25,000 tasks in the mill at that time, I am wondering why these upload problems still exist (I have also got several new ones last night) now that no more than roughly 10,000 tasks are being crunched. So I guess that some other problem must be involved.
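For illustration only, here is a minimal sketch of the cleanup policy described above (run periodically, delete upload files untouched for more than six hours). The directory path is hypothetical, the real cleanup is part of the LHC@home server setup and is not shown in the thread, and a production job would need to make sure it only removes incomplete files rather than keying on file age alone.

```python
import time
from pathlib import Path

# Hypothetical upload directory; the real BOINC upload hierarchy is not named in the thread.
UPLOAD_DIR = Path("/var/boinc/upload")
MAX_AGE_SECONDS = 6 * 3600  # "delete files older than 6 hours"

def clean_stale_uploads(upload_dir: Path, max_age: int) -> int:
    """Delete upload files untouched for more than max_age seconds.

    Sketch of the policy described above: a half-complete file left behind by a
    failed upload blocks retries until this cleanup removes it. Treating any
    old file as stale is an assumption made for brevity.
    """
    now = time.time()
    removed = 0
    for path in upload_dir.rglob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            try:
                path.unlink()
                removed += 1
            except OSError:
                pass  # the file may have completed or been removed in the meantime
    return removed

if __name__ == "__main__":
    print(f"removed {clean_stale_uploads(UPLOAD_DIR, MAX_AGE_SECONDS)} stale files")
```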
Joined: 16 Sep 17 | Posts: 100 | Credit: 1,618,469 | RAC: 0
"As much as I understood all these problems when there were almost 25,000 tasks in the mill at that time, I am wondering why these upload problems still exist (I have also got several new ones last night) now that no more than roughly 10,000 tasks are being crunched."

Because the bottleneck has not been removed. SixTrack has a lot of work queued at the moment, and the projects share the same file server, as far as I know. The official statement was that the issue will be fixed in mid-January.
Joined: 28 Sep 04 | Posts: 604 | Credit: 36,907,043 | RAC: 16,367
"As much as I understood all these problems when there were almost 25,000 tasks in the mill at that time, I am wondering why these upload problems still exist (I have also got several new ones last night) now that no more than roughly 10,000 tasks are being crunched."

And nothing has been done to ease the file server load from SixTrack tasks. The Ready To Send queue is once again increasing, now reaching about 1.3 million tasks. When a newly created task is added to the queue (like a resend because a task was not returned by the deadline), it takes about 11 days to crawl through the queue before reaching a new host. See one here: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82154077
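Taking the quoted figures at face value (about 1.3 million tasks in the Ready To Send queue and roughly 11 days for a new task to crawl through it), and assuming the queue drains more or less first-in-first-out, the implied dispatch rate works out to a bit over 100,000 tasks per day:

```python
# Back-of-the-envelope check on the figures quoted above
# (both are approximate, and a FIFO queue is an assumption).
ready_to_send = 1_300_000   # tasks in the Ready To Send queue
days_to_reach_host = 11     # observed wait for a freshly created resend

tasks_per_day = ready_to_send / days_to_reach_host
print(f"implied dispatch rate: ~{tasks_per_day:,.0f} tasks/day")
# implied dispatch rate: ~118,182 tasks/day
```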
Joined: 18 Dec 15 | Posts: 1558 | Credit: 57,701,075 | RAC: 42,854
"And nothing has been done to ease the file server load from SixTrack tasks. The Ready To Send queue is once again increasing, now reaching about 1.3 million tasks. When a newly created task is added to the queue (like a resend because a task was not returned by the deadline), it takes about 11 days to crawl through the queue before reaching a new host. See one here: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=82154077"

This really doesn't seem to make a whole lot of sense :-(
Joined: 28 Dec 08 | Posts: 298 | Credit: 3,135,072 | RAC: 1,631
New work generation has nothing to do with file upload, though that is interesting. It also leads to the question of why SixTrack was left unthrottled and allowed to mess with the uploads of other tasks. The fact that I am still getting PID errors well after the problem was discovered tells me the strategy employed is getting overwhelmed yet again. Time to get some serious hardware, a larger drive, or whatever it takes to handle the increased demand. I guess the error does not matter as far as results go, since it appears the task is not lost when I disconnect the local client for the night and shut down my system.
Joined: 23 Apr 10 | Posts: 5 | Credit: 1,349,240 | RAC: 0
Same problem here. Will stop ATLAS again. I wanted to run it as my main 2018 project, but will wait until the issues are "really" fixed.
Joined: 18 Dec 15 | Posts: 1558 | Credit: 57,701,075 | RAC: 42,854
the "locked by file_upload_handler PID=55833" problem seems to be back :-( I've had it several times during the past days. |