ATLAS WU failed
Joined: 15 Jun 08 Posts: 2184 Credit: 186,420,658 RAC: 132,000
I recently got a WU from batch 11855278 that failed after 5.5 min: https://lhcathome.cern.ch/lhcathome/result.php?resultid=153614343

It's annoying not only that the initial download size has risen to 184 MB (!) (do you really care about volunteers with lower bandwidth?), but also that my firewall dropped a connection to pandaserver.cern.ch on port 25443, which the LHC@home FAQ does not list as a required server/port. It would be nice if the project team could quickly check whether a batch is misconfigured and keep downloads far smaller than they are now. It would also be nice to get a response here, since it would help me decide whether to set NNT (No New Tasks) for a while.
Joined: 18 Dec 15 Posts: 1571 Credit: 68,482,653 RAC: 172,458
What I observed was that several WUs were aborted by the server yesterday. One example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=153613677

No idea why this happened. Maybe they found out that there was a batch of faulty WUs?
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 8
Indeed, this was a misconfigured batch of tasks. We got 7 new batches and one was not configured correctly. All of its WUs were aborted yesterday; sorry for the inconvenience. The connection to pandaserver.cern.ch was also caused by the misconfiguration.

As mentioned previously, we cannot change the size of the download, but we can increase the number of events processed per WU so that at least there are fewer downloads.
Joined: 15 Jun 08 Posts: 2184 Credit: 186,420,658 RAC: 132,000
> Indeed this was a misconfigured batch of tasks. We got 7 new batches and one was not configured correctly. All the WU were aborted yesterday, sorry for the inconvenience.

Thanks David. Recent WUs seem to run better.
Joined: 14 Jan 10 Posts: 1176 Credit: 7,446,407 RAC: 14,583
David Cameron wrote:
> As mentioned previously, we cannot change the size of the download, but we can increase the number of events processed per WU so that at least there are fewer downloads.

I prefer the current number of events and predictable runtimes. Longer runtimes combined with the shortened deadlines announced today are not a good idea. Btw, the upload files will grow proportionally when the number of events is increased.
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
I doubt that this is really an opinion poll, but one week works for me. LHCb takes longer to run than ATLAS and has a one-week deadline, and I have no problems with it. However, I keep the default 0.10 + 0.50 day work buffer; I don't like big buffers anyway.
©2023 CERN