Message boards : ATLAS application : ATLAS WU failed

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2400
Credit: 225,089,178
RAC: 123,239
Message 31947 - Posted: 14 Aug 2017, 13:47:28 UTC

I recently got a WU from the 11855278 batch that failed after 5.5 min:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=153614343

It's annoying not only that the initial download size has risen to 184 MB (!) (do you really care about volunteers with lower bandwidth?), but also that my firewall dropped a connection to pandaserver.cern.ch port 25443, which is not listed in the LHC@home FAQ as a required server/port.

It would be nice if the project team could quickly check whether there is a misconfigured batch and keep the downloads far smaller than they are now.
It would also be nice to get some response here. It would help me decide whether to set NNT (No New Tasks) for a while.
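
For anyone who wants to check whether their own firewall blocks that connection, here is a minimal sketch in Python. The host and port are the ones named in this post; the helper name `can_connect` and the 5-second timeout are only illustrative assumptions, not anything provided by the project. It only tests whether a TCP connection can be opened and says nothing about whether the task actually needs it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; not part of BOINC or LHC@home.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (host and port as reported in this post):
#   can_connect("pandaserver.cern.ch", 25443)
```

A result of False can mean a firewall drop, a closed port, or a DNS problem, so it is a hint rather than a diagnosis.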
Erich56

Joined: 18 Dec 15
Posts: 1687
Credit: 102,630,930
RAC: 121,064
Message 31952 - Posted: 15 Aug 2017, 6:05:16 UTC

What I observed was that several WUs were aborted by the server yesterday.

One example here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=153613677

No idea why this happened. Maybe they found out that there was a bunch of faulty WUs?
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 31953 - Posted: 15 Aug 2017, 10:28:11 UTC

Indeed, this was a misconfigured batch of tasks. We got 7 new batches and one was not configured correctly. All the WUs were aborted yesterday; sorry for the inconvenience.

The connection to pandaserver.cern.ch was due to the misconfiguration.

As mentioned previously, we cannot change the size of the download, but we can increase the number of events processed per WU so that at least there are fewer downloads.
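
To illustrate the arithmetic behind that trade-off, here is a rough sketch in Python. The 184 MB figure is the download size reported earlier in this thread; the event counts in the example are made-up numbers, and the sketch assumes every WU fetches the full download exactly once, which may not match how the client actually caches files.

```python
import math

DOWNLOAD_MB = 184  # initial download size reported earlier in this thread


def total_download_mb(total_events: int, events_per_wu: int,
                      download_mb: float = DOWNLOAD_MB) -> float:
    """Total download volume, assuming each WU downloads download_mb once.

    Hypothetical helper for illustration only.
    """
    if events_per_wu <= 0:
        raise ValueError("events_per_wu must be positive")
    # A partly filled last WU still needs a full download.
    n_wus = math.ceil(total_events / events_per_wu)
    return n_wus * download_mb


# Hypothetical example: the same 1000 events split two ways.
small_wus = total_download_mb(1000, 50)    # 20 WUs
large_wus = total_download_mb(1000, 100)   # 10 WUs
```

Under these assumptions, doubling the events per WU halves the total download volume for the same amount of work, while the size of each individual download stays the same.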
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2400
Credit: 225,089,178
RAC: 123,239
Message 31954 - Posted: 15 Aug 2017, 10:40:55 UTC - in response to Message 31953.  

David Cameron wrote:
Indeed, this was a misconfigured batch of tasks. We got 7 new batches and one was not configured correctly. All the WUs were aborted yesterday; sorry for the inconvenience.

The connection to pandaserver.cern.ch was due to the misconfiguration.

As mentioned previously, we cannot change the size of the download, but we can increase the number of events processed per WU so that at least there are fewer downloads.

Thanks David.

Recent WUs seem to run better.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1269
Credit: 8,472,878
RAC: 2,115
Message 31968 - Posted: 16 Aug 2017, 14:51:43 UTC - in response to Message 31953.  

David Cameron wrote:
As mentioned previously, we cannot change the size of the download, but we can increase the number of events processed per WU so that at least there are fewer downloads.

I prefer the current number of events and predictable runtimes.
Longer runtimes combined with the shortened deadlines announced today are not a good idea.
Btw: The upload files will grow proportionally when the number of events is increased.
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 31970 - Posted: 16 Aug 2017, 15:24:48 UTC

I doubt that this is really an opinion poll, but one week works for me. LHCb takes longer to run than ATLAS, and has a one-week deadline. I have no problems with it.
I keep the default 0.10 + 0.50 buffer, however; I don't like big buffers anyway.
