Message boards : ATLAS application : Panda failures

bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36766 - Posted: 19 Sep 2018, 4:22:59 UTC

I created a Python script that scrapes the PandaIDs from my completed and verified ATLAS results, assembles them into a URL like the one below, and then opens that URL in a new tab of my default web browser.
For many weeks, 99% of my ATLAS results have been passing the Panda test. Suddenly, both of my ATLAS crunchers are getting a lot of Panda errors.
I ran the script today against this host and discovered that 11 of 25 results failed Panda's verification with errors like:

    jobdispatcher, 100: lost heartbeat
    pilot, 1008: Failed in data staging: Failed to prepare destination

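To tally these failures across many results, one option is to split each error string into its parts. The sketch below assumes every Panda error follows the "<component>, <code>: <message>" layout seen in the two examples above; that layout is inferred from just those two lines, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the two examples quoted above:
# "<component>, <code>: <message>"
ERROR_RE = re.compile(
    r"^\s*(?P<component>[^,]+),\s*(?P<code>\d+):\s*(?P<message>.*?)\s*$"
)


def parse_panda_error(line):
    """Split one Panda error string into (component, code, message), or None."""
    m = ERROR_RE.match(line)
    if m is None:
        return None
    return m.group("component"), int(m.group("code")), m.group("message")


def tally_errors(lines):
    """Count failures per (component, code) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_panda_error(line)
        if parsed is not None:
            component, code, _message = parsed
            counts[(component, code)] += 1
    return counts


if __name__ == "__main__":
    errors = [
        "jobdispatcher, 100: lost heartbeat",
        "pilot, 1008: Failed in data staging: Failed to prepare destination",
        "jobdispatcher, 100: lost heartbeat",
    ]
    for (component, code), n in tally_errors(errors).most_common():
        print(f"{component} {code}: {n}")
```

Keying on (component, code) rather than the full message keeps slightly different wordings of the same failure in one bucket.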


The Panda URL is https://bigpanda.cern.ch/jobs?pandaid=4055235277,4055252522,4055210662,4055226810,4056866884,4056866856,4056866894,4056866903,4056866901,4056866920,4054776638,4054805421,4054853353,4054945351,4054992272,4058034278,4058034293,4058034294,4058034235,4058012059,4053382532,4056867071,4056803374,4056803130,4056765095,4056764983,4054535456
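The URL-building step of a script like the one described above can be sketched as follows. This is not the script from this post: it assumes each result's stderr text is already available as a string and contains a line such as "PandaID=4055235277". That pattern, along with the helper names, is hypothetical, and the real stderr format may differ. Only the bigpanda query layout is taken from the URL shown above.

```python
import re
import sys
import webbrowser

# Hypothetical pattern: assumes stderr contains "PandaID=<digits>" or
# "PandaID: <digits>". The real ATLAS stderr format may differ.
PANDA_ID_RE = re.compile(r"PandaID\s*[=:]\s*(\d+)")

# Query layout copied from the bigpanda URL quoted in the post.
BIGPANDA_JOBS_URL = "https://bigpanda.cern.ch/jobs?pandaid="


def extract_panda_ids(stderr_texts):
    """Collect PandaIDs from stderr texts, dropping duplicates but keeping order."""
    seen = set()
    ids = []
    for text in stderr_texts:
        for match in PANDA_ID_RE.finditer(text):
            pid = match.group(1)
            if pid not in seen:
                seen.add(pid)
                ids.append(pid)
    return ids


def build_bigpanda_url(panda_ids):
    """Join PandaIDs into one comma-separated bigpanda jobs query."""
    return BIGPANDA_JOBS_URL + ",".join(panda_ids)


if __name__ == "__main__":
    # Stand-in logs; a real script would fetch these from the result pages.
    logs = [
        "... PandaID=4055235277 ...",
        "... PandaID=4055252522 ...\n... PandaID=4055235277 ...",
    ]
    url = build_bigpanda_url(extract_panda_ids(logs))
    print(url)
    # Open the query in a new browser tab only when asked to.
    if "--open" in sys.argv:
        webbrowser.open_new_tab(url)
```

Putting every PandaID into a single query lets one bigpanda page show the status of all the jobs at once, which makes it easy to count how many failed.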

Here are links to 5 of the 11 results that Panda failed. The stderr output indicates they created a HITS file.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206752984
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206756018
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206689482
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206740679
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206689663

I also ran the script against my other host, and it produced the following URL, which shows that Panda failed 4 of 13 results:
https://bigpanda.cern.ch/jobs?pandaid=4061761435,4062075077,4061684503,4061590985,4060852970,4061211847,4061211846,4060874678,4054853442,4054945304,4057515283,4058034221,4057908464

David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 36778 - Posted: 19 Sep 2018, 12:01:26 UTC - in response to Message 36766.  

We had some infrastructure problems on the ATLAS side over the last weekend and again yesterday evening. These led to the Panda errors you are seeing. Hopefully things should be back to normal now.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37004 - Posted: 11 Oct 2018, 1:21:57 UTC

Got another bunch of these "jobdispatcher, 100: lost heartbeat" errors.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37054 - Posted: 17 Oct 2018, 0:20:52 UTC - in response to Message 36778.  




©2024 CERN