Message boards : ATLAS application : several Atlas tasks had to be aborted
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37022 - Posted: 14 Oct 2018, 18:14:31 UTC
Last modified: 14 Oct 2018, 18:15:18 UTC

ID: 37022
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37032 - Posted: 15 Oct 2018, 12:03:41 UTC

one more: https://lhcathome.cern.ch/lhcathome/result.php?resultid=207617875

I could find these errors:




Supporting BOINC, a great concept !
ID: 37032
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,041,230
RAC: 136,850
Message 37033 - Posted: 15 Oct 2018, 13:12:59 UTC - in response to Message 37032.  

Line 2 points out a possible "hardware error".
As ATLAS runs on virtual hardware it may indicate a corrupt vdi file.

You may reset the project to get a fresh vdi file.
If the error persists you'd have to dig deeper, e.g. for a corrupt VirtualBox installation or a real hardware error.
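
For anyone who would rather script the reset than click through BOINC Manager, here is a minimal Python sketch that only builds and prints the matching boinccmd call. The project URL is an assumption taken from the result links in this thread; the URL your client has registered may differ (boinccmd --get_project_status lists it). reset_command is a hypothetical helper name, not part of BOINC.

```python
import shlex

# Assumption: base URL taken from the result links in this thread.
# Your client may have the project registered under a slightly different URL.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def reset_command(project_url=PROJECT_URL):
    """Build the boinccmd argument list that resets one attached project."""
    return ["boinccmd", "--project", project_url, "reset"]


if __name__ == "__main__":
    # Only print the command. Run it by hand (or pass the list to
    # subprocess.run) once you are sure you want to reset the project.
    print(shlex.join(reset_command()))
```

Keep in mind that a project reset also discards tasks that are still in progress, so let running work finish or abort it first.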
ID: 37033
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37034 - Posted: 15 Oct 2018, 13:36:17 UTC

Nope, 98% of the WUs do fine on my machines, so I don't think it is a problem with the individual boxes.

I tend to suspect faulty WUs, or download servers that are unreachable during the spin-up of each WU.


Supporting BOINC, a great concept !
ID: 37034
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,192,791
RAC: 103,819
Message 37037 - Posted: 15 Oct 2018, 16:36:03 UTC

Yeti,
you have 32 GByte of RAM for this Ryzen with 16 cores.
Do you let four or five ATLAS tasks run at the same time?
BOINC 7.12.1 is now the default.
It seems that an ATLAS task sometimes runs into a RAM disaster and never finishes.
You have more than 170 tasks finished successfully at the moment. It is not easy to see what is going wrong.
ID: 37037
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37038 - Posted: 15 Oct 2018, 17:08:56 UTC - in response to Message 37037.  

Do you let four or five ATLAS tasks run at the same time?
Nope, only 2x4 or 3x4

BOINC 7.12.1 is now the default.
But this doesn't mean that my version is bad. Newer BOINC versions also have some very bad restrictions (or bugs, I don't know).

It seems that an ATLAS task sometimes runs into a RAM disaster and never finishes.
You have more than 170 tasks finished successfully at the moment. It is not easy to see what is going wrong.
The tasks that I had to abort are spread over several of my machines and only started to appear a few days ago, so I don't think it is a problem with my boxes.


Supporting BOINC, a great concept !
ID: 37038
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37039 - Posted: 15 Oct 2018, 17:14:46 UTC - in response to Message 37022.  

Hi,

I found several ATLAS tasks that were doing nothing useful, and I had to abort them.

Examples:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=207556366

https://lhcathome.cern.ch/lhcathome/result.php?resultid=207610403

https://lhcathome.cern.ch/lhcathome/result.php?resultid=207570427

https://lhcathome.cern.ch/lhcathome/result.php?resultid=207559971


Regarding the "faulty tasks" hypothesis:
The first example failed on 2 other hosts, but neither of those 2 shows any successful results, only failures. The most recent iteration is still in progress -> inconclusive
The second example validated on the next iteration -> good task
The third example is in second iteration which is in progress -> inconclusive
The fourth example validated on the third iteration -> good task

2 good-task indicators versus 2 inconclusive indicators. I bet that in time the 2 still in progress will validate too.
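
The tally above can be written down as a small Python sketch. The status labels are shorthand for what the task pages showed at the time of writing, the keys are the result IDs from the links quoted above, and classify is a hypothetical helper, nothing official.

```python
from collections import Counter

# Latest known state of each example, keyed by the result ID from the links.
# The labels "validated" / "in progress" are shorthand, not BOINC status codes.
examples = {
    "207556366": "in progress",  # failed on 2 other hosts, newest copy still running
    "207610403": "validated",    # validated on the next iteration
    "207570427": "in progress",  # second iteration still running
    "207559971": "validated",    # validated on the third iteration
}


def classify(status):
    """A validated later copy means the task itself was fine; anything else is open."""
    return "good task" if status == "validated" else "inconclusive"


tally = Counter(classify(status) for status in examples.values())

if __name__ == "__main__":
    for label, count in tally.items():
        print(label, count)
```

Once the two running copies report back, updating their entries in the dictionary settles the count one way or the other.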

If the tasks are indeed faulty then why are they failing only on your hosts?

Also a couple of the examples ran for more than 48 hours. Do ATLAS tasks not have a time limit?
ID: 37039



©2024 CERN