Potentially failing vbox tasks today
Author | Message |
---|---|
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Hi all, A change last night in an upstream component that we use for ATLAS tasks means that vbox tasks sent between then and now are likely to fail. The tasks will succeed in producing the HITS result file but will fail to copy it to the shared directory for upload by the BOINC client. You may not notice the failures since the tasks will be validated. I have introduced a fix for this in the ATLAS bootstrap script so that tasks starting from around now should work properly. The native tasks are unaffected by this problem. |
Joined: 28 Sep 04 Posts: 675 Credit: 43,519,272 RAC: 15,571 |
So far I have 7 failed ATLAS tasks today that match your description. |
Joined: 20 Nov 19 Posts: 21 Credit: 1,074,330 RAC: 0 |
I have 6 ATLAS (4 CPU) tasks running on two systems right now. In all 6, athena.py stopped running simultaneously. My systems went from a load average of 12+ to just 1+ for several minutes. I stopped boinc-client, autofs, and squid. I brought up squid, autofs, and boinc-client and saw no change. According to boinc-manager all tasks were running, but they had not written checkpoints. After a few minutes of watching athena.py trying to restart itself several times, everything returned to normal. I don't know whether these tasks will fail or turn out to be invalid. I think the Grinch has invaded CERN. EDIT: I should add that these are NOT vbox tasks; these are ATLAS native. |
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 211 |
"I have 6 ATLAS (4 CPU) tasks running on two systems right now. In all 6, athena.py stopped running simultaneously. My systems went from a load average of 12+ to just 1+ for several minutes. I stopped boinc-client, autofs, and squid. I brought up squid, autofs, and boinc-client and saw no change. According to boinc-manager all tasks were running, but they had not written checkpoints. After a few minutes of watching athena.py trying to restart itself several times, everything returned to normal. I don't know whether these tasks will fail or turn out to be invalid." How close together is "simultaneously"? It sounds like the normal behaviour as the task reaches phase 4 prior to completion. |
Joined: 28 Sep 04 Posts: 675 Credit: 43,519,272 RAC: 15,571 |
I have leftover directories named xxxxx_ATLAS_hits from the 13th of December in the projects\lhcathome.cern.ch_lhcathome folder. I think that these are left over from the failed tasks this thread is talking about. They contain the HITS.xxxx files among other things. Can I delete them? |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
"I have leftover directories xxxxx_ATLAS_hits from the 13th of December in projects\lhcathome.cern.ch_lhcathome folder. I think that these are left from the failed tasks this thread is talking about. They contain the HITS.xxxx files among other things. Can I delete them?" Yes, you can delete them. Even though you have valid results from those tasks, it is not possible to use them. |
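
For anyone with many of these leftover directories, the cleanup could be scripted. The following is only a rough sketch, assuming the leftovers sit directly inside the project folder and that their names end in `_ATLAS_hits`, as described above. The function names and the `dry_run` parameter are made up for this illustration, and the path to the project folder depends on where BOINC keeps its data on your machine. The sketch does not check whether a directory belongs to a task that is still in progress, so that check is left to you.

```python
from pathlib import Path
import shutil


def find_leftover_hits_dirs(project_dir):
    """Return directories directly under project_dir whose names end in '_ATLAS_hits'."""
    root = Path(project_dir)
    # Plain files that happen to match the pattern are ignored.
    return sorted(p for p in root.glob("*_ATLAS_hits") if p.is_dir())


def remove_leftover_hits_dirs(project_dir, dry_run=True):
    """List the leftover directories; delete them only when dry_run is False."""
    leftovers = find_leftover_hits_dirs(project_dir)
    for path in leftovers:
        if dry_run:
            print(f"would delete: {path}")
        else:
            shutil.rmtree(path)
            print(f"deleted: {path}")
    return leftovers
```

Calling `remove_leftover_hits_dirs(<your project folder>)` with the default `dry_run=True` only prints what would be removed; pass `dry_run=False` once the list looks right.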
©2024 CERN