Message boards : ATLAS application : Native app not cleaning slots directory

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 38456 - Posted: 26 Mar 2019, 19:50:49 UTC

Hi,

We found that in some cases the native app does not properly clean up its working directory after it finishes. This can lead to the limit on slot directories being reached, after which no more tasks will start. If this happens you will see an error like:

Mar 26 15:43:08 dcameron05.cern.ch boinc[8067]: 26-Mar-2019 15:43:08 [LHC@home] exceeded limit of 400 slot directories
Mar 26 15:43:08 dcameron05.cern.ch boinc[8067]: 26-Mar-2019 15:43:08 [LHC@home] Can't create task for rUQMDmOkqRunyYickojUe11pABFKDmABFKDmXkDXDmABFKDmFb4s8m_1 

The limit seems to be 100 * the number of cores, so you have to run a lot of tasks to reach it, but some of you may have hit it. The easy way to fix this is to delete all the old slot directories (usually in /var/lib/boinc/slots). If a directory contains only broken symlinks, like this one, then it is safe to delete:

/var/lib/boinc/slots/99:
total 0
lrwxrwxrwx. 1 boinc boinc 14 Feb 10 02:19 pilot.py -> pilot/pilot.py
lrwxrwxrwx. 1 boinc boinc 18 Feb 10 02:19 PILOTVERSION -> pilot/PILOTVERSION
lrwxrwxrwx. 1 boinc boinc 20 Feb 10 02:19 RunJobEvent.py -> pilot/RunJobEvent.py
lrwxrwxrwx. 1 boinc boinc 15 Feb 10 02:19 RunJob.py -> pilot/RunJob.py
lrwxrwxrwx. 1 boinc boinc 15 Feb 10 02:19 VmPeak.py -> pilot/VmPeak.py 
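
If you have many slot directories to check, something like the following Python sketch could list the candidates for you. It is only an illustration of the "contains only broken symlinks" check described above, not an official tool: the function names `is_stale_slot` and `find_stale_slots` are made up for this example, and treating empty directories as "not stale" is an assumption on my part. It only reports directories and deletes nothing.

```python
from pathlib import Path


def is_stale_slot(slot: Path) -> bool:
    """Return True if `slot` is non-empty and every entry is a broken symlink."""
    entries = list(slot.iterdir())
    if not entries:
        # Assumption: empty slot directories are left alone in this sketch.
        return False
    # A broken symlink is a symlink whose target no longer exists;
    # Path.exists() follows the link, so it is False for a dangling one.
    return all(e.is_symlink() and not e.exists() for e in entries)


def find_stale_slots(slots_root: Path) -> list[Path]:
    """Return the slot directories under `slots_root` holding only broken symlinks."""
    return sorted(
        d for d in slots_root.iterdir()
        if d.is_dir() and not d.is_symlink() and is_stale_slot(d)
    )
```

Calling `find_stale_slots(Path("/var/lib/boinc/slots"))` (adjust the path if your BOINC data directory is elsewhere) returns the directories that match. Review the list before deleting anything by hand.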

The problem was fixed today for new WUs, so it should not happen from now on.

