Native app not cleaning slots directory
Author: Joined: 13 May 14, Posts: 387, Credit: 15,314,184, RAC: 0

Hi,

We found that in some cases the native app does not properly clean up its working directory after it finishes. This can lead to the slot limit being reached, after which no more tasks start. If this happens you will see an error like:

    Mar 26 15:43:08 dcameron05.cern.ch boinc[8067]: 26-Mar-2019 15:43:08 [LHC@home] exceeded limit of 400 slot directories
    Mar 26 15:43:08 dcameron05.cern.ch boinc[8067]: 26-Mar-2019 15:43:08 [LHC@home] Can't create task for rUQMDmOkqRunyYickojUe11pABFKDmABFKDmXkDXDmABFKDmFb4s8m_1

The limit seems to be 100 * the number of cores, so you have to run a lot of tasks to reach it, but some of you may have hit it.

The easy way to fix this is to delete all the old slot directories (usually in /var/lib/boinc/slots). If a directory contains only broken symlinks like this, then it is safe to delete:

    /var/lib/boinc/slots/99:
    total 0
    lrwxrwxrwx. 1 boinc boinc 14 Feb 10 02:19 pilot.py -> pilot/pilot.py
    lrwxrwxrwx. 1 boinc boinc 18 Feb 10 02:19 PILOTVERSION -> pilot/PILOTVERSION
    lrwxrwxrwx. 1 boinc boinc 20 Feb 10 02:19 RunJobEvent.py -> pilot/RunJobEvent.py
    lrwxrwxrwx. 1 boinc boinc 15 Feb 10 02:19 RunJob.py -> pilot/RunJob.py
    lrwxrwxrwx. 1 boinc boinc 15 Feb 10 02:19 VmPeak.py -> pilot/VmPeak.py

The problem has been fixed today for new WUs, so from now on it shouldn't happen.
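
If you have many slot directories to check, the "contains only broken symlinks" test can be scripted. The following is only a rough sketch, not an official tool: the function names `is_stale_slot` and `find_stale_slots` are made up for illustration, the default path is the usual location mentioned above and may differ on your system, and skipping empty directories is a cautious assumption of mine rather than something the post specifies. It only lists candidates and deletes nothing.

```python
import os


def is_stale_slot(slot_dir):
    """Return True if slot_dir is non-empty and holds only broken symlinks."""
    entries = [os.path.join(slot_dir, name) for name in os.listdir(slot_dir)]
    if not entries:
        # Assumption: an empty slot may simply be idle, so leave it alone.
        return False
    # os.path.islink() is True for any symlink; os.path.exists() follows the
    # link and is False when the target is missing, i.e. the link is broken.
    return all(os.path.islink(p) and not os.path.exists(p) for p in entries)


def find_stale_slots(slots_root="/var/lib/boinc/slots"):
    """List slot directories under slots_root that look safe to delete."""
    if not os.path.isdir(slots_root):
        return []  # nothing to check if the slots directory is elsewhere
    stale = []
    for name in sorted(os.listdir(slots_root)):
        path = os.path.join(slots_root, name)
        if os.path.isdir(path) and not os.path.islink(path) and is_stale_slot(path):
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Print candidates only; review the list before removing anything by hand.
    for path in find_stale_slots():
        print(path)
```

Run it as a user that can read the slots directory, and check a few of the printed directories with `ls -l` before deleting them.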
©2024 CERN