Message boards : ATLAS application : Zombie Thread after the Workunit is finished.

Toggleton
Joined: 4 Mar 17
Posts: 20
Credit: 8,234,904
RAC: 12,546
Message 49229 - Posted: 22 Jan 2024, 19:13:41 UTC
Last modified: 22 Jan 2024, 19:19:54 UTC

I have a problem where zombie threads keep running after the workunit has finished and its slot has already been deleted. It most likely started this weekend.

In htop they show up as:
/bin/bash ./runpilot2-wrapper.sh -q BOINC_MCORE -j managed --pilot-user ATLAS --harvester-submit-mode PUSH -w generic --job-type managed --resource-type SCORE_HIMEM --pilotversion 3.7.0.36 -z -t --piloturl local --mute --container

Each one uses ~40% CPU per workunit, so after a few hours they add up to a significant amount of CPU time, and I have to restart BOINC to get rid of them.
They are started around 6-8 minutes after the workunit starts, before python runargs.EVNTtoHITS.py begins using the CPU.
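A minimal Linux-only sketch, not part of the original thread, for checking whether leftover wrapper processes exist without restarting BOINC. It assumes processes can be identified by the substring runpilot2-wrapper.sh in their command line (taken from the htop output in this post) and reads /proc/<pid>/cmdline, where arguments are NUL-separated. The function name find_processes is hypothetical.

```python
from pathlib import Path

# Substring to look for in each process's command line (from the htop output in this post).
PATTERN = "runpilot2-wrapper.sh"


def find_processes(pattern: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return sorted (pid, cmdline) pairs for processes whose command line contains `pattern`.

    Hypothetical helper, Linux-only: scans numeric entries under proc_root and
    reads each <pid>/cmdline file, whose arguments are separated by NUL bytes.
    """
    matches = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited meanwhile or is not readable by this user
        # Join NUL-separated arguments with spaces, similar to how htop displays them.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((int(entry.name), cmdline))
    return sorted(matches)
```

Calling find_processes(PATTERN) after all ATLAS tasks have finished should return an empty list; any PIDs it does return are candidates for the leftover processes described here, which can then be inspected in htop.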

Could this be a problem with the current batch (the last 2 days), something broken by an Arch Linux update, or something caused by my currently unstable internet connection?

Does anyone else have the same problem?
ID: 49229
Saturn911
Joined: 3 Nov 12
Posts: 36
Credit: 117,967,568
RAC: 128,018
Message 49230 - Posted: 23 Jan 2024, 3:55:44 UTC - in response to Message 49229.  

+1
ID: 49230
Saturn911
Joined: 3 Nov 12
Posts: 36
Credit: 117,967,568
RAC: 128,018
Message 49232 - Posted: 23 Jan 2024, 6:25:04 UTC - in response to Message 49229.  

I see this on Manjaro Linux (an Arch derivative).
Tested kernels 6.6 and 6.7, same result.
ID: 49232
Toggleton
Joined: 4 Mar 17
Posts: 20
Credit: 8,234,904
RAC: 12,546
Message 49265 - Posted: 25 Jan 2024, 13:47:43 UTC

It seems the zombie thread problem is gone on my machine.
It was fixed somewhere between 22 Jan 2024, 16:58:50 UTC (when the task was sent) and 24 Jan 2024, 17:28:26 UTC (when the task ran).
ID: 49265

©2024 CERN