Message boards : Theory Application : Theory Simulation tasks stuck at 100% — docker_wrapper_18 infinite loop on container exit
superkali

Joined: 25 Mar 24
Posts: 7
Credit: 1,216,751
RAC: 838
Message 53372 - Posted: 5 Apr 2026, 20:22:32 UTC

System: Kubuntu 24.04, BOINC 8.2.9, docker-ce 29.x, Intel Core Ultra 5 245K (14 cores)
I have been unable to complete any Theory Simulation (docker) tasks due to two related bugs in docker_wrapper_18. Both are reproducible and confirmed across multiple task batches.
Bug 1: Wrapper infinite loop on container exit
When a Theory Simulation container finishes its work and exits, docker_wrapper_18 fails to detect the exit condition. Instead of collecting the output and reporting completion, it enters an infinite loop issuing pause/unpause commands against the dead container:
[pre]
running docker command: pause boinc__lhcathome.cern.ch_lhcathome__theory_2922-XXXXXXX
Error response from daemon: container ... is not running
running docker command: unpause boinc__lhcathome.cern.ch_lhcathome__theory_2922-XXXXXXX
Error response from daemon: Container ... is not paused
[/pre]
This loop continues indefinitely. Tasks show 100% progress and "Running" status in BOINC Manager but never transition to upload state. Confirmed persisting overnight (12+ hours) without resolution.
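To make the expected behaviour concrete, here is a minimal Python sketch of the kind of state check that would avoid this loop. It is not the code of docker_wrapper (which is part of the BOINC code base); the function names and the action labels are hypothetical and chosen only for illustration. It assumes the docker CLI is on the PATH and relies on the `.State.Status` field reported by `docker inspect`.

```python
import subprocess

# Container states (as reported in .State.Status) that mean the job is over.
FINISHED_STATES = {"exited", "dead"}


def next_action(status: str, want_suspended: bool) -> str:
    """Decide what a wrapper-like supervisor should do for a container state.

    Hypothetical policy for illustration; not taken from docker_wrapper.
    """
    status = status.strip().lower()
    if status in FINISHED_STATES:
        # The container is gone: stop polling and report completion
        # instead of issuing pause/unpause against it.
        return "collect_output"
    if status == "running" and want_suspended:
        return "pause"
    if status == "paused" and not want_suspended:
        return "unpause"
    # created, restarting, or already in the desired state
    return "wait"


def container_status(name: str) -> str:
    """Return .State.Status for a container by calling the docker CLI."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Status}}", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With a check like this, an exited container always leads to "collect_output", so pause and unpause are only ever sent to a container that is actually running or paused.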
Bug 2: max_concurrent ignored for Docker tasks
An app_config.xml setting max_concurrent=1 for the Theory app is completely ignored: BOINC downloads and runs 9-10 Theory tasks simultaneously regardless of this setting. project_max_concurrent=1 is likewise ignored. I verified that both settings were active via boinccmd --get_app_config, and also tried reloading via --project update.
With 10 containers competing, BOINC's scheduler thrashes them with rapid pause/unpause cycles, making it impossible for any single wrapper to complete the lighttpd output handshake. This compounds Bug 1.
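For readers who want to reproduce the configuration described above, the following Python sketch generates an app_config.xml with both limits. The element names follow the standard BOINC app_config.xml layout. The app name "Theory" in the usage note is an assumption: the real short name should be taken from client_state.xml. The helper name `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_app_config(app_name: str, max_concurrent: int,
                     project_max: Optional[int] = None) -> str:
    """Build the text of an app_config.xml limiting concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if project_max is not None:
        # Project-wide limit, independent of the per-app limit above.
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    ET.indent(root, space="  ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Calling `print(build_app_config("Theory", 1, 1))` produces a file body that would go into the project's directory under the BOINC data directory as app_config.xml.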
Additional observation: Containers stuck in Created state
Some containers never progress past "Created" status and are never started by the wrapper, leaving tasks permanently stuck with no CPU activity.
Workaround attempted
Killing the wrapper process manually causes BOINC to immediately report "Computation for task X finished" and clean up correctly. However, this requires manual intervention per task and is not sustainable.
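To find out which tasks need this manual intervention, a small Python sketch like the one below can group BOINC containers by state. It only lists containers and deliberately kills nothing. It assumes the docker CLI is on the PATH, that `docker ps --format` supports the `.State` placeholder (Docker 20.10 or later), and that BOINC container names start with `boinc__` as in the log excerpt above. The function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def group_by_status(ps_output: str, prefix: str = "boinc__") -> dict:
    """Group container names by state.

    Expects one container per line, formatted as name, a tab, then state.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, state = line.split("\t", 1)
        if name.startswith(prefix):
            groups[state.strip()].append(name)
    return dict(groups)


def list_boinc_containers() -> dict:
    """Query docker for all containers and group the BOINC ones by state."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.State}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return group_by_status(out)
```

Containers listed under "exited" while their task still shows "Running" are candidates for the wrapper kill described above; containers under "created" correspond to the tasks that were never started.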
Environment note
docker-ce 29.x requires --privileged for tmpfs mounts inside containers (a separate issue documented in thread 6470). That workaround is in place and confirmed working; the problem reported here lies in the wrapper logic itself.
Has anyone else observed this? Is docker_wrapper_18 actively maintained, and is there a newer version available?
Detailed logs and configuration available at github.com/black-vajra/boinc-devel
Kamil
New member

Joined: 6 Apr 26
Posts: 1
Credit: 10,953
RAC: 628
Message 53387 - Posted: 9 Apr 2026, 8:05:58 UTC

I have a similar problem, for example on https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=240411576
It went smoothly from 0 to 99% and then became very slow. The tasks have now been at 100% for about 2 hours, with a CPU core at 100% load.
I had three like that and managed to finish one: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=240414911
That task took 3-4x as long as the others.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 935
Credit: 781,495,911
RAC: 76,972
Message 53398 - Posted: 10 Apr 2026, 18:16:24 UTC - in response to Message 53372.  

The wrapper is part of the main BOINC code base:

https://github.com/BOINC/boinc

It's maintained.
