Message boards : Theory Application : Theory Simulation tasks stuck at 100% — docker_wrapper_18 infinite loop on container exit
| Author | Message |
|---|---|
|
Joined: 25 Mar 24 Posts: 7 Credit: 1,216,751 RAC: 838 |
Kubuntu 24.04, BOINC 8.2.9, docker-ce 29.x, Intel Core Ultra 5 245K (14 cores)

I have been unable to complete any Theory Simulation (docker) tasks due to two related bugs in docker_wrapper_18. Both are reproducible and confirmed across multiple task batches.

Bug 1: Wrapper infinite loop on container exit

When a Theory Simulation container finishes its work and exits, docker_wrapper_18 fails to detect the exit condition. Instead of collecting the output and reporting completion, it enters an infinite loop issuing pause/unpause commands against the dead container:

[pre]
running docker command: pause boinc__lhcathome.cern.ch_lhcathome__theory_2922-XXXXXXX
Error response from daemon: container ... is not running
running docker command: unpause boinc__lhcathome.cern.ch_lhcathome__theory_2922-XXXXXXX
Error response from daemon: Container ... is not paused
[/pre]

This loop continues indefinitely. Tasks show 100% progress and "Running" status in BOINC Manager but never transition to upload state. Confirmed persisting overnight (12+ hours) without resolution.

Bug 2: max_concurrent ignored for Docker tasks

app_config.xml with max_concurrent=1 for the Theory app is completely ignored. BOINC downloads and executes 9-10 Theory tasks simultaneously regardless of this setting. Also confirmed that project_max_concurrent=1 is likewise ignored. Both were verified active via boinccmd --get_app_config (also tested reload via --project update).

With 10 containers competing, BOINC's scheduler thrashes them with rapid pause/unpause cycles, making it impossible for any single wrapper to complete the lighttpd output handshake. This compounds Bug 1.

Additional observation: Containers stuck in Created state

Some containers never progress past "Created" status and are never started by the wrapper, leaving tasks permanently stuck with no CPU activity.
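To make the Bug 1 and "Created" symptoms concrete, here is a minimal Python sketch of the kind of state check that would avoid both. It is not taken from docker_wrapper_18 and says nothing about how the wrapper is actually written; the function names container_status and next_action are hypothetical. The only real interface it relies on is docker inspect with a --format template on .State.Status, plus the status strings Docker reports (created, running, paused, exited, dead).

```python
import subprocess


def container_status(name, run=subprocess.run):
    """Return Docker's State.Status for a container, or None if inspect fails.

    `run` is injectable so the function can be exercised without Docker.
    """
    result = run(
        ["docker", "inspect", "--format", "{{.State.Status}}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Container does not exist (or the daemon is unreachable).
        return None
    return result.stdout.strip()


def next_action(status, want_suspended):
    """Decide what a supervising loop should do on one poll.

    Hypothetical policy, for illustration only:
    - a missing container is reported as "missing"
    - an exited or dead container means the job is over: collect output
    - a created-but-never-started container needs "start"
    - pause/unpause are only issued when the state allows them
    """
    if status is None:
        return "missing"
    if status in ("exited", "dead"):
        return "finish"
    if status == "created":
        return "start"
    if status == "running" and want_suspended:
        return "pause"
    if status == "paused" and not want_suspended:
        return "unpause"
    return "wait"
```

With a container in the exited state, next_action returns "finish" whether or not BOINC wants the task suspended, so no further pause or unpause would be issued against it. That is the transition the stuck tasks never make.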
Workaround attempted

Killing the wrapper process manually causes BOINC to immediately report "Computation for task X finished" and clean up correctly. However, this requires manual intervention per task and is not sustainable.

Environment note

docker-ce 29.x requires --privileged for tmpfs mounts inside containers (separate issue documented in thread 6470). That workaround is in place and confirmed working; the problem described here is a separate issue with the wrapper logic itself.

Has anyone else observed this? Is docker_wrapper_18 actively maintained, and is there a newer version available?

Detailed logs and configuration available at github.com/black-vajra/boinc-devel |
|
New member Joined: 6 Apr 26 Posts: 1 Credit: 10,953 RAC: 628 |
I have a similar problem, for example on https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=240411576

It ran smoothly from 0 to 99% and then became very slow. The tasks have now been at 100% for about 2 hours, with a CPU core at 100% load.

I had three like that and managed to finish one, https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=240414911, which took 3-4x as long as the others. |
|
Joined: 27 Sep 08 Posts: 935 Credit: 781,495,911 RAC: 76,972 |
|
©2026 CERN