Message boards : Theory Application : Some Theory tasks on VirtualBox hang Probing /cvmfs/alice.cern.ch...

Glohr

Joined: 13 Jan 24
Posts: 48
Credit: 9,505,396
RAC: 17,886
Message 51818 - Posted: 6 Apr 2025, 23:22:16 UTC

Some Theory tasks hang with the last thing on the screen "Probing /cvmfs/alice.cern.ch... "

Most tasks continue on and eventually exit normally, but some just sit there, never getting the "OK" and so on. The problem tasks don't accumulate any Guest CPU time in VirtualBox after the initial phase; the VM only accrues a few seconds of CPU time per hour, apparently for housekeeping.

Today, all the running.log files in problem tasks visible through the Web application end with
INFO: index summary: size / path
8 /scratch/dat/index/pp_13000_jets_280_-_pythia8_8.306_eetherm.txt

Disk usage: 6792 Kb

CPU usage: 2185 s

Clean tmp ...

Run finished successfully
and contain a line
/cvmfs/sft.cern.ch/lcg/releases/LCG_88b/MCGenerators/pythia8/306/x86_64-centos7-gcc62-opt/pythia8env-genser.sh: line 49: python: command not found

All the successful tasks that I checked were in a different tree and were using CPU time while processing events until they finished and exited.
In the past I've let a couple of the problem tasks go until they timed out after 10 days and exited with "Error while computing" status. That seems rather pointless, so now I abort any that I notice going nowhere rather than leaving them, dog-in-the-manger style, blocking other work.

Here are a couple of the problem tasks:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421021497
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421076672

Is there any way to avoid these, or at least kill them off quickly, automatically?
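In the meantime, something like this local watchdog might work as a stopgap. Just a sketch: the slots path and the marker string are assumptions based on my install and the log excerpt above, and you'd still have to map a slot to a task name yourself before calling boinccmd.

```python
import pathlib
import subprocess

SLOTS = pathlib.Path("/var/lib/boinc/slots")  # assumption: adjust for your install
MARKER = "python: command not found"          # taken from the running.log above

def find_stuck_slots(slots_dir=SLOTS, marker=MARKER):
    """Return slot directories whose running.log contains the marker."""
    stuck = []
    for log in slots_dir.glob("*/shared/running.log"):
        try:
            if marker in log.read_text(errors="replace"):
                stuck.append(log.parent.parent)
        except OSError:
            pass  # log vanished mid-scan; skip it
    return stuck

if __name__ == "__main__":
    for slot in find_stuck_slots():
        print(f"suspicious slot: {slot}")
        # Map the slot to a task name yourself, then e.g.:
        # subprocess.run(["boinccmd", "--task", PROJECT_URL, TASK_NAME, "abort"])
```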
ID: 51818
Glohr

Joined: 13 Jan 24
Posts: 48
Credit: 9,505,396
RAC: 17,886
Message 51829 - Posted: 8 Apr 2025, 10:29:51 UTC - in response to Message 51818.  

All of the problem tasks have a running.log that begins with the same line with the same old timestamp:
===> [runRivet] Wed Apr 2 05:28:25 PM UTC 2025 [boinc pp jets 13000 280 - pythia8 8.306 eetherm 100000 42]
I've killed off a couple more today.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421143601
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421141211
ID: 51829
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1556
Credit: 10,100,748
RAC: 1,717
Message 51831 - Posted: 8 Apr 2025, 15:55:21 UTC - in response to Message 51829.  
Last modified: 8 Apr 2025, 15:56:35 UTC

All of the problem tasks have a running.log that begins with the same line with the same old timestamp:
===> [runRivet] Wed Apr 2 05:28:25 PM UTC 2025 [boinc pp jets 13000 280 - pythia8 8.306 eetherm 100000 42]
I've killed off a couple more today.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421143601
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421141211

That is very weird. It looks like you're not using the default vdi, since different tasks come with the same job description (a remnant of an old log?).
You could consider resetting the LHC project in your BOINC Manager.
ID: 51831
CloverField

Joined: 17 Oct 06
Posts: 99
Credit: 65,524,043
RAC: 13,662
Message 51833 - Posted: 13 Apr 2025, 17:57:02 UTC

This happened to me as well but a project reset appears to have fixed it.
ID: 51833
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53367 - Posted: 4 Apr 2026, 8:39:44 UTC

Pythia 8...

So far, I think I caught and smashed 2 zombie tasks with a brick.

There were two symptoms:
1. The log completely stopped moving.
2. In the task's VM activity, I saw that VMM is using 70% CPU and the Guest only 5% (on a normally working machine, those numbers are 5% and 95%).
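Symptom 1 is easy to watch for automatically. A small sketch (the log path and the 30-minute idle threshold are my own assumptions):

```python
import pathlib
import time

def log_is_stale(log_path, max_idle_s=1800, now=None):
    """True if the file hasn't been modified for more than max_idle_s seconds."""
    now = time.time() if now is None else now
    return (now - pathlib.Path(log_path).stat().st_mtime) > max_idle_s
```

Run it periodically against a task's running.log; if it keeps returning True while the task still claims to be running, it's probably a zombie.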

Both logs of these zombies (as I believe them to be) were identical, stuck exactly between these lines:
AlmaLinux release 9.6 (Sage Margay)
...
envscript=/cvmfs/sft.cern.ch/lcg/releases/LCG_96/MCGenerators/pythia8/301/x86_64-centos7-gcc8-opt/pythia8env-genser.sh

For now, this is reptiloid language to me: I know absolutely nothing about Alma Linux 9 and "7 cents". And I never drank beer with Mr. Genser... :)
ID: 53367
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2753
Credit: 304,110,606
RAC: 113,179
Message 53368 - Posted: 4 Apr 2026, 11:59:39 UTC - in response to Message 53367.  

In reply to metalius's message of 4 Apr 2026:
Pythia 8...

So far, I think I caught and smashed 2 zombie tasks with a brick.

There were two symptoms:
1. The log completely stopped moving.
2. In the task's VM activity, I saw that VMM is using 70% CPU and the Guest only 5% (on a normally working machine, those numbers are 5% and 95%).

Both logs of these zombies (as I believe them to be) were identical, stuck exactly between these lines:
AlmaLinux release 9.6 (Sage Margay)
...
envscript=/cvmfs/sft.cern.ch/lcg/releases/LCG_96/MCGenerators/pythia8/301/x86_64-centos7-gcc8-opt/pythia8env-genser.sh

For now, this is reptiloid language to me: I know absolutely nothing about Alma Linux 9 and "7 cents". And I never drank beer with Mr. Genser... :)

This is running inside the VMs.
Since you can't influence it, it doesn't make sense to dig deeper now.

Instead

You are running VirtualBox 6.1.34/7.0.6, which have both been out of maintenance for at least a year.
Please consider upgrading to the most recent version, currently 7.2.26.
Before you run the upgrade, finish all work in progress and make sure no VM is running or in a saved state.
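A quick way to check which version is actually installed (a sketch: it assumes VBoxManage is on your PATH, and that the version string follows the common `7.0.6r155891` pattern):

```python
import re
import subprocess

def parse_vbox_version(text):
    """Extract (major, minor, patch) from `VBoxManage --version` output,
    e.g. '7.0.6r155891' -> (7, 0, 6)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(map(int, m.groups()))

if __name__ == "__main__":
    try:
        out = subprocess.run(["VBoxManage", "--version"],
                             capture_output=True, text=True, check=True).stdout
        print("installed:", parse_vbox_version(out))
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("VBoxManage not found on this host")
```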


In addition

Your computers report only 8/16 GB RAM.
Make sure you do not overcommit memory.
Whenever your computers suspend/resume a VM, that is an extremely heavy operation.
If you notice lots of this in stderr.txt, consider reducing the number of tasks/projects you run concurrently.
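To get a feel for how often that happens, you could count the relevant lines in a task's stderr.txt. A sketch, and note the marker strings are guesses on my part; check your own stderr.txt for the exact wording vboxwrapper uses:

```python
import pathlib

# Guessed markers -- verify against the actual wording in your stderr.txt.
MARKERS = ("Pausing VM", "Resuming VM", "Restoring VM")

def count_vm_interruptions(stderr_path, markers=MARKERS):
    """Count suspend/resume-style events mentioned in a stderr.txt file."""
    text = pathlib.Path(stderr_path).read_text(errors="replace")
    return sum(text.count(m) for m in markers)
```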
ID: 53368
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53369 - Posted: 4 Apr 2026, 13:36:34 UTC - in response to Message 53368.  
Last modified: 4 Apr 2026, 13:43:30 UTC

In reply to computezrmle's message of 4 Apr 2026:
Please consider to upgrade to the most recent version, currently 7.2.26.

Believe it or not, VBox 6 looks more stable to me NOW than 7.
6.1.34 never went crazy on my work PC, which is used for real everyday jobs.
At the same time, 7.0.6 loses its mind about once a week and starts mass-generating "Computation Error". And this happens on a PC that does absolutely nothing except processing for BOINC.
IMHO, the safest way to keep from smashing my keyboard is to install a combined package. I have BOINC 8.0.2 + VBox in my archive. I will try it, thank you for this idea!
ID: 53369
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53380 - Posted: 8 Apr 2026, 8:46:07 UTC
Last modified: 8 Apr 2026, 8:46:46 UTC

MadGraph5

These types of tasks are rare, but I see the same scenario on one of my PCs.
1. For about half an hour there is some activity in the VM: the Guest shows some CPU load, downloads around 200 MB, and writes about 100 MB to disk.
2. After that, Guest activity drops to zero, all performance graphs go flat, and the task log file becomes unreachable.

Here is an excerpt from the log:
===> [runRivet] Wed Apr  8 07:03:39 AM UTC 2026 [boinc pp zinclusive 13000 -,-,200 - madgraph5amc 2.7.2.atlas3...]
...
ValueError: unsupported hash type md5
AttributeError : 'module' object has no attribute 'md5'
...
ERROR: missing LHE output file: /scratch/tmp/tmp.UjcDJkl2wh/MG5RUN/Events/run_01/unweighted_events.lhe
...
[2]+  2535 Running                 ( $rivetExecString; exit $? ) &
ERROR: fail to run madgraph5amc 2.7.2.atlas3 or Rivet (error exit code)
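The md5 errors above usually mean the Python inside the VM cannot construct an MD5 hash object at all (typical of FIPS-restricted builds, where MD5 is disabled). A purely illustrative check — this tests the host's Python, not the VM's, which is the one that actually matters here:

```python
import hashlib

def md5_available():
    """True if this Python interpreter can create an md5 hash object."""
    try:
        hashlib.new("md5")
        return True
    except ValueError:  # e.g. "unsupported hash type md5" on FIPS builds
        return False
```

If this returned False inside the VM image, every step that fingerprints files with MD5 would die exactly as in the log, before any events are generated.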

My preliminary, but not yet firm conclusion: The simulation never started because the angry Python strangled everything, maybe even itself. :)
My action: Brick therapy for that VM.

P.S. For those who are experienced or have real knowledge: please comment, is this "brick therapy" a correct solution?
ID: 53380



©2026 CERN