Message boards : Theory Application : Some Theory tasks on VirtualBox hang Probing /cvmfs/alice.cern.ch...

Glohr

Joined: 13 Jan 24
Posts: 48
Credit: 9,505,396
RAC: 17,886
Message 51818 - Posted: 6 Apr 2025, 23:22:16 UTC

Some Theory tasks hang with the last thing on the screen "Probing /cvmfs/alice.cern.ch... "

Most tasks continue on and eventually exit normally, but some just sit there, never getting the "OK" and so on. The problem tasks don't accumulate any Guest CPU time in VirtualBox after the initial phase; the VM only accrues a few seconds of CPU time per hour, apparently for housekeeping.

Today, all the running.log files in problem tasks visible through the Web application end with
INFO: index summary: size / path
8 /scratch/dat/index/pp_13000_jets_280_-_pythia8_8.306_eetherm.txt

Disk usage: 6792 Kb

CPU usage: 2185 s

Clean tmp ...

Run finished successfully
and contain a line
/cvmfs/sft.cern.ch/lcg/releases/LCG_88b/MCGenerators/pythia8/306/x86_64-centos7-gcc62-opt/pythia8env-genser.sh: line 49: python: command not found

All the successful tasks that I checked were in a different tree and were using CPU time while processing events until they finished and exited.
In the past I've let a couple of the problem tasks go until they timed out after 10 days and exited with "Error while computing" status. That seems rather pointless, so now I abort any that I notice going nowhere rather than leaving them, dog-in-the-manger style, blocking other work.

Here are a couple of the problem tasks:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421021497
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421076672

Is there any way to avoid these, or at least kill them off quickly, automatically?
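In the meantime, something like this local watchdog might work as a stopgap. Just a sketch: the slots path and the marker string are assumptions based on my install and the log excerpt above, and you'd still have to map a slot to a task name yourself before calling boinccmd.

```python
import pathlib
import subprocess

SLOTS = pathlib.Path("/var/lib/boinc/slots")  # assumption: adjust for your install
MARKER = "python: command not found"          # taken from the running.log above

def find_stuck_slots(slots_dir=SLOTS, marker=MARKER):
    """Return slot directories whose running.log contains the marker."""
    stuck = []
    for log in slots_dir.glob("*/shared/running.log"):
        try:
            if marker in log.read_text(errors="replace"):
                stuck.append(log.parent.parent)
        except OSError:
            pass  # log vanished mid-scan; skip it
    return stuck

if __name__ == "__main__":
    for slot in find_stuck_slots():
        print(f"suspicious slot: {slot}")
        # Map the slot to a task name yourself, then e.g.:
        # subprocess.run(["boinccmd", "--task", PROJECT_URL, TASK_NAME, "abort"])
```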
ID: 51818
Glohr

Joined: 13 Jan 24
Posts: 48
Credit: 9,505,396
RAC: 17,886
Message 51829 - Posted: 8 Apr 2025, 10:29:51 UTC - in response to Message 51818.  

All of the problem tasks have a running.log that begins with the same line with the same old timestamp:
===> [runRivet] Wed Apr 2 05:28:25 PM UTC 2025 [boinc pp jets 13000 280 - pythia8 8.306 eetherm 100000 42]
I've killed off a couple more today.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421143601
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421141211
ID: 51829
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1556
Credit: 10,100,748
RAC: 1,717
Message 51831 - Posted: 8 Apr 2025, 15:55:21 UTC - in response to Message 51829.  
Last modified: 8 Apr 2025, 15:56:35 UTC

All of the problem tasks have a running.log that begins with the same line with the same old timestamp:
===> [runRivet] Wed Apr 2 05:28:25 PM UTC 2025 [boinc pp jets 13000 280 - pythia8 8.306 eetherm 100000 42]
I've killed off a couple more today.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421143601
https://lhcathome.cern.ch/lhcathome/result.php?resultid=421141211

That is very weird. It looks like you're not using the default vdi, since different tasks come with the same job description (a remnant of an old log?).
You could consider resetting the LHC project in your BOINC Manager.
ID: 51831
CloverField

Joined: 17 Oct 06
Posts: 99
Credit: 65,524,043
RAC: 13,662
Message 51833 - Posted: 13 Apr 2025, 17:57:02 UTC

This happened to me as well but a project reset appears to have fixed it.
ID: 51833
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53367 - Posted: 4 Apr 2026, 8:39:44 UTC

Pythia 8...

So far, I think I caught and smashed 2 zombie tasks with a brick.

There were two symptoms:
1. The log completely stopped moving.
2. In the task's VM activity, I saw that VMM is using 70% CPU and the Guest only 5% (on a normally working machine, those numbers are 5% and 95%).
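Symptom 1 is easy to watch for automatically. A small sketch (the log path and the 30-minute idle threshold are my own assumptions):

```python
import pathlib
import time

def log_is_stale(log_path, max_idle_s=1800, now=None):
    """True if the file hasn't been modified for more than max_idle_s seconds."""
    now = time.time() if now is None else now
    return (now - pathlib.Path(log_path).stat().st_mtime) > max_idle_s
```

Run it periodically against a task's running.log; if it keeps returning True while the task still claims to be running, it's probably a zombie.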

Both logs of these zombies (as I believe them to be) were identical, stuck exactly between these lines:
AlmaLinux release 9.6 (Sage Margay)
...
envscript=/cvmfs/sft.cern.ch/lcg/releases/LCG_96/MCGenerators/pythia8/301/x86_64-centos7-gcc8-opt/pythia8env-genser.sh

For now, this is reptiloid language to me: I know absolutely nothing about Alma Linux 9 and "7 cents". And I never drank beer with Mr. Genser... :)
ID: 53367
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2753
Credit: 304,110,606
RAC: 113,179
Message 53368 - Posted: 4 Apr 2026, 11:59:39 UTC - in response to Message 53367.  

In reply to metalius's message of 4 Apr 2026:
Pythia 8...

So far, I think I caught and smashed 2 zombie tasks with a brick.

There were two symptoms:
1. The log completely stopped moving.
2. In the task's VM activity, I saw that VMM is using 70% CPU and the Guest only 5% (on a normally working machine, those numbers are 5% and 95%).

Both logs of these zombies (as I believe them to be) were identical, stuck exactly between these lines:
AlmaLinux release 9.6 (Sage Margay)
...
envscript=/cvmfs/sft.cern.ch/lcg/releases/LCG_96/MCGenerators/pythia8/301/x86_64-centos7-gcc8-opt/pythia8env-genser.sh

For now, this is reptiloid language to me: I know absolutely nothing about Alma Linux 9 and "7 cents". And I never drank beer with Mr. Genser... :)

This is running inside the VMs.
Since you can't influence it, it doesn't make sense to dig deeper now.

Instead

You are running VirtualBox 6.1.34/7.0.6, which have both been out of maintenance for at least a year.
Please consider upgrading to the most recent version, currently 7.2.26.
Before you run the upgrade, finish all work in progress and make sure no VM is running or in a saved state.
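A quick way to check which version is actually installed (a sketch: it assumes VBoxManage is on your PATH, and that the version string follows the common `7.0.6r155891` pattern):

```python
import re
import subprocess

def parse_vbox_version(text):
    """Extract (major, minor, patch) from `VBoxManage --version` output,
    e.g. '7.0.6r155891' -> (7, 0, 6)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(map(int, m.groups()))

if __name__ == "__main__":
    try:
        out = subprocess.run(["VBoxManage", "--version"],
                             capture_output=True, text=True, check=True).stdout
        print("installed:", parse_vbox_version(out))
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("VBoxManage not found on this host")
```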


In addition

Your computers report only 8/16 GB RAM.
Make sure you do not overcommit memory.
Whenever your computers suspend/resume a VM, that is an extremely heavy operation.
If you notice lots of this in stderr.txt, consider reducing the number of tasks/projects you run concurrently.
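To get a feel for how often that happens, you could count the relevant lines in a task's stderr.txt. A sketch, and note the marker strings are guesses on my part; check your own stderr.txt for the exact wording vboxwrapper uses:

```python
import pathlib

# Guessed markers -- verify against the actual wording in your stderr.txt.
MARKERS = ("Pausing VM", "Resuming VM", "Restoring VM")

def count_vm_interruptions(stderr_path, markers=MARKERS):
    """Count suspend/resume-style events mentioned in a stderr.txt file."""
    text = pathlib.Path(stderr_path).read_text(errors="replace")
    return sum(text.count(m) for m in markers)
```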
ID: 53368
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53369 - Posted: 4 Apr 2026, 13:36:34 UTC - in response to Message 53368.  
Last modified: 4 Apr 2026, 13:43:30 UTC

In reply to computezrmle's message of 4 Apr 2026:
Please consider to upgrade to the most recent version, currently 7.2.26.

Believe it or not, VBox 6 looks more stable to me NOW than 7.
6.1.34 never went crazy on my work PC, which is used for real everyday jobs.
At the same time, 7.0.6 loses its mind about once a week and starts mass-generating "Computation Error". And this happens on a PC that does absolutely nothing except processing for BOINC.
IMHO, the safest way to keep from smashing my keyboard is to install a combined package. I have BOINC 8.0.2 + VBox in my archive. I will try it, thank you for this idea!
ID: 53369
metalius
Joined: 3 Oct 06
Posts: 116
Credit: 9,195,248
RAC: 5,946
Message 53380 - Posted: 8 Apr 2026, 8:46:07 UTC
Last modified: 8 Apr 2026, 8:46:46 UTC

MadGraph5

These types of tasks are rare, but I see the same scenario on one of my PCs.
1. For about half an hour there is some activity in the VM: the Guest shows some CPU load, downloads around 200 MB, and writes about 100 MB to disk.
2. After that, Guest activity drops to zero, all performance graphs go flat, and the task log file becomes unreachable.

Here is an excerpt from the log:
===> [runRivet] Wed Apr  8 07:03:39 AM UTC 2026 [boinc pp zinclusive 13000 -,-,200 - madgraph5amc 2.7.2.atlas3...]
...
ValueError: unsupported hash type md5
AttributeError : 'module' object has no attribute 'md5'
...
ERROR: missing LHE output file: /scratch/tmp/tmp.UjcDJkl2wh/MG5RUN/Events/run_01/unweighted_events.lhe
...
[2]+  2535 Running                 ( $rivetExecString; exit $? ) &
ERROR: fail to run madgraph5amc 2.7.2.atlas3 or Rivet (error exit code)
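The md5 errors above usually mean the Python inside the VM cannot construct an MD5 hash object at all (typical of FIPS-restricted builds, where MD5 is disabled). A purely illustrative check — this tests the host's Python, not the VM's, which is the one that actually matters here:

```python
import hashlib

def md5_available():
    """True if this Python interpreter can create an md5 hash object."""
    try:
        hashlib.new("md5")
        return True
    except ValueError:  # e.g. "unsupported hash type md5" on FIPS builds
        return False
```

If this returned False inside the VM image, every step that fingerprints files with MD5 would die exactly as in the log, before any events are generated.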

My preliminary, but not yet firm conclusion: The simulation never started because the angry Python strangled everything, maybe even itself. :)
My action: Brick therapy for that VM.

P.S. For those who are experienced or have real knowledge: please comment, is this "brick therapy" a correct solution?
ID: 53380



©2026 CERN