Questions and Answers : Windows : All vBox WU in error

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2440
Credit: 230,181,191
RAC: 133,482
Message 43988 - Posted: 23 Dec 2020, 10:58:43 UTC

Here are some recent logfiles from Theory VMs.
They provide lots of useful information but they also leave some questions unanswered.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=292597867
https://lhcathome.cern.ch/lhcathome/result.php?resultid=292601738

Examples are taken from resultid=292601738

2020-12-23 07:36:38 (77968): Detected: vboxwrapper 26197
2020-12-23 07:36:38 (77968): Detected: BOINC client v7.7
2020-12-23 07:36:40 (77968): Detected: VirtualBox VboxManage Interface (Version: 6.1.16)
2020-12-23 07:36:40 (77968): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2020-12-23 07:36:40 (77968): Successfully copied 'init_data.xml' to the shared directory.
2020-12-23 07:36:40 (77968): Successfully copied 'input' to the shared directory.
2020-12-23 07:36:42 (77968): Create VM. (boinc_e87c21a86874fb4e, slot#4)
....
2020-12-23 07:38:55 (77968): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
....

Everything looks fine until the last line cited here.


Then a series of pause/resume signals starts:
2020-12-23 07:39:47 (77968): VM state change detected. (old = 'Running', new = 'Paused')
....
2020-12-23 07:42:38 (77968): VM state change detected. (old = 'Paused', new = 'Running')

Those signals are initiated by the BOINC client and can have various causes, e.g. other projects have a higher priority or the total RAM allowed for BOINC is set too low.
CP already suggested what has to be checked/set.
While this happens, the watchdog timer set by vboxwrapper is already running (see below).
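
To get a feeling for how long the VM was actually paused, the state change lines can be summed up with a small script. This is only a sketch: the regular expression is derived from the two lines cited above, `paused_time` is a name made up for this example, and the sample contains just those two lines, while the real log has more of them in the elided part.

```python
import re
from datetime import datetime, timedelta

# Pattern for vboxwrapper state change lines as they appear in the cited log.
STATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)

def paused_time(log_lines):
    """Return (number of pauses, total time spent in state 'Paused')."""
    pauses = 0
    total = timedelta()
    paused_since = None
    for line in log_lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue  # not a state change line
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        new_state = m.group(3)
        if new_state == "Paused":
            if paused_since is None:
                pauses += 1
                paused_since = stamp
        elif new_state == "Running" and paused_since is not None:
            total += stamp - paused_since
            paused_since = None
    return pauses, total

# Only the two lines quoted above; the elided lines are not reconstructed.
sample = [
    "2020-12-23 07:39:47 (77968): VM state change detected. (old = 'Running', new = 'Paused')",
    "2020-12-23 07:42:38 (77968): VM state change detected. (old = 'Paused', new = 'Running')",
]
count, total = paused_time(sample)
print(count, total)
```

Feeding the complete stderr output of a task into `paused_time` gives the real numbers for that task.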


The next lines come from the CVMFS check inside the VM:
2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Detected Theory App
2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Checking CVMFS.
2020-12-23 07:43:31 (77968): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
2020-12-23 07:43:31 (77968): Guest Log: 07:42:26 CET +01:00 2020-12-23: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.

This shows that the VM continues its setup process as soon as it gets a resume signal, but CVMFS fails.
Possible reasons:
- HTTP traffic between the VM and the CVMFS servers is blocked => the issue is caused by a firewall or malware protection software
- CVMFS has its own timeouts => they expire because of too many pause/resumes

Since the setup process needs to load lots of data from the CVMFS servers, the VM stalls at this point.
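
In a long task log such failures are easy to miss. A minimal sketch that pulls the failing guest lines out of the log could look like the following. It assumes that the markers `Guest Log:`, `[ERROR]` and `Failed!` look exactly as in the lines quoted above; `guest_failures` is a name made up for this example.

```python
import re

# Everything after "Guest Log: " was written by the VM, not by vboxwrapper.
GUEST_RE = re.compile(r"Guest Log: (.*)$")

def guest_failures(log_lines):
    """Return the guest messages that report an error or a failed probe."""
    hits = []
    for line in log_lines:
        m = GUEST_RE.search(line.rstrip())
        if m and ("[ERROR]" in m.group(1) or "Failed!" in m.group(1)):
            hits.append(m.group(1))
    return hits

# The four guest lines quoted above.
sample_guest = [
    "2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Detected Theory App",
    "2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Checking CVMFS.",
    "2020-12-23 07:43:31 (77968): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!",
    "2020-12-23 07:43:31 (77968): Guest Log: 07:42:26 CET +01:00 2020-12-23: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.",
]
failures = guest_failures(sample_guest)
for msg in failures:
    print(msg)
```

This only shows that the probe failed, not why. It cannot tell a blocked connection from an expired timeout.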

The next series of pause/resume signals follows:
2020-12-23 07:44:28 (77968): VM state change detected. (old = 'Running', new = 'Paused')
....
2020-12-23 08:05:44 (77968): VM state change detected. (old = 'Paused', new = 'Running')


Finally the vboxwrapper watchdog can't find an updated heartbeat file and shuts down the task:
2020-12-23 08:06:02 (77968): VM Heartbeat file specified, but missing.
2020-12-23 08:06:02 (77968): VM Heartbeat file specified, but missing file system status. (errno = '2')

This is the expected behaviour; it prevents a crashed VM from sitting idle forever.
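
The idea behind such a heartbeat check can be sketched in a few lines. This is not the vboxwrapper source, only an illustration under assumptions: `heartbeat_ok` is a made-up helper, the check is based on the modification time of the file, and the interval of 1200 seconds is the value reported in the log above.

```python
import os
import tempfile
import time

def heartbeat_ok(path, interval, now=None):
    """Return True if the heartbeat file exists and was modified
    within the last `interval` seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # errno 2 (ENOENT): the file is not there at all.
        return False
    return (now - mtime) <= interval

# Small demonstration in a temporary directory standing in for the shared directory.
with tempfile.TemporaryDirectory() as shared:
    hb = os.path.join(shared, "heartbeat")
    missing = heartbeat_ok(hb, 1200)           # file not written yet
    open(hb, "w").close()                      # the VM would update this file
    fresh = heartbeat_ok(hb, 1200)             # just written
    stale = heartbeat_ok(hb, 1200, now=time.time() + 1201)  # one interval later
print(missing, fresh, stale)
```

A check like this runs on wall-clock time, so a VM that never got far enough to write the file fails it regardless of why it was delayed.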


What remains unclear:
Why does the BOINC client send so many pause/resume signals?
Is this caused by some BOINC client settings or is too much RAM allocated by other processes that have nothing to do with BOINC?
Are BOINC or its child processes blocked by some kind of malware protection?


©2024 CERN