All vBox WU in error

computezrmle · Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert · Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,228,617 · RAC: 123,559
Message 43988 - Posted: 23 Dec 2020, 10:58:43 UTC

Here are some recent logfiles from Theory VMs. They provide lots of useful information, but they also leave some questions unanswered.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=292597867
https://lhcathome.cern.ch/lhcathome/result.php?resultid=292601738

Examples are taken from resultid=292601738:

    2020-12-23 07:36:38 (77968): Detected: vboxwrapper 26197
    2020-12-23 07:36:38 (77968): Detected: BOINC client v7.7
    2020-12-23 07:36:40 (77968): Detected: VirtualBox VboxManage Interface (Version: 6.1.16)
    2020-12-23 07:36:40 (77968): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
    2020-12-23 07:36:40 (77968): Successfully copied 'init_data.xml' to the shared directory.
    2020-12-23 07:36:40 (77968): Successfully copied 'input' to the shared directory.
    2020-12-23 07:36:42 (77968): Create VM. (boinc_e87c21a86874fb4e, slot#4)
    ....
    2020-12-23 07:38:55 (77968): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
    ....

Everything looks fine until the last line cited here. Then a series of pause/resume signals starts:

    2020-12-23 07:39:47 (77968): VM state change detected. (old = 'Running', new = 'Paused')
    ....
    2020-12-23 07:42:38 (77968): VM state change detected. (old = 'Paused', new = 'Running')

Those signals are initiated by the BOINC client and can have various causes, e.g. other projects having a higher priority, or the total RAM that BOINC is allowed to use being set too low. CP has already suggested what needs to be checked/set. While this happens, the watchdog timer set by vboxwrapper is already running (see below).
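If you want to see how often this happens in your own task logs, the pause/resume lines can be counted with a small script. Below is a minimal Python sketch, assuming the vboxwrapper line format quoted above; `pause_summary` is a hypothetical helper, not part of BOINC or vboxwrapper. It counts Running -> Paused transitions and adds up the time until the matching Paused -> Running line.

```python
import re
from datetime import datetime

# Matches vboxwrapper lines in the format quoted in this post, e.g.
# 2020-12-23 07:39:47 (77968): VM state change detected. (old = 'Running', new = 'Paused')
STATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def pause_summary(lines):
    """Return (number of pauses, total seconds paused) from vboxwrapper log lines.

    Lines that are not state-change lines are ignored.
    """
    pauses = 0
    paused_seconds = 0.0
    paused_since = None  # timestamp of the last Running -> Paused transition
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        old, new = m.group(2), m.group(3)
        if old == "Running" and new == "Paused":
            pauses += 1
            paused_since = ts
        elif old == "Paused" and new == "Running" and paused_since is not None:
            paused_seconds += (ts - paused_since).total_seconds()
            paused_since = None
    return pauses, paused_seconds
```

Fed only the four state-change lines quoted in this post, it returns (2, 1447.0): 171 s for the first series and 1276 s for the second. Because the "...." parts hide the intermediate pause/resume lines, those figures are just the span of each series; run it over the complete stderr output of a task to get the real numbers.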
The next lines are from CVMFS inside the VM:

    2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Detected Theory App
    2020-12-23 07:43:31 (77968): Guest Log: 07:42:25 CET +01:00 2020-12-23: cranky: [INFO] Checking CVMFS.
    2020-12-23 07:43:31 (77968): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
    2020-12-23 07:43:31 (77968): Guest Log: 07:42:26 CET +01:00 2020-12-23: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.

This shows that the VM continues its setup process as soon as it gets a resume signal, but CVMFS fails. Possible reasons:
- HTTP traffic between the VM and the CVMFS servers is blocked => an issue caused by a firewall or malware protection software
- CVMFS has its own timeouts => they expire because of too many pause/resumes

Since the setup process needs lots of data to be loaded from the CVMFS servers, the VM stalls at this point. Another series of pause/resume signals follows:

    2020-12-23 07:44:28 (77968): VM state change detected. (old = 'Running', new = 'Paused')
    ....
    2020-12-23 08:05:44 (77968): VM state change detected. (old = 'Paused', new = 'Running')

Finally the vboxwrapper watchdog can't find an updated heartbeat file and shuts down the task:

    2020-12-23 08:06:02 (77968): VM Heartbeat file specified, but missing.
    2020-12-23 08:06:02 (77968): VM Heartbeat file specified, but missing file system status. (errno = '2')

This is the expected behaviour; it prevents a crashed VM from staying idle forever.

What remains unclear:
- Why does the BOINC client send so many pause/resume signals?
- Is this caused by some BOINC client settings, or is too much RAM allocated by other processes that have nothing to do with BOINC?
- Is BOINC or one of its child processes blocked by some kind of malware protection?
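The watchdog idea can be illustrated with a few lines of Python. This is only a simplified sketch of the concept, not vboxwrapper's actual code: it assumes a file named 'heartbeat' that the guest is expected to refresh, uses the 1200 s interval reported in the log above, and treats a missing file (errno 2, ENOENT) or an out-of-date one as a failed check. `heartbeat_status` is a hypothetical helper.

```python
import errno
import os
import time


def heartbeat_status(path, interval_s=1200.0, now=None):
    """Classify a heartbeat file as 'ok', 'stale' or 'missing'.

    Simplified illustration only: a file whose modification time is older
    than interval_s seconds counts as stale.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError as exc:
        # ENOENT is errno 2 ("No such file or directory") on Linux, macOS and Windows
        if exc.errno == errno.ENOENT:
            return "missing"
        raise
    return "stale" if now - mtime > interval_s else "ok"
```

In the log above the file was reported as missing (errno = '2'), i.e. the file system lookup itself failed, not merely that the file was out of date.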

LHC@home