Message boards : CMS Application : Why all CMS virtualbox task were all failed... host was damaged?

Cyanr & Cinny
Joined: 1 Jul 20
Posts: 7
Credit: 20,906,498
RAC: 8,902
Message 51325 - Posted: 25 Dec 2024, 14:56:43 UTC

Hi experts

Recently, all my CMS vbox64 tasks have failed, and the BOINC log shows:
Task CMS_xxxx postponed for 86400 seconds: VM Hypervisor failed to enter an online state in a timely fashion.
Does this mean my host CPU is damaged and that vbox64 support is disabled or broken?
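
For anyone reading that log line: 86400 seconds is exactly 24 hours, so the task is postponed for a full day. A minimal Python sketch for pulling the delay and the reason out of such a line follows; the function name `parse_postponed` and the regular expression are illustrative assumptions, not part of BOINC, and the pattern is modelled only on the one line quoted above.

```python
import re

# Pattern modelled on the single log line quoted in this post (an assumption;
# other BOINC versions may word the message differently).
POSTPONED = re.compile(
    r"Task (?P<task>\S+) postponed for (?P<seconds>\d+) seconds: (?P<reason>.+)"
)

def parse_postponed(line):
    """Return (task, hours, reason) for a 'postponed' log line, or None."""
    match = POSTPONED.search(line)
    if match is None:
        return None
    hours = int(match.group("seconds")) / 3600  # seconds -> hours
    return match.group("task"), hours, match.group("reason").strip()

line = ("Task CMS_xxxx postponed for 86400 seconds: VM Hypervisor failed "
        "to enter an online state in a timely fashion.")
print(parse_postponed(line))
```

The message itself only says that the hypervisor did not come online in time; it says nothing about CPU damage.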

Thanks for any clues
ID: 51325
mmonnin
Joined: 22 Mar 17
Posts: 66
Credit: 25,047,948
RAC: 35,030
Message 51327 - Posted: 25 Dec 2024, 16:32:27 UTC

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6265&postid=51310#51310

There is no point running CMS at the moment, as no work is being sent to the VM task.
ID: 51327
hadron
Joined: 4 Sep 22
Posts: 96
Credit: 16,671,202
RAC: 2,547
Message 51328 - Posted: 26 Dec 2024, 0:11:19 UTC - in response to Message 51325.  

Hi experts

Recently, all my CMS vbox64 tasks have failed, and the BOINC log shows:
Task CMS_xxxx postponed for 86400 seconds: VM Hypervisor failed to enter an online state in a timely fashion.
Does this mean my host CPU is damaged and that vbox64 support is disabled or broken?

Thanks for any clues

There is nothing wrong with your system. The CMS tasks simply are not working right now. Everyone is having the same problem.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6265&postid=51310#51310

There is no point running CMS at the moment, as no work is being sent to the VM task.

Wrong.
ID: 51328
greg_be
Joined: 28 Dec 08
Posts: 341
Credit: 5,011,964
RAC: 1,858
Message 51333 - Posted: 27 Dec 2024, 23:10:48 UTC

87 CMS tasks have errored out.
That's messed up.

208 (0x000000D0) EXIT_SUB_TASK_FAILURE

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
(0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
2024-12-27 17:50:16 (20116): vboxwrapper version 26207
2024-12-27 17:50:16 (20116): BOINC client version: 8.0.2
2024-12-27 17:50:17 (20116): Detected: VirtualBox VboxManage Interface (Version: 7.1.4)
2024-12-27 17:50:17 (20116): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2024-12-27 17:50:17 (20116): Successfully copied 'init_data.xml' to the shared directory.
2024-12-27 17:50:17 (20116): Create VM. (boinc_ed3bbea4e75796fa, slot#2)
2024-12-27 17:50:18 (20116): Setting Memory Size for VM. (4584MB)
2024-12-27 17:50:18 (20116): Setting CPU Count for VM. (4)
2024-12-27 17:50:19 (20116): Setting Chipset Options for VM.
2024-12-27 17:50:19 (20116): Setting Graphics Controller Options for VM.
2024-12-27 17:50:19 (20116): Setting Boot Options for VM.
2024-12-27 17:50:19 (20116): Setting Network Configuration for NAT.
2024-12-27 17:50:20 (20116): Enabling VM Network Access.
2024-12-27 17:50:20 (20116): Disabling USB Support for VM.
2024-12-27 17:50:20 (20116): Disabling COM Port Support for VM.
2024-12-27 17:50:20 (20116): Disabling LPT Port Support for VM.
2024-12-27 17:50:21 (20116): Disabling Audio Support for VM.
2024-12-27 17:50:21 (20116): Disabling Clipboard Support for VM.
2024-12-27 17:50:21 (20116): Disabling Drag and Drop Support for VM.
2024-12-27 17:50:21 (20116): Adding storage controller(s) to VM.
2024-12-27 17:50:22 (20116): Adding virtual disk drive to VM. (CMS_2024_04_29_prod.vdi)
2024-12-27 17:50:24 (20116): Adding VirtualBox Guest Additions to VM.
2024-12-27 17:50:24 (20116): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2024-12-27 17:50:24 (20116): forwarding host port 56025 to guest port 80
2024-12-27 17:50:24 (20116): Enabling remote desktop for VM.
2024-12-27 17:50:25 (20116): Required extension pack not installed, remote desktop not enabled.
2024-12-27 17:50:25 (20116): Enabling shared directory for VM.
2024-12-27 17:50:25 (20116): Starting VM using VBoxManage interface. (boinc_ed3bbea4e75796fa, slot#2)
2024-12-27 17:50:31 (20116): Successfully started VM. (PID = '18128')
2024-12-27 17:50:31 (20116): Reporting VM Process ID to BOINC.
2024-12-27 17:50:31 (20116): Guest Log: BIOS: VirtualBox 7.1.4
2024-12-27 17:50:31 (20116): Guest Log: CPUID EDX: 0x178bfbff
2024-12-27 17:50:31 (20116): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2024-12-27 17:50:31 (20116): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2024-12-27 17:50:31 (20116): VM state change detected. (old = 'poweredoff', new = 'running')
2024-12-27 17:50:31 (20116): Detected: Web Application Enabled (http://localhost:56025)
2024-12-27 17:50:31 (20116): Preference change detected
2024-12-27 17:50:31 (20116): Setting CPU throttle for VM. (97%)
2024-12-27 17:50:31 (20116): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2024-12-27 17:50:33 (20116): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2024-12-27 17:50:33 (20116): Guest Log: BIOS: Booting from Hard Disk...
2024-12-27 17:50:35 (20116): Guest Log: BIOS: KBD: unsupported int 16h function 03
2024-12-27 17:50:35 (20116): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2024-12-27 17:51:04 (20116): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2024-12-27 17:51:04 (20116): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2024-12-27 17:51:04 (20116): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000112 main Log opened 2024-12-27T16:51:04.386249000Z
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000210 main OS Product: Linux
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000230 main OS Release: 4.14.232-19.cernvm.x86_64
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000248 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000296 main Executable: /usr/sbin/VBoxService
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000296 main Process ID: 2287
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.000297 main Package type: LINUX_64BITS_GENERIC
2024-12-27 17:51:04 (20116): Guest Log: 00:00:00.001564 main 5.2.6 r120293 started. Verbose level = 0
2024-12-27 17:51:12 (20116): Guest Log: [INFO] Mounting the shared directory
2024-12-27 17:51:12 (20116): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2024-12-27 17:51:13 (20116): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2024-12-27 17:51:13 (20116): Guest Log: [INFO] Testing connection to cern.ch
2024-12-27 17:51:13 (20116): Guest Log: [INFO] Testing connection to VCCS
2024-12-27 17:51:13 (20116): Guest Log: [INFO] Testing connection to HTCondor
2024-12-27 17:51:13 (20116): Guest Log: [INFO] Testing connection to WMAgent
2024-12-27 17:51:14 (20116): Guest Log: [INFO] Testing connection to EOSCMS
2024-12-27 17:51:14 (20116): Guest Log: [INFO] Testing connection to CMS-Factory
2024-12-27 17:51:14 (20116): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-12-27 17:51:15 (20116): Guest Log: [INFO] Testing connection to Frontier
2024-12-27 17:51:16 (20116): Guest Log: [INFO] Could not find a local HTTP proxy
2024-12-27 17:51:16 (20116): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections
2024-12-27 17:51:16 (20116): Guest Log: [INFO] This makes the application less efficient
2024-12-27 17:51:16 (20116): Guest Log: [INFO] It also puts higher load on the project servers
2024-12-27 17:51:16 (20116): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended
2024-12-27 17:51:16 (20116): Guest Log: [INFO] Advice can be found in the project forum
2024-12-27 17:51:16 (20116): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2024-12-27 17:51:20 (20116): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2024-12-27 17:51:21 (20116): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2024-12-27 17:51:22 (20116): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2024-12-27 17:51:22 (20116): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2024-12-27 17:51:22 (20116): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2024-12-27 17:51:23 (20116): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2024-12-27 17:51:23 (20116): Guest Log: [INFO] 2.7.2.0 http://s1cern-cvmfs.openhtc.io DIRECT
2024-12-27 17:51:23 (20116): Guest Log: [INFO] Environment HTTP proxy: not set
2024-12-27 17:51:23 (20116): Guest Log: [INFO] Reading volunteer information
2024-12-27 17:51:28 (20116): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-27 17:51:29 (20116): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-27 17:51:30 (20116): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-27 18:11:33 (20116): Guest Log: [ERROR] glidein exited with return value 1.
2024-12-27 18:11:33 (20116): Guest Log: [DEBUG] Volunteer: greg_be (141739)
2024-12-27 18:11:33 (20116): Guest Log: [INFO] Shutting Down.
2024-12-27 18:12:03 (20116): VM Completion File Detected.
2024-12-27 18:12:03 (20116): VM Completion Message: glidein exited with return value 1.
.
2024-12-27 18:12:03 (20116): Powering off VM.
2024-12-28 00:04:29 (16920): vboxwrapper version 26207
2024-12-28 00:04:29 (16920): BOINC client version: 8.0.2
2024-12-28 00:04:30 (16920): Detected: VirtualBox VboxManage Interface (Version: 7.1.4)
2024-12-28 00:04:30 (16920): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2024-12-28 00:04:31 (16920): Guest Log: BIOS: VirtualBox 7.1.4
2024-12-28 00:04:31 (16920): Guest Log: CPUID EDX: 0x178bfbff
2024-12-28 00:04:31 (16920): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2024-12-28 00:04:31 (16920): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2024-12-28 00:04:31 (16920): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2024-12-28 00:04:31 (16920): Guest Log: BIOS: Booting from Hard Disk...
2024-12-28 00:04:31 (16920): Guest Log: BIOS: KBD: unsupported int 16h function 03
2024-12-28 00:04:31 (16920): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2024-12-28 00:04:31 (16920): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2024-12-28 00:04:31 (16920): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2024-12-28 00:04:31 (16920): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000112 main Log opened 2024-12-27T16:51:04.386249000Z
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000210 main OS Product: Linux
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000230 main OS Release: 4.14.232-19.cernvm.x86_64
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000248 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000296 main Executable: /usr/sbin/VBoxService
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000296 main Process ID: 2287
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.000297 main Package type: LINUX_64BITS_GENERIC
2024-12-28 00:04:31 (16920): Guest Log: 00:00:00.001564 main 5.2.6 r120293 started. Verbose level = 0
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Mounting the shared directory
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to cern.ch
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to VCCS
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to HTCondor
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to WMAgent
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to EOSCMS
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to CMS-Factory
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to CMS-Frontier
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Testing connection to Frontier
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Could not find a local HTTP proxy
2024-12-28 00:04:31 (16920): Guest Log: [INFO] CVMFS and Frontier will have to use DIRECT connections
2024-12-28 00:04:31 (16920): Guest Log: [INFO] This makes the application less efficient
2024-12-28 00:04:31 (16920): Guest Log: [INFO] It also puts higher load on the project servers
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Setting up a local HTTP proxy is highly recommended
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Advice can be found in the project forum
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2024-12-28 00:04:31 (16920): Guest Log: [INFO] 2.7.2.0 http://s1cern-cvmfs.openhtc.io DIRECT
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Environment HTTP proxy: not set
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Reading volunteer information
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-28 00:04:31 (16920): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-28 00:04:31 (16920): Guest Log: [ERROR] glidein exited with return value 1.
2024-12-28 00:04:31 (16920): Guest Log: [DEBUG] Volunteer: greg_be (141739)
2024-12-28 00:04:31 (16920): Guest Log: [INFO] Shutting Down.
2024-12-28 00:04:31 (16920): Starting VM using VBoxManage interface. (boinc_ed3bbea4e75796fa, slot#2)
2024-12-28 00:04:36 (16920): Successfully started VM. (PID = '17240')
2024-12-28 00:04:36 (16920): Reporting VM Process ID to BOINC.
2024-12-28 00:04:36 (16920): Guest Log: BIOS: VirtualBox 7.1.4
2024-12-28 00:04:36 (16920): Guest Log: CPUID EDX: 0x178bfbff
2024-12-28 00:04:36 (16920): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2024-12-28 00:04:36 (16920): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2024-12-28 00:04:36 (16920): VM state change detected. (old = 'poweredoff', new = 'running')
2024-12-28 00:04:36 (16920): Detected: Web Application Enabled (http://localhost:56025)
2024-12-28 00:04:36 (16920): VM Completion File Detected.
2024-12-28 00:04:36 (16920): VM Completion Message: glidein exited with return value 1.
.
2024-12-28 00:04:36 (16920): Powering off VM.
2024-12-28 00:04:37 (16920): Successfully stopped VM.
2024-12-28 00:04:37 (16920): Deregistering VM. (boinc_ed3bbea4e75796fa, slot#2)
2024-12-28 00:04:37 (16920): Removing network bandwidth throttle group from VM.
2024-12-28 00:04:37 (16920): Removing VM from VirtualBox.
2024-12-28 00:04:42 (16920): called boinc_finish(208)

</stderr_txt>
]]>

They all say the same thing.
I aborted the last one in my queue and disabled CMS for the time being.
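
To check many task logs for the same pattern without reading each one by hand, here is a minimal Python sketch. It assumes the vboxwrapper stderr has been saved as plain text; the function name `summarize` and the returned dictionary keys are hypothetical, and the line format is inferred only from the log pasted above.

```python
import re
from datetime import datetime

# Matches lines such as "2024-12-27 18:11:33 (20116): Guest Log: [ERROR] ..."
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")

def summarize(stderr_text):
    """Collect [ERROR] guest messages, the boinc_finish code, and the seconds
    between 'CMS application starting' and the first [ERROR] line."""
    errors, exit_code, started, elapsed = [], None, None, None
    for raw in stderr_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip XML wrapper lines and stray "." lines
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        text = m.group(3)
        if "CMS application starting" in text and started is None:
            started = stamp
        if "[ERROR]" in text:
            errors.append(text.split("[ERROR]", 1)[1].strip())
            if started is not None and elapsed is None:
                elapsed = (stamp - started).total_seconds()
        fin = re.search(r"called boinc_finish\((\d+)\)", text)
        if fin:
            exit_code = int(fin.group(1))
    return {"errors": errors, "exit_code": exit_code, "elapsed_s": elapsed}

sample = """\
2024-12-27 17:51:30 (20116): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-27 18:11:33 (20116): Guest Log: [ERROR] glidein exited with return value 1.
2024-12-28 00:04:42 (16920): called boinc_finish(208)
"""
print(summarize(sample))
```

In the log above, the glidein exits about 20 minutes after the CMS application starts (17:51:30 to 18:11:33), and the wrapper then finishes with exit code 208.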

Gees...seriously....
ID: 51333
Guy
Joined: 9 Feb 08
Posts: 55
Credit: 1,528,489
RAC: 2,661
Message 51336 - Posted: 28 Dec 2024, 3:06:09 UTC - in response to Message 51333.  
Last modified: 28 Dec 2024, 3:11:00 UTC

Yes. All CMS tasks are "empty" units at the moment. Despite this, they are still being distributed, are assigned 4 (or more) CPUs on your BOINC host PC, and run for anywhere between a couple of minutes and about half an hour. It's been like that for some weeks! The LHC "system" is being reconfigured, which is an extensive operation. It is speculated that these "void-of-data" workunits provide some sort of feedback to the system engineers and are useful in that way. There's just no actual number crunching as such.
See this thread for the latest information about the progress being made.
ID: 51336
greg_be
Joined: 28 Dec 08
Posts: 341
Credit: 5,011,964
RAC: 1,858
Message 51337 - Posted: 28 Dec 2024, 9:35:22 UTC - in response to Message 51336.  

Yes. All CMS tasks are "empty" units at the moment. Despite this, they are still being distributed, are assigned 4 (or more) CPUs on your BOINC host PC, and run for anywhere between a couple of minutes and about half an hour. It's been like that for some weeks! The LHC "system" is being reconfigured, which is an extensive operation. It is speculated that these "void-of-data" workunits provide some sort of feedback to the system engineers and are useful in that way. There's just no actual number crunching as such.
See this thread for the latest information about the progress being made.


I think they caused a BSOD (black not blue) on my system. Everything was running fine and then it crashed.
Another project on BOINC is also reconfiguring its IT, and before it went dark its last tasks also seemed to crash my system.
Really odd.
ID: 51337
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1193
Credit: 59,391,484
RAC: 71,960
Message 51338 - Posted: 28 Dec 2024, 10:24:48 UTC

Please stop running these CMS tasks since it is just a waste of your time and there is no problem on your end.
We are just waiting for new work to be ready at the server. The update will most likely come from Ivan when that happens, most likely in January.

We have this same thing every December.
Volunteer Mad Scientist For Life
ID: 51338
Dark Angel
Joined: 7 Aug 11
Posts: 105
Credit: 26,099,112
RAC: 781
Message 51339 - Posted: 28 Dec 2024, 20:12:52 UTC - in response to Message 51338.  

Please stop running these CMS tasks since it is just a waste of your time and there is no problem on your end.
We are just waiting for new work to be ready at the server. The update will most likely come from Ivan when that happens, most likely in January.

We have this same thing every December.


That's true, but they should still be completing without massive numbers of errors. The fact that there's no actual work for them to do is a separate issue to the errors.
ID: 51339
Sagittarius Lupus
Joined: 19 May 10
Posts: 7
Credit: 4,263,765
RAC: 148
Message 51343 - Posted: 31 Dec 2024, 6:59:21 UTC

If this is a known issue, it could probably (I'm being polite; it could definitely) be better communicated, e.g. as a notice to the BOINC client. I personally spent some time attempting to debug my CVMFS/Squid implementation, only to find that nothing at all was wrong with it. No big deal; it's good to do some spring (winter) cleaning and re-familiarize myself with the machinery from time to time. But less intrepid volunteers will be stymied until it occurs to them to look here.

There is no reason for statements like "We have this same thing every December." If volunteers seem not to know what is going on, then the institution has under-communicated.
ID: 51343


©2025 CERN