1) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45736)
Posted 22 Nov 2021 by skydivingnerd
Post:
It looks like the task completed 1 subtask and didn't get a 2nd subtask.
That's not a classical failure but it makes a task less efficient.

The reason might be the long delay of nearly 7 h shown here:

2021-11-21 10:32:41 (10380): VM state change detected. (old = 'running', new = 'paused')
2021-11-21 17:23:43 (10380): VM state change detected. (old = 'paused', new = 'running')
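
For reference, the length of that pause can be worked out from the two timestamps. A minimal Python sketch, assuming every line starts with a 19-character "YYYY-MM-DD HH:MM:SS" stamp as in the excerpt (the helper name parse_log_time is made up for illustration):

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_log_time(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' stamp of a log line."""
    return datetime.strptime(line[:19], LOG_TIME_FORMAT)


paused = "2021-11-21 10:32:41 (10380): VM state change detected. (old = 'running', new = 'paused')"
resumed = "2021-11-21 17:23:43 (10380): VM state change detected. (old = 'paused', new = 'running')"

# Subtracting two datetimes gives a timedelta.
pause_length = parse_log_time(resumed) - parse_log_time(paused)
print(pause_length)  # 6:51:02
```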

The task delay was my doing. I run BOINCTasks at home to keep track of my hosts and discovered that it can execute actions based on project or task criteria. So that I would not miss a CMS work-unit timing out and failing, I created a BOINCTasks event to suspend a CMS task after a few minutes. The next two CMS tasks the Win10 host is working on will not have that happen to them. I'd also removed the <project_max_concurrent> setting after my BOINC client's work-unit fetch went haywire and downloaded several hundred tasks. I'll add that back for LHC@Home and limit CMS VBox to two concurrent work-units.
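
A limit like that could be written into app_config.xml roughly as follows. This is only a sketch: it uses the per-app <max_concurrent> element rather than <project_max_concurrent>, the app name CMS is taken from the app_config posted elsewhere in this listing, and the function name build_app_config is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config XML that caps one app at max_concurrent running tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="  ")  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


print(build_app_config("CMS", 2))
```

The printed text would go into app_config.xml in the project's folder under the BOINC data directory, after which the client has to re-read its config files.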
2) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45733)
Posted 22 Nov 2021 by skydivingnerd
Post:
I have a completed work-unit and more in progress!
https://lhcathome.cern.ch/lhcathome/result.php?resultid=334074794
3) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45730)
Posted 21 Nov 2021 by skydivingnerd
Post:
I went and looked that over and see what you've found. I believe that is when I updated VBox to 6.1.28. I did see the glidein error a few days ago and began looking into that, wondering how to troubleshoot port connectivity from the VM perspective.

I also suspect this to be the reason why glidein fails.
The VM requires a couple of TCP ports to be open.
Besides the ports that are tested within bootstrap, CMS requires 1094 and 8000.

I've had port 8000 open since I started the project, but I removed TCP/1094 because I understood from the LHC@Home FAQ port list that it was no longer needed:
https://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use

I've added both TCP/9094 and TCP/1094 back into my BOINC aliases list and now have a running CMS VBox task on my Win10 host. The task has been running for about an hour now and is using ~86% of a CPU. That is very different from before, when the task would end in under 21 minutes and consume no more than 3-4% CPU.

Can the LHC@Home port listing be re-validated to ensure all used ports are on the FAQ?
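
One way to spot-check the ports from the host side is a plain TCP connect test. The sketch below rests on assumptions: the function names are made up, and the server name has to be supplied by whoever runs it, since the thread does not name one. It also runs on the host rather than inside the VM, so it only shows whether a connection gets out through the firewall.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether a connection to it could be opened."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling check_ports with the relevant server name and the list [1094, 8000] returns a dict mapping each port to True or False.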
4) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45727)
Posted 19 Nov 2021 by skydivingnerd
Post:
Maybe there is a lot of junk media left over from crashed VM's, since it looks like the hard disk was lost.

I experienced issues that left junk in the .\slots\ folder, and cleaning it out was one of the first troubleshooting steps I did when I started digging into this on my Win10 client. The VM crash persists across multiple VirtualBox versions (6.1.12, 6.1.16, 6.1.28), even after ensuring the BOINC data directory .\slots\ folders and the VBox manager are clean.

So far, I've not gotten far on the VirtualBox forums. There are some suggestions about the underlying configuration of the LHC@Home VM, but I'm not in any way qualified to raise these with the project. As far as I can search on the LHC@Home forums, I can't find anything like what I'm experiencing. If it goes on much longer with no useful suggestions for troubleshooting from the contents of the log files, I may have to just remove the project from my Win10 host. I don't want to do that, but I won't have a choice if I can't troubleshoot the issue and just keep sending failing work-units back.

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10687301&offset=0&show_names=0&state=6&appid=11
5) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45705)
Posted 15 Nov 2021 by skydivingnerd
Post:
I made a guesstimate and was waiting at my host when it received a CMS task. I've captured the VBox, VBoxHardening, VBoxUI, and vbox_trace logs from the ./slots/ folder the task was running in. I'm posting to the VirtualBox forums for help in troubleshooting the logs.

https://forums.virtualbox.org/viewtopic.php?f=3&t=104465

If anyone knows how to review and troubleshoot VBox logs, I've uploaded them to a workdrive space
https://workdrive.zoho.com/folder/pgoec33ffeb6d9e36461c9a953c076976b93c

R/S
Scott
6) Questions and Answers : Windows : Vbox logs from failed LHC tasks - capturing vbox.log and vboxhardening.log (Message 45704)
Posted 15 Nov 2021 by skydivingnerd
Post:
I made a guesstimate and was waiting at my host when it received a CMS task. I've captured the VBox, VBoxHardening, VBoxUI, and vbox_trace logs from the ./slots/ folder the task was running in. I'm posting to the VirtualBox forums for help in troubleshooting the logs.

https://forums.virtualbox.org/viewtopic.php?f=3&t=104465
7) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45688)
Posted 13 Nov 2021 by skydivingnerd
Post:
So removing the antivirus did not solve the issue. This task just failed at 1349 EST.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=332726303
8) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45679)
Posted 13 Nov 2021 by skydivingnerd
Post:
That is from your CMS-task with no success:
2021-11-11 12:57:41 (15744): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))
That is from a successful CMS-Task:
2021-11-06 07:18:56 (12680): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)

I'm not understanding how that can help. The heartbeat time frame is a variable of the project/task. It's not a user configurable setting that would need to be "corrected" to ensure tasks complete.
9) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45675)
Posted 13 Nov 2021 by skydivingnerd
Post:
That's just it: no, I do not see it. I'm going down the path of troubleshooting the vbox.log and vboxhardening.log of the individual VMs while they are running. I've also uninstalled my antivirus, rebooted, and ensured it's cleared out. The last CMS task my host got yesterday arrived at a time when Rosetta@home was not sending tasks, and my host only had a couple of them running when the CMS task started.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=332568799

The CMS task stderr log shows that there was 26374 MB of memory available when it started.
Memory size: 32673 MByte
Memory available: 26374 MByte

This task still failed in the same way as the past several days despite a much lower CPU and memory load.

Note: I have a companion thread to this one on vbox.log and vboxhardening.log capture.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5754
10) Questions and Answers : Windows : Vbox logs from failed LHC tasks - capturing vbox.log and vboxhardening.log (Message 45674)
Posted 13 Nov 2021 by skydivingnerd
Post:
I thought I'd create a new thread here as I'm not finding any related posts on the LHC boards.
In attempting to troubleshoot the issues my Win10 host is having in thread https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5751, I'm looking more into how to troubleshoot vbox guest instances and their interaction with host machines. The individual VM vbox.log and vboxhardening.log are created inside the "<BOINC directory>/slots/n/..." and removed/deleted when a task fails.

I know that part of the VBoxSVC.log is copied into the LHC task stderr output under "Hypervisor System Log:", and I believe part of the individual VM vbox.log is under the "VM Trace Log:" section. (I'm not sure about the VM Trace Log info since I have not looked into the vbox.log of a running CMS task.) It does not look like any of the vboxhardening.log info is captured in the task stderr output. For my host issue above, I need to get the whole log files for troubleshooting with the VirtualBox forums.

The only option right now is to hover over my client, waiting for a CMS task to be assigned, and watching the vm log files in the ~20 minutes it's running.

- Is there a way to capture/copy the whole log file to another directory when a task fails?
- Can this functionality be added to the LHC features or capabilities backlog for development?
- Could it be a debug option enabled through one of the BOINC client config files? (no idea if that is feasible or not)
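
Until something like that exists, a small script on the host can copy the logs out of the slots folder while the task is still running. A minimal sketch, assuming the log files have ".log" somewhere in their name and that the destination lies outside the slots folder; snapshot_logs and watch are hypothetical helper names, not part of BOINC or VirtualBox.

```python
import shutil
import time
from pathlib import Path


def snapshot_logs(slots_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy every log file under slots_dir into dest_dir, keeping relative paths."""
    copied = []
    for src in slots_dir.rglob("*"):
        # Matches VBox.log as well as rotated names such as VBox.log.1.
        if src.is_file() and ".log" in src.name.lower():
            target = dest_dir / src.relative_to(slots_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            try:
                shutil.copy2(src, target)
            except OSError:
                # The file may be locked or already deleted; try again next pass.
                continue
            copied.append(target)
    return copied


def watch(slots_dir: Path, dest_dir: Path, interval_s: float = 30.0) -> None:
    """Re-copy the logs every interval_s seconds until stopped with Ctrl+C."""
    while True:
        snapshot_logs(slots_dir, dest_dir)
        time.sleep(interval_s)
```

Because each pass overwrites the previous copy, the destination ends up holding the last state of each log from before the slot was cleaned up.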
11) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45669)
Posted 12 Nov 2021 by skydivingnerd
Post:
Thanks. I've set Rosetta to not fetch new tasks so only LHC CMS tasks will run. I'll see if that does it.
If it does, is this an issue of needing more system memory so everything can run in memory without drive swapping? Or is this a limitation on floating point calculations of the processor?

R/S
Scott
12) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45665)
Posted 11 Nov 2021 by skydivingnerd
Post:
I just reset the project and will know what that does tomorrow afternoon.

On the permissions front, my user account runs the BOINC process and has Full Control rights to the C:\ProgramData\BOINC and C:\ProgramData\BOINC\slots directories. I can see my user account owns the processes for BOINC Mgr and the running tasks via Process Explorer.

My host has a Samsung 860 EVO 1TB SSD and resource monitor shows disk I/O from 1-4% with BOINC running 16 threads of Rosetta@home tasks. In Use memory is at ~17GB out of 32GB in the system. When I was attempting to run more than two CMS tasks a while back, I would get notices in BOINC that a CMS task was paused, waiting for memory. That said, I've had 14 Rosetta and 2 CMS tasks running concurrently on this host. The past few days LHC@home has only assigned a single CMS task to this host due to the number of failures.

R/S
Scott
13) Questions and Answers : Windows : vBox could not find machine - ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (Message 45663)
Posted 11 Nov 2021 by skydivingnerd
Post:
My Win10 host is running BOINC 7.16.20 with vBox 6.1.16 plus the 6.1.16 Extension pack. I've been encountering this issue where my CMS tasks are failing because vBox is unable to find the VM to run the task. I've uninstalled and reinstalled BOINC and vBox several times with the issue persisting. I'm unsure how to continue troubleshooting this.

Most recent failed task
https://lhcathome.cern.ch/lhcathome/result.php?resultid=332187349

Tasks page for my host
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10687301&offset=0&show_names=0&state=6&appid=11

I can see in the BOINC logs that the VM is started in a project slot
2021-11-10 14:25:35 (15216): forwarding host port 52026 to guest port 80
2021-11-10 14:25:35 (15216): Enabling remote desktop for VM.
2021-11-10 14:25:36 (15216): Enabling shared directory for VM.
2021-11-10 14:25:36 (15216): Starting VM using VBoxManage interface. (boinc_62fd0aaede4ddcf6, slot#16)
2021-11-10 14:25:41 (15216): Successfully started VM. (PID = '2380')
2021-11-10 14:25:41 (15216): Reporting VM Process ID to BOINC.
2021-11-10 14:25:41 (15216): Guest Log: BIOS: VirtualBox 6.1.16
2021-11-10 14:25:41 (15216): Guest Log: CPUID EDX: 0x178bfbff
2021-11-10 14:25:41 (15216): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2021-11-10 14:25:41 (15216): VM state change detected. (old = 'poweredoff', new = 'running')


But then, a short time later, the vBox hypervisor logs show that the registered VM cannot be found.
17:35:01.292412          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_62fd0aaede4ddcf6'}, preserve=false aResultDetail=0


The VM trace logs show that it takes about 20 minutes for the task to error out and initiate a shutdown of the VM, even though the log shows the VM starting.
2021-11-10 14:25:36 (15216): 
Command: VBoxManage -q sharedfolder add "boinc_62fd0aaede4ddcf6" --name "shared" --hostpath "C:\ProgramData\BOINC\slots\16/shared"
Exit Code: 0
Output:

2021-11-10 14:25:40 (15216): 
Command: VBoxManage -q startvm "boinc_62fd0aaede4ddcf6" --type headless
Exit Code: 0
Output:
Waiting for VM "boinc_62fd0aaede4ddcf6" to power on...
VM "boinc_62fd0aaede4ddcf6" has been successfully started.

2021-11-10 14:25:42 (15216): 
Command: VBoxManage -q controlvm "boinc_62fd0aaede4ddcf6" cpuexecutioncap 100 
Exit Code: 0
Output:

2021-11-10 14:45:43 (15216): 
Command: VBoxManage -q controlvm "boinc_62fd0aaede4ddcf6" poweroff
Exit Code: 0
Output:
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

2021-11-10 14:45:43 (15216): 
Command: VBoxManage -q snapshot "boinc_62fd0aaede4ddcf6" list 
Exit Code: -108
Output:
This machine does not have any snapshots

2021-11-10 14:45:44 (15216): 
Command: VBoxManage -q bandwidthctl "boinc_62fd0aaede4ddcf6" remove "boinc_62fd0aaede4ddcf6_net" 
Exit Code: 0
Output:
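
To pick the failing commands out of a longer trace log, the Command / Exit Code blocks can be filtered for non-zero exit codes. A small sketch, assuming the block layout shown above; failed_commands is a made-up helper name.

```python
import re

# One block: a timestamp line, then "Command: ..." and "Exit Code: ..." lines.
COMMAND_BLOCK = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\):\s*\n"
    r"Command: (?P<command>.*?)\s*\n"
    r"Exit Code: (?P<code>-?\d+)",
    re.MULTILINE,
)


def failed_commands(trace_log: str) -> list[tuple[str, str, int]]:
    """Return (time, command, exit code) for every block with a non-zero exit code."""
    return [
        (m["time"], m["command"], int(m["code"]))
        for m in COMMAND_BLOCK.finditer(trace_log)
        if int(m["code"]) != 0
    ]


SAMPLE = """\
2021-11-10 14:25:40 (15216):
Command: VBoxManage -q startvm "boinc_62fd0aaede4ddcf6" --type headless
Exit Code: 0
Output:
Waiting for VM "boinc_62fd0aaede4ddcf6" to power on...

2021-11-10 14:45:43 (15216):
Command: VBoxManage -q snapshot "boinc_62fd0aaede4ddcf6" list
Exit Code: -108
Output:
This machine does not have any snapshots
"""

for entry in failed_commands(SAMPLE):
    print(entry)
```

On the excerpt above the only non-zero exit code is the -108 from the snapshot list command, whose output suggests it is only reporting that no snapshots exist.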


What can be looked at next to nail this error down?

R/S

Scott
14) Message boards : CMS Application : getting 'Error while computing' for CMS tasks (Message 45629)
Posted 7 Nov 2021 by skydivingnerd
Post:
I removed vBox 6.1.12, rebooted, and reinstalled 6.1.12, and the issue persists. Since the vBox logs still show notices for a version difference between 6.1.12 and 6.1.16, I uninstalled 6.1.12 and installed 6.1.16, rebooting and ensuring all the system files were gone in the process.

I do not have any extension packs configured for use. Do I need to configure that? I've previously read that the vBox extension pack was not needed. Is that incorrect?

I'm waiting for a few tasks to download to see if the issue persists in the vBox logs. If so, I'll open a new thread under the Windows board.
15) Message boards : CMS Application : getting 'Error while computing' for CMS tasks (Message 45625)
Posted 6 Nov 2021 by skydivingnerd
Post:
The main BOINC directory is at
C:\Program Files\BOINC
with the data being at
C:\ProgramData\BOINC


VirtualBox is at
C:\Users\Scott\.VirtualBox


Looking into the vBox logs, I do see that it's pointing out the differing versions and lack of permissions to the vm storage. Could it not have liked the software downgrade? I'll download the current versions again and reinstall them.

03:59:21.738008          Saving settings file "C:\ProgramData\BOINC\slots\9\boinc_00ca13701b3d268d\boinc_00ca13701b3d268d.vbox" with version "1.16-windows"
03:59:21.886440          ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
03:59:21.926647          Saving settings file "C:\ProgramData\BOINC\slots\9\boinc_00ca13701b3d268d\boinc_00ca13701b3d268d.vbox" with version "1.16-windows"
03:59:22.180939          Saving settings file "C:\Users\Scott\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
03:59:28.134972          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_c84b3e5bbd6d461e'}, preserve=false aResultDetail=0
03:59:28.389054          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:28.725563          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:28.893063          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:29.647620          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:30.054998          Saving settings file "C:\Users\Scott\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
03:59:30.161742          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:30.172140          Saving settings file "C:\Users\Scott\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
03:59:30.418205          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:30.703352          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:30.962257          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:31.220827          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:31.493533          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:31.769940          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:32.026155          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:32.284644          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:32.544079          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:32.800604          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:33.059791          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:33.316373          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:33.577000          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:33.833723          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:34.091140          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:34.091339          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:34.995533          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:35.100846          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:35.101101          Saving settings file "C:\Users\Scott\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
03:59:35.117080          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:35.349780          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 1 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:35.350066          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 1 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:35.353458          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:35.605351          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:35.866400          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:36.375597          Saving settings file "C:\ProgramData\BOINC\slots\8\boinc_c84b3e5bbd6d461e\boinc_c84b3e5bbd6d461e.vbox" with version "1.16-windows"
03:59:36.630528          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:36.630634          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:36.631122          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 1 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:36.631188          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 1 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
03:59:36.635096          ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
03:59:36.886577          Launched VM: 79687136 pid: 10988 (0x2aec) frontend: headless name: boinc_c84b3e5bbd6d461e
03:59:37.339774          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_0665d7a71cb3cdc3'}, preserve=false aResultDetail=0
03:59:37.591751          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:37.929053          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:38.100938          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
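
With this many repeated lines it helps to count the ERROR [COM] entries by result code and component. A rough sketch, assuming the line layout shown above; summarize_com_errors is a hypothetical helper.

```python
import re
from collections import Counter

# Captures the result code (aRC) and the component of one ERROR [COM] line.
COM_ERROR = re.compile(
    r"ERROR \[COM\]: aRC=(?P<rc>\w+) \(0x[0-9a-fA-F]+\).*?"
    r"aComponent=\{(?P<component>[^}]*)\}"
)


def summarize_com_errors(log_text: str) -> Counter:
    """Count ERROR [COM] lines per (result code, component) pair."""
    counts = Counter()
    for m in COM_ERROR.finditer(log_text):
        counts[(m["rc"], m["component"])] += 1
    return counts


SAMPLE = """\
03:59:28.134972          ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_c84b3e5bbd6d461e'}, preserve=false aResultDetail=0
03:59:28.389054          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
03:59:28.725563          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
"""

for (rc, component), count in summarize_com_errors(SAMPLE).most_common():
    print(count, rc, component)
```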
16) Message boards : CMS Application : getting 'Error while computing' for CMS tasks (Message 45624)
Posted 6 Nov 2021 by skydivingnerd
Post:
I did downgrade the BOINC and vBox versions while attempting to troubleshoot the runaway task downloads I was encountering. I found, through the Rosetta@home forum, that BOINC has issues with calculating task queue depth with the <max_concurrent> or <project_max_concurrent> flags in project app_config files. I thought it could have been the upgrade of BOINC and vBox I did a while back, so I downgraded them. I've also removed those settings from my Win10 host app_config.

I have rebooted since then and just now had my client upload more failed results to LHC@home. This task is one of the ones that failed just in the last 15-20 minutes.
https://lhcathome.cern.ch/lhcathome/result.php?result_name=CMS_3108593_1635827713.939210_0
I've also restricted the project from getting new tasks for the time being until I can get this fixed.

R/S
Scott
17) Message boards : CMS Application : getting 'Error while computing' for CMS tasks (Message 45619)
Posted 6 Nov 2021 by skydivingnerd
Post:
I'm getting a lot of computation errors for CMS vBox64 tasks on my Win10 host.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10687301&offset=0&show_names=0&state=6&appid=

I'm not sure where to start in troubleshooting this. My other Linux based hosts are doing ok (aside from sporadic comms issues with my FW and SNORT).
I'm running BOINC client 7.16.11 with vBox 6.1.12. Below is the app_config file from my Win10 host, followed by the output of one of the failed tasks.

<app_config>
<!--
  <app>
    <name>ATLAS</name>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>4.0</avg_ncpus>
    <cmdline>--nthreads 4 --memory_size_mb 3800</cmdline>
  </app_version>

  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>8.0</avg_ncpus>
    <cmdline>--nthreads 8 --memory_size_mb 5000</cmdline>
  </app_version>
-->
<!--
  <app>
    <name>Theory</name>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_theory</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <cmdline>--nthreads 1</cmdline>
  </app_version>
-->
  <app>
    <name>CMS</name>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <cmdline>--nthreads 1 --memory_size_mb 2048</cmdline>
  </app_version>

</app_config>
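
A quick way to see which settings are actually active is to parse the file and list the app_version entries that are not commented out. This is a sketch under assumptions: active_app_versions is a made-up helper, and the sample holds only the CMS part of the file. The XML specification disallows "--" inside comments, so a strict parser may reject the commented-out blocks above because of their --nthreads options; the sample leaves them out.

```python
import xml.etree.ElementTree as ET


def active_app_versions(app_config_xml: str) -> list[dict[str, str]]:
    """Return the child elements of each app_version as a dict of tag to text."""
    root = ET.fromstring(app_config_xml)  # raises ParseError if not well-formed
    return [
        {child.tag: (child.text or "").strip() for child in version}
        for version in root.iter("app_version")
    ]


SAMPLE = """\
<app_config>
  <app>
    <name>CMS</name>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>CMS</app_name>
    <plan_class>vbox64</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <cmdline>--nthreads 1 --memory_size_mb 2048</cmdline>
  </app_version>
</app_config>
"""

for version in active_app_versions(SAMPLE):
    print(version)
```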


Here is the task output from one of the failed tasks from my Win10 host.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
 (0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
2021-11-06 11:31:28 (16296): Detected: vboxwrapper 26202
2021-11-06 11:31:28 (16296): Detected: BOINC client v7.16.11
2021-11-06 11:31:28 (16296): Detected: VirtualBox VboxManage Interface (Version: 6.1.12)
2021-11-06 11:31:29 (16296): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2021-11-06 11:31:29 (16296): Successfully copied 'init_data.xml' to the shared directory.
2021-11-06 11:31:30 (16296): Create VM. (boinc_05d65e3058a799c6, slot#9)
2021-11-06 11:31:31 (16296): Setting Memory Size for VM. (2048MB)
2021-11-06 11:31:31 (16296): Setting CPU Count for VM. (1)
2021-11-06 11:31:31 (16296): Setting Chipset Options for VM.
2021-11-06 11:31:31 (16296): Setting Boot Options for VM.
2021-11-06 11:31:32 (16296): Setting Network Configuration for NAT.
2021-11-06 11:31:32 (16296): Enabling VM Network Access.
2021-11-06 11:31:32 (16296): Disabling USB Support for VM.
2021-11-06 11:31:32 (16296): Disabling COM Port Support for VM.
2021-11-06 11:31:33 (16296): Disabling LPT Port Support for VM.
2021-11-06 11:31:33 (16296): Disabling Audio Support for VM.
2021-11-06 11:31:33 (16296): Disabling Clipboard Support for VM.
2021-11-06 11:31:33 (16296): Disabling Drag and Drop Support for VM.
2021-11-06 11:31:34 (16296): Adding storage controller(s) to VM.
2021-11-06 11:31:34 (16296): Adding virtual disk drive to VM. (vm_image.vdi)
2021-11-06 11:31:34 (16296): Adding VirtualBox Guest Additions to VM.
2021-11-06 11:31:34 (16296): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2021-11-06 11:31:35 (16296): forwarding host port 49866 to guest port 80
2021-11-06 11:31:35 (16296): Enabling remote desktop for VM.
2021-11-06 11:31:35 (16296): Required extension pack not installed, remote desktop not enabled.
2021-11-06 11:31:35 (16296): Enabling shared directory for VM.
2021-11-06 11:31:36 (16296): Starting VM using VBoxManage interface. (boinc_05d65e3058a799c6, slot#9)
2021-11-06 11:31:39 (16296): Successfully started VM. (PID = '15392')
2021-11-06 11:31:39 (16296): Reporting VM Process ID to BOINC.
2021-11-06 11:31:39 (16296): Guest Log: BIOS: VirtualBox 6.1.12
2021-11-06 11:31:39 (16296): Guest Log: CPUID EDX: 0x178bfbff
2021-11-06 11:31:39 (16296): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2021-11-06 11:31:39 (16296): VM state change detected. (old = 'poweredoff', new = 'running')
2021-11-06 11:31:39 (16296): Detected: Web Application Enabled (http://localhost:49866)
2021-11-06 11:31:39 (16296): Preference change detected
2021-11-06 11:31:39 (16296): Setting CPU throttle for VM. (100%)
2021-11-06 11:31:40 (16296): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))
2021-11-06 11:31:41 (16296): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2021-11-06 11:31:41 (16296): Guest Log: BIOS: Booting from Hard Disk...
2021-11-06 11:31:43 (16296): Guest Log: BIOS: KBD: unsupported int 16h function 03
2021-11-06 11:31:43 (16296): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2021-11-06 11:32:03 (16296): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2021-11-06 11:32:03 (16296): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2021-11-06 11:32:04 (16296): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000085 main     Log opened 2021-11-06T15:32:03.999422000Z
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000168 main     OS Product: Linux
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000193 main     OS Release: 4.14.232-19.cernvm.x86_64
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000215 main     OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000247 main     Executable: /usr/sbin/VBoxService
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000247 main     Process ID: 2153
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.000248 main     Package type: LINUX_64BITS_GENERIC
2021-11-06 11:32:04 (16296): Guest Log: 00:00:00.001532 main     5.2.6 r120293 started. Verbose level = 0
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Mounting the shared directory
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Testing connection to cern.ch
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Testing connection to VCCS
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Testing connection to HTCondor
2021-11-06 11:32:13 (16296): Guest Log: [INFO] Testing connection to WMAgent
2021-11-06 11:32:14 (16296): Guest Log: [INFO] Testing connection to Frontier
2021-11-06 11:32:14 (16296): Guest Log: [INFO] Got a proxy from the local BOINC client
2021-11-06 11:32:14 (16296): Guest Log: [INFO] Will use it for CVMFS and Frontier
2021-11-06 11:32:14 (16296): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2021-11-06 11:32:18 (16296): Guest Log: [INFO] Probing /cvmfs/cvmfs-config.cern.ch... OK
2021-11-06 11:32:18 (16296): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2021-11-06 11:32:20 (16296): Guest Log: [INFO] 2.7.2.0 http://s1bnl-cvmfs.openhtc.io http://192.168.150.1:3128
2021-11-06 11:32:20 (16296): Guest Log: [INFO] Reading volunteer information
2021-11-06 11:32:21 (16296): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2021-11-06 11:32:22 (16296): Guest Log: [INFO] CMS application starting. Check log files.
2021-11-06 11:52:22 (16296): Guest Log: [ERROR] glidein exited with return value 1.
2021-11-06 11:52:22 (16296): Guest Log: [DEBUG] Volunteer: scotth (787857)
2021-11-06 11:52:22 (16296): Guest Log: [INFO] Shutting Down.
2021-11-06 11:52:52 (16296): VM Completion File Detected.
2021-11-06 11:52:52 (16296): VM Completion Message: glidein exited with return value 1.
.
2021-11-06 11:52:52 (16296): Powering off VM.
2021-11-06 11:52:52 (16296): Successfully stopped VM.
2021-11-06 11:52:52 (16296): Deregistering VM. (boinc_05d65e3058a799c6, slot#9)
2021-11-06 11:52:52 (16296): Removing network bandwidth throttle group from VM.
2021-11-06 11:52:53 (16296): Removing VM from VirtualBox.
11:52:58 (16296): called boinc_finish(208)

</stderr_txt>
]]>
18) Questions and Answers : Wish list : Set max. WU per day to 1 for hosts who do not deliver valid results. (Message 45614)
Posted 5 Nov 2021 by skydivingnerd
Post:
All,

My bad. It's been a while since I last monitored my machines. Other life issues have taken priority. I've detached that client from LHC for the time being until I can devote time to rebuilding it.

R/S
Scott H
19) Questions and Answers : Windows : Windows vbox64 CMS Simulation tasks failing - VM unable to validate X509 credential from LHC@home (Message 44945)
Posted 13 May 2021 by skydivingnerd
Post:
Great news on identifying that the firewall port page needs updating.

I now have three completed CMS Simulation tasks for my Win10 host!
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316425963
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316423089
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316428800

The only modification I've made since your previous post was adding port 3126 to my
outbound allow rule. I saw that port in the error log of one of my failed work units
when you quoted it back in my post.

I have not made any additional changes on that. Was the CMS Simulation VM updated?


Additionally, from your info on the port usage

Not in use:
Port 3127
Port 3125 (replaced by port 3126 and used by fallback proxies)
Port 5222 (XMPP)
Port 9094
Port 1094

LHCb (all DIRAC ports)

I'll remove those from my allowed outbound ports for the LHC@Home traffic.
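
As a rough sketch of that cleanup, here is how I'd work out the trimmed alias in Python. It assumes my alias still matches the port list from my earlier post in this thread, and it only uses the "not in use" ports quoted above plus the 3126 addition. The function name prune_ports is just something I made up for illustration.

```python
def prune_ports(allowed, unused, added):
    """Return the sorted port list after dropping unused ports and adding new ones."""
    return sorted((set(allowed) - set(unused)) | set(added))

# Assumption: outbound TCP ports currently in my pfSense alias.
allowed = {3125, 8000, 8080, 23128, 3127, 3128, 5222, 9094, 9618, 4080}

# LHCb DIRAC ports from the same alias (8443, 9133:9149, 9166, 9196:9199).
dirac = {8443, 9166} | set(range(9133, 9150)) | set(range(9196, 9200))
allowed |= dirac

# Ports reported as not in use, plus all of the DIRAC ports.
unused = {3127, 3125, 5222, 9094, 1094} | dirac

# Port 3126 replaces 3125.
updated = prune_ports(allowed, unused, added={3126})
print(updated)
```

That leaves 3126, 3128, 4080, 8000, 8080, 9618 and 23128 in the alias, in addition to my existing rules for 80 and 443.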


Speaking to your comment here:

As a result the packets to the required destination ports should not be restricted
to a few IPs, they should be allowed for all destination IPs.

I've always had my FW rule configured to allow the identified ports out to any
IP. I specifically added the CVMFS IPs I found to my Snort PASS list to ensure
none of them got blocked by a signature hit.

Now that the host is working, yes, I will be looking up the Squid configuration
and setting it up in pfSense to keep my clients from reaching all the way out.

Thank you!

R/S
Scott
20) Questions and Answers : Windows : Windows vbox64 CMS Simulation tasks failing - VM unable to validate X509 credential from LHC@home (Message 44943)
Posted 13 May 2021 by skydivingnerd
Post:

This will not work as the certs need to be installed inside the VM.
You may test Theory vbox on that computer to see whether it behaves differently.

I didn't think it really would... but gave it a shot anyway just to be sure for myself.


It looks as if you either use a local proxy that is not correctly configured.
=> CVMFS configures a fallback proxy.

I don't have a local Squid proxy configured on, or for, my hosts. All my other hosts (except one, which will be getting an OS
rebuild soon) running native work units are reaching out for their images. My ISP connection handles the traffic easily. I'm
just working on getting all my hosts running correctly, then will be configuring Squid proxy on my firewall and then making config
changes on each host. Then working out any issues on that...


Or the CVMFS inside the VM can only partly access CERN's CVMFS.
Since some CA certs are taken from there your cert issues are follow up issues.

The latter mostly points to an incomplete local firewall setup.

Below is one of the many links I found when I was initially setting up LHC@Home and getting native work units to run correctly.
I've configured a port alias in pfSense to handle it all, with the exception of my existing rules for port 80 and 443. The FW rule
allowing all the traffic is configured for TCP only, rather than TCP/UDP.

https://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use

Here is my port list:
3125		Common - CVMFS
8000		ATLAS - HTTP
8080		ATLAS - HTTP
23128		ATLAS - HTTP
3127:3128	ATLAS - HTTP Proxy
5222		ATLAS - XMPP
9094		ATLAS - TCP
9618		Theory, CMS, LHCb - Condor
4080		CMS - WMAgent
8080		CMS - Frontier
8443		LHCb - DIRAC
9133:9149	LHCb - DIRAC
9166		LHCb - DIRAC
9196:9199	LHCb - DIRAC
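
For anyone who wants to sanity-check a list like this from the host side, below is a minimal Python sketch. It expands the alias entries (treating "A:B" as an inclusive range, the way I entered them in pfSense) and offers a plain TCP connect test. The helper names expand_ports and tcp_port_open are my own, and no server names are assumed: you pass in whichever host you want to test.

```python
import socket

# Port alias entries as listed above; "A:B" is an inclusive range.
PORT_ALIAS = [
    "3125", "8000", "8080", "23128", "3127:3128", "5222", "9094",
    "9618", "4080", "8080", "8443", "9133:9149", "9166", "9196:9199",
]

def expand_ports(entries):
    """Expand single ports and "A:B" ranges into a sorted list of unique ports."""
    ports = set()
    for entry in entries:
        start, _, end = entry.partition(":")
        first = int(start)
        last = int(end) if end else first
        ports.update(range(first, last + 1))
    return sorted(ports)

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Since 8080 appears twice in my list (ATLAS and CMS Frontier), the expansion collapses it to a single entry. A failed connect only tells you the port is unreachable from that host; it does not say whether the firewall or the remote end is responsible.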


I've also been chewing through my Snort logs the past several weeks, identifying and suppressing signature alerts for
LHC@Home traffic. I've got a nice list of IP addresses LHC@home communicates with. A few of what I believe are the
more critical CVMFS IP addresses I've added to an "External Server" alias list and configured that on the Snort Pass list
to prevent any alerting on those.
Here are the CVMFS entries I have in the alias:
104.21.88.130	  LHC@Home - s1f'nal/bnl/unl/cern/ral'-cvmfs.openhtc.io
172.67.179.99	  LHC@Home - s1f'nal/bnl/unl/cern/ral'-cvmfs.openhtc.io
158.39.48.38	  LHC@Home - atlas-db-squid1.grid.uiocloud.no

I'm still stuck on the response I saw in the packet capture from the LHC@Home CMS Simulation VM. It actively rejected the
server-side Certificate Authority as invalid. I still believe this is an LHC server-side issue unless someone can validate that I'm
the only one with this issue.

R/S
Scott




©2024 CERN