Message boards : CMS Application : All tasks failing

Ryan Munro
Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 48266 - Posted: 29 Jun 2023, 10:44:26 UTC

Just noticed this today: all the tasks I receive seem to be failing instantly with a computation error.
They all seem to be crapping out with the same error.

Example
https://lhcathome.cern.ch/lhcathome/result.php?resultid=395751485

I have tried clearing all VMs from VBox and downloading fresh units, and also updating VBox; neither worked.

Any ideas?

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,428,032
RAC: 130,040
Message 48267 - Posted: 29 Jun 2023, 11:16:36 UTC - in response to Message 48266.  

This is a result of a race condition and needs to be cleaned manually.
See:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5982&postid=47976

Ryan Munro
Joined: 17 Aug 17
Posts: 77
Credit: 6,120,681
RAC: 17,196
Message 48268 - Posted: 29 Jun 2023, 13:18:13 UTC - in response to Message 48267.  

I think that's sorted it, thanks

Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48647 - Posted: 23 Sep 2023, 19:40:54 UTC

CMS tasks have been back for a few hours, but they are all failing after about 20 minutes.

Stderr starts with:

<core_client_version>7.22.2</core_client_version>
<![CDATA[
<message>
The wildcard characters for file names (* or ?) were entered incorrectly, or too many wildcard characters were specified.
(0xd0) - exit code 208 (0xd0)</message>

example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=399619068

too bad that I took some time to change many of my hosts to crunch CMS before I detected this problem :-(

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,428,032
RAC: 130,040
Message 48648 - Posted: 23 Sep 2023, 20:04:19 UTC - in response to Message 48647.  

The relevant logfile entry is this:
2023-09-23 21:25:00 (19160): Guest Log: [ERROR] glidein exited with return value 1.

It points to an error at a deeper level.
It can't be solved at the BOINC level.

CERN's Grafana shows no valid task from today.
This either means Grafana can't contact the relevant backend systems or all tasks from the current CMS batch failed.
The latter seems to be more likely.
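
To find entries like this quickly in a long stderr output, a small filter can help. The following is only a minimal sketch: it assumes the line layout quoted in this thread (`timestamp (pid): Guest Log: [ERROR] message`), and the name `guest_errors` is made up for illustration.

```python
import re

# Assumed layout, taken from the log line quoted above:
#   2023-09-23 21:25:00 (19160): Guest Log: [ERROR] glidein exited with return value 1.
GUEST_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"Guest Log: \[ERROR\] (?P<msg>.*)$"
)


def guest_errors(stderr_text):
    """Return (timestamp, message) pairs for every Guest Log [ERROR] line."""
    hits = []
    for line in stderr_text.splitlines():
        match = GUEST_ERROR.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("msg")))
    return hits


# Illustrative input: the [INFO] line is invented, only the [ERROR] line
# is quoted from this thread.
sample = """\
2023-09-23 21:24:58 (19160): Guest Log: [INFO] some earlier step
2023-09-23 21:25:00 (19160): Guest Log: [ERROR] glidein exited with return value 1.
"""

for ts, msg in guest_errors(sample):
    print(ts, msg)
```

Paste the content of the `<stderr_txt>` section from a result page into `guest_errors()` to list only the guest-side errors.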

Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48649 - Posted: 23 Sep 2023, 20:22:27 UTC - in response to Message 48648.  

... or all tasks from the current CMS batch failed.
The latter seems to be more likely.
my suspicion right away was that the tasks are misconfigured :-(

Would of course be great if someone could withdraw the batch from the download queue. However, with the weekend now this will probably not happen before next Monday, at the earliest.

Pascal
Joined: 13 May 20
Posts: 33
Credit: 1,161,776
RAC: 4,966
Message 48650 - Posted: 24 Sep 2023, 8:23:10 UTC - in response to Message 48649.  
Last modified: 24 Sep 2023, 8:25:25 UTC

Hello,
it's the same for me on my 2 PCs.
LHC@home was working well, then all CMS computations started ending in error about 2 days ago.
I have updated VirtualBox, but it did not help. I still get errors.

FanzaFede
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 19 Jul 18
Posts: 3
Credit: 128,970
RAC: 11
Message 48651 - Posted: 24 Sep 2023, 9:40:56 UTC

Hi Erich56 and all,
thanks for your feedback.
We submitted two new workflows for CMS@home yesterday, configured as usual, but I see that no volunteers have been connected to the CMS batch pool. It might be a problem at the level of the system "health" check, which is executed before the VM is allowed to connect to the CMS pool.
It seems to be a problem with the required token, but we have to investigate.
Please leave your machines connected to CMS@home.
We will let you know as soon as the problem is discovered (and hopefully resolved).

Thanks,
cheers
Federica for CMS@home support

Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48652 - Posted: 24 Sep 2023, 10:50:25 UTC - in response to Message 48651.  

Federica, many thanks for your feedback :-)

Hopefully CMS will work well soon !

Pascal
Joined: 13 May 20
Posts: 33
Credit: 1,161,776
RAC: 4,966
Message 48657 - Posted: 25 Sep 2023, 11:16:44 UTC - in response to Message 48652.  
Last modified: 25 Sep 2023, 11:17:49 UTC

The problem is still present. I uninstalled BOINC 7.24.1 and went back to BOINC 7.22.2, but it did not change anything.

name CMS_701686_1695630839.736948
application CMS Simulation
created 25 Sep 2023, 8:33:59 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 1, 1
errors Too many total results

<core_client_version>7.22.2</core_client_version>
<![CDATA[
<message>
- exit code 194 (0xc2)</message>
<stderr_txt>
2023-09-25 11:19:03 (11112): Detected: vboxwrapper 26206
2023-09-25 11:19:03 (11112): Detected: BOINC client v7.22.2
2023-09-25 11:19:04 (11112): Detected: VirtualBox VboxManage Interface (Version: 7.0.10)
2023-09-25 11:19:05 (11112): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2023-09-25 11:19:05 (11112): Successfully copied 'init_data.xml' to the shared directory.
2023-09-25 11:19:05 (11112): Create VM. (boinc_fe513e83616ebdd4, slot#13)
2023-09-25 11:19:06 (11112): Setting Memory Size for VM. (2048MB)
2023-09-25 11:19:06 (11112): Setting CPU Count for VM. (1)
2023-09-25 11:19:06 (11112): Setting Chipset Options for VM.
2023-09-25 11:19:07 (11112): Setting Graphics Controller Options for VM.
2023-09-25 11:19:07 (11112): Setting Boot Options for VM.
2023-09-25 11:19:07 (11112): Setting Network Configuration for NAT.
2023-09-25 11:19:08 (11112): Enabling VM Network Access.
2023-09-25 11:19:08 (11112): Disabling USB Support for VM.
2023-09-25 11:19:08 (11112): Disabling COM Port Support for VM.
2023-09-25 11:19:09 (11112): Disabling LPT Port Support for VM.
2023-09-25 11:19:09 (11112): Disabling Audio Support for VM.
2023-09-25 11:19:09 (11112): Disabling Clipboard Support for VM.
2023-09-25 11:19:10 (11112): Disabling Drag and Drop Support for VM.
2023-09-25 11:19:10 (11112): Adding storage controller(s) to VM.
2023-09-25 11:19:10 (11112): Adding virtual disk drive to VM. (CMS_2022_09_07_prod.vdi)
2023-09-25 11:19:13 (11112): Adding VirtualBox Guest Additions to VM.
2023-09-25 11:19:14 (11112): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2023-09-25 11:19:14 (11112): forwarding host port 50338 to guest port 80
2023-09-25 11:19:14 (11112): Enabling remote desktop for VM.
2023-09-25 11:19:15 (11112): Enabling shared directory for VM.
2023-09-25 11:19:16 (11112): Starting VM using VBoxManage interface. (boinc_fe513e83616ebdd4, slot#13)
2023-09-25 11:19:25 (11112): Successfully started VM. (PID = '4184')
2023-09-25 11:19:25 (11112): Reporting VM Process ID to BOINC.
2023-09-25 11:19:25 (11112): Guest Log: BIOS: VirtualBox 7.0.10
2023-09-25 11:19:25 (11112): Guest Log: CPUID EDX: 0x178bfbff
2023-09-25 11:19:25 (11112): Guest Log: BIOS: No PCI IDE controller, not probing IDE
2023-09-25 11:19:25 (11112): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors
2023-09-25 11:19:25 (11112): VM state change detected. (old = 'poweredoff', new = 'running')
2023-09-25 11:19:25 (11112): Detected: Web Application Enabled (http://localhost:50338)
2023-09-25 11:19:25 (11112): Detected: Remote Desktop Enabled (localhost:50351)
2023-09-25 11:19:25 (11112): Preference change detected
2023-09-25 11:19:25 (11112): Setting CPU throttle for VM. (80%)
2023-09-25 11:19:25 (11112): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2023-09-25 11:19:27 (11112): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2023-09-25 11:19:27 (11112): Guest Log: BIOS: Booting from Hard Disk...
2023-09-25 11:19:29 (11112): Guest Log: BIOS: KBD: unsupported int 16h function 03
2023-09-25 11:19:29 (11112): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2023-09-25 11:20:26 (11112): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2023-09-25 11:20:26 (11112): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2023-09-25 11:20:26 (11112): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000183 main Log opened 2023-09-25T09:20:27.941952000Z
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000305 main OS Product: Linux
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000344 main OS Release: 4.14.232-19.cernvm.x86_64
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000376 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000405 main Executable: /usr/sbin/VBoxService
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000405 main Process ID: 2145
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.000406 main Package type: LINUX_64BITS_GENERIC
2023-09-25 11:20:26 (11112): Guest Log: 00:00:00.005618 main 5.2.6 r120293 started. Verbose level = 0
2023-09-25 11:33:15 (11112): Guest Log: [INFO] Mounting the shared directory
2023-09-25 11:39:05 (11112): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2023-09-25 11:44:44 (11112): Guest Log: [INFO] Sourcing essential functions from /cvmfs/grid.cern.ch
2023-09-25 11:49:28 (11112): Guest Log: [INFO] Testing connection to cern.ch
2023-09-25 11:54:12 (11112): Guest Log: [INFO] Testing connection to VCCS
2023-09-25 11:59:02 (11112): Guest Log: [INFO] Testing connection to HTCondor
2023-09-25 12:03:34 (11112): Guest Log: [INFO] Testing connection to WMAgent
2023-09-25 12:09:41 (11112): Guest Log: [INFO] Testing connection to EOSCMS
2023-09-25 12:15:39 (11112): Guest Log: [INFO] Testing connection to CMS-Factory
2023-09-25 12:19:36 (11112): VM Heartbeat file specified, but missing heartbeat.
2023-09-25 12:19:36 (11112): Powering off VM.
2023-09-25 12:19:37 (11112): Successfully stopped VM.
2023-09-25 12:19:37 (11112): Deregistering VM. (boinc_fe513e83616ebdd4, slot#13)
2023-09-25 12:19:37 (11112): Removing network bandwidth throttle group from VM.
2023-09-25 12:19:37 (11112): Removing VM from VirtualBox.

Hypervisor System Log:

00:00:10.498844 Saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
00:00:10.503666 Finished saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml"
00:00:10.503929 Saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
00:00:10.507338 Finished saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml"
00:00:10.507774 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:10.511811 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:10.825757 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:10.830868 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:11.180720 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:11.184148 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:11.912506 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:11.916663 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:12.175407 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:12.180259 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:12.479745 ERROR [COM]: aRC=VBOX_E_NOT_SUPPORTED (0x80bb0009) aIID={300763af-5d6b-46e6-aa96-273eac15538a} aComponent={SessionMachine} aText={This VM is not encrypted}, preserve=false aResultDetail=0
00:00:12.485235 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={300763af-5d6b-46e6-aa96-273eac15538a} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 2 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
00:00:12.485278 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={300763af-5d6b-46e6-aa96-273eac15538a} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 2 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
00:00:12.704854 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={300763af-5d6b-46e6-aa96-273eac15538a} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
00:00:13.099339 Launched VM: 759480256 pid: 11328 (0x2c40) frontend: headless name: boinc_fe513e83616ebdd4
00:00:20.830954 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={Property 'CRYPT/KeyId' does not exist}, preserve=false aResultDetail=0
00:00:20.831460 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={ad47ad09-787b-44ab-b343-a082a3f2dfb1} aComponent={MediumWrap} aText={Property 'CRYPT/KeyId' does not exist}, preserve=false aResultDetail=0
00:00:21.902386 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
00:00:21.905483 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
00:00:30.167650 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:00:30.167909 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:00:30.381009 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 14
00:00:30.381205 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
00:05:53.006809 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:05:53.007013 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:05:53.220330 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 14
00:05:53.220538 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
00:25:12.054672 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:25:12.054806 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:25:12.267932 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
00:30:36.137321 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:30:36.137622 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:30:36.349897 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 14
00:30:36.350131 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
00:35:27.194809 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:35:27.194925 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:35:27.407884 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
00:43:49.309664 USBPROXY usbLibGetDevices: Starting USB device enumeration
00:43:49.309861 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 2
00:43:49.522987 USBPROXY Getting USB descriptor (id 0) failed (dwErr=31) on hub USB#ROOT_HUB30#4&a1cb6a2&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8} port 14
00:43:49.523302 USBPROXY usbLibGetDevices: Found 5 USB devices, 0 captured
01:00:33.290945 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
01:00:33.295296 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
01:00:33.564154 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={300763af-5d6b-46e6-aa96-273eac15538a} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
01:00:33.632109 Saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox" with version "1.19-windows"
01:00:33.638557 Finished saving settings file "C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox"
01:00:33.892625 Saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
01:00:33.898930 Finished saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml"
01:00:33.899048 Saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
01:00:33.901572 Finished saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml"
01:00:33.902413 DeleteVM Saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
01:00:33.906029 DeleteVM Finished saving settings file "C:\Users\pascal\.VirtualBox\VirtualBox.xml"
01:00:38.909665 main VirtualBox: object deletion starts
01:00:38.914081 main HostDnsMonitor: shutting down ...
01:00:38.914132 main HostDnsMonitor: shut down
01:00:38.916066 Watcher ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={7682d5eb-f00e-44f1-8ca2-99d08b1cd607} aComponent={VirtualBoxWrap} aText={The object is not ready}, preserve=false aResultDetail=0
01:00:38.916334 main VirtualBox: object deleted

VM Execution Log:


VM Startup Log:


VM Trace Log:

de, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 201 of file VBoxManageDisk.cpp

2023-09-25 11:19:05 (11112):
Command: VBoxManage -q createvm --name "boinc_fe513e83616ebdd4" --basefolder "C:\boinc\slots\13" --ostype "Linux26_64" --register
Exit Code: 0
Output:
Virtual machine 'boinc_fe513e83616ebdd4' is created and registered.
UUID: 8d9e560e-0623-43bc-9c1e-e0d7f90b569f
Settings file: 'C:\boinc\slots\13\boinc_fe513e83616ebdd4\boinc_fe513e83616ebdd4.vbox'

2023-09-25 11:19:06 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --description "CMS_701686_1695630839.736948_0"
Exit Code: 0
Output:

2023-09-25 11:19:06 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --memory 2048
Exit Code: 0
Output:

2023-09-25 11:19:06 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --cpus 1
Exit Code: 0
Output:

2023-09-25 11:19:07 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --acpi on --ioapic on --rtcuseutc off
Exit Code: 0
Output:

2023-09-25 11:19:07 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --vram 16 --graphicscontroller VBoxVGA
Exit Code: 0
Output:

2023-09-25 11:19:07 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --boot1 disk --boot2 dvd --boot3 none --boot4 none
Exit Code: 0
Output:

2023-09-25 11:19:08 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --nic1 nat --natdnsproxy1 on --cableconnected1 off
Exit Code: 0
Output:

2023-09-25 11:19:08 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --cableconnected1 on
Exit Code: 0
Output:

2023-09-25 11:19:08 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --usb off
Exit Code: 0
Output:

2023-09-25 11:19:09 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --uart1 off --uart2 off
Exit Code: 0
Output:

2023-09-25 11:19:09 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --lpt1 off --lpt2 off
Exit Code: 0
Output:

2023-09-25 11:19:09 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --audio none
Exit Code: 0
Output:
Warning: --audio is deprecated and will be removed soon. Use --audio-driver instead!

2023-09-25 11:19:10 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --clipboard disabled
Exit Code: 0
Output:

2023-09-25 11:19:10 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --draganddrop disabled
Exit Code: 0
Output:

2023-09-25 11:19:10 (11112):
Command: VBoxManage -q storagectl "boinc_fe513e83616ebdd4" --name "Hard Disk Controller" --add "sata" --controller "IntelAHCI" --hostiocache off --portcount 3
Exit Code: 0
Output:

2023-09-25 11:19:10 (11112):
Command: VBoxManage -q showhdinfo "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: 0
Output:
UUID: dae25e8f-de18-4971-b11c-eca764ede402
Parent UUID: base
State: created
Type: multiattach
Location: C:\boinc\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07_prod.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 20480 MBytes
Size on disk: 3853 MBytes
Encryption: disabled
Property: AllocationBlockSize=1048576

2023-09-25 11:19:11 (11112):
Command: VBoxManage -q storageattach "boinc_fe513e83616ebdd4" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: -2135228409
Output:
VBoxManage.exe: error: Cannot attach medium 'C:\boinc\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07_prod.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee IUnknown
VBoxManage.exe: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 785 of file VBoxManageStorageController.cpp

2023-09-25 11:19:11 (11112):
Command: VBoxManage -q closemedium "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: 0
Output:

2023-09-25 11:19:12 (11112):
Command: VBoxManage -q showhdinfo "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: 0
Output:
UUID: dae25e8f-de18-4971-b11c-eca764ede402
Parent UUID: base
State: created
Type: normal (base)
Location: C:\boinc\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07_prod.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 20480 MBytes
Size on disk: 3853 MBytes
Encryption: disabled
Property: AllocationBlockSize=1048576

2023-09-25 11:19:13 (11112):
Command: VBoxManage -q storageattach "boinc_fe513e83616ebdd4" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --medium "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: 0
Output:

2023-09-25 11:19:13 (11112):
Command: VBoxManage -q storageattach "boinc_fe513e83616ebdd4" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --medium none
Exit Code: 0
Output:

2023-09-25 11:19:13 (11112):
Command: VBoxManage -q storageattach "boinc_fe513e83616ebdd4" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "C:\boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: 0
Output:

2023-09-25 11:19:14 (11112):
Command: VBoxManage -q storageattach "boinc_fe513e83616ebdd4" --storagectl "Hard Disk Controller" --port 1 --device 0 --type dvddrive --medium "C:\Program Files\Oracle\VirtualBox/VBoxGuestAdditions.iso"
Exit Code: 0
Output:

2023-09-25 11:19:14 (11112):
Command: VBoxManage -q bandwidthctl "boinc_fe513e83616ebdd4" add "boinc_fe513e83616ebdd4_net" --type network --limit 1024G
Exit Code: 0
Output:

2023-09-25 11:19:14 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --natpf1 ",tcp,127.0.0.1,50338,,80"
Exit Code: 0
Output:

2023-09-25 11:19:15 (11112):
Command: VBoxManage -q list extpacks
Exit Code: 0
Output:
Extension Packs: 1
Pack no. 0: Oracle VM VirtualBox Extension Pack
Version: 7.0.10
Revision: 158379
Edition:
Description: Oracle Cloud Infrastructure integration, Host Webcam, VirtualBox RDP, PXE ROM, Disk Encryption, NVMe, full VM encryption.
VRDE Module: VBoxVRDP
Crypto Module: VBoxPuelCrypto
Usable: true
Why unusable:

2023-09-25 11:19:15 (11112):
Command: VBoxManage -q modifyvm "boinc_fe513e83616ebdd4" --vrde on --vrdeextpack default --vrdeauthlibrary default --vrdeauthtype null --vrdeport 50351
Exit Code: 0
Output:

2023-09-25 11:19:15 (11112):
Command: VBoxManage -q sharedfolder add "boinc_fe513e83616ebdd4" --name "shared" --hostpath "C:\boinc\slots\13/shared"
Exit Code: 0
Output:

2023-09-25 11:19:24 (11112):
Command: VBoxManage -q startvm "boinc_fe513e83616ebdd4" --type headless
Exit Code: 0
Output:
Waiting for VM "boinc_fe513e83616ebdd4" to power on...
VM "boinc_fe513e83616ebdd4" has been successfully started.

2023-09-25 11:19:25 (11112):
Command: VBoxManage -q controlvm "boinc_fe513e83616ebdd4" cpuexecutioncap 80
Exit Code: 0
Output:

2023-09-25 12:19:37 (11112):
Command: VBoxManage -q controlvm "boinc_fe513e83616ebdd4" poweroff
Exit Code: 0
Output:
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

2023-09-25 12:19:37 (11112):
Command: VBoxManage -q snapshot "boinc_fe513e83616ebdd4" list
Exit Code: -108
Output:
This machine does not have any snapshots

2023-09-25 12:19:37 (11112):
Command: VBoxManage -q bandwidthctl "boinc_fe513e83616ebdd4" remove "boinc_fe513e83616ebdd4_net"
Exit Code: 0
Output:

2023-09-25 12:19:37 (11112):
Command: VBoxManage -q unregistervm "boinc_fe513e83616ebdd4" --delete
Exit Code: 0
Output:
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%

12:19:42 (11112): called boinc_finish(194)

</stderr_txt>
]]>

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,428,032
RAC: 130,040
Message 48658 - Posted: 25 Sep 2023, 12:04:56 UTC - in response to Message 48657.  

All volunteers currently have trouble with CMS tasks as mentioned here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6013&postid=48651

You made your problem worse when you became impatient and messed up your BOINC/VirtualBox installation.

You should:
- Pause all tasks that are not yet started
- Cancel all running vbox tasks from LHC@home
- Stop BOINC
- Stop VirtualBox
- If you don't have VMs other than those from LHC@home, remove "$HOME\.VirtualBox\VirtualBox.xml"
- Upgrade BOINC and VirtualBox to the most recent versions
- Reboot
- Resume work


Until the CMS problems are solved on the server side, disable CMS and run Theory tasks.
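
For the VirtualBox.xml step in the list above, a cautious variant is to rename the file instead of deleting it, so it can be restored if needed. This is a rough sketch, not an official tool: the function name `set_aside_vbox_config` and the `.bak` suffix are assumptions made for illustration, and renaming instead of removing is my own choice.

```python
from pathlib import Path


def set_aside_vbox_config(home=None):
    """Rename <home>/.VirtualBox/VirtualBox.xml to VirtualBox.xml.bak.

    Returns the backup path, or None if there was no file to rename.
    Only call this while BOINC and VirtualBox are stopped, and only if
    the host has no VMs other than those from LHC@home.
    """
    home = Path(home) if home is not None else Path.home()
    config = home / ".VirtualBox" / "VirtualBox.xml"
    if not config.is_file():
        return None
    backup = config.with_name(config.name + ".bak")
    # Path.replace overwrites an older backup with the same name.
    config.replace(backup)
    return backup
```

Calling `set_aside_vbox_config()` without an argument acts on the current user's home directory. If BOINC runs under a different account, pass that account's home directory instead.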

Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48665 - Posted: 26 Sep 2023, 16:59:27 UTC

what I don't understand is that CMS tasks are still being distributed, although they are still faulty. What sense does this make?

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 810
Credit: 654,428,553
RAC: 265,567
Message 48666 - Posted: 26 Sep 2023, 17:21:10 UTC - in response to Message 48665.  
Last modified: 26 Sep 2023, 17:23:07 UTC

It's because of how the project is designed and operated, cf. https://lhcathome.web.cern.ch/about/how-it-works

BOINC job creation is independent of CernVM Co-Pilot job creation.

What we observe is many failing BOINC jobs: BOINC jobs are created automatically regardless of whether there are any Co-Pilot jobs, so the BOINC server tries to maintain, say, 1000 unsent BOINC jobs.

As CM said, if there are no Co-Pilot jobs, BOINC still spins up the VM on our computers. If nothing happens for 10 minutes, the BOINC task fails, and the BOINC server takes it back and creates a new one to maintain the 1000 unsent jobs.

This is different from how other BOINC projects work: there the BOINC jobs are created by the scientists, so all work is BOINC work.

The two are independent because the team at CERN did not want to bother the CERN scientists with creating BOINC jobs. They use the regular internal tools, and whether a job is processed internally at CERN or by us is transparent to them.

I guess that Ivan could switch BOINC job creation on and off, but he doesn't seem to have access to this infrastructure, so it's not so easy to toggle.
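
The refill behaviour described above can be pictured with a toy model. This is purely illustrative and is not BOINC server code: the function names, the per-round fetch count and the fixed target of 1000 (the "say 1000" from above) are all assumptions.

```python
def refill(unsent, target=1000):
    """Number of BOINC jobs the server creates to get back to `target` unsent."""
    return max(0, target - unsent)


def simulate(rounds, fetched_per_round, copilot_jobs_available, target=1000):
    """Toy model: volunteers fetch BOINC jobs, the server tops the queue up again.

    Assumes fetched_per_round <= target.
    """
    unsent = target
    created = 0
    failed = 0
    for _ in range(rounds):
        unsent -= fetched_per_round
        if not copilot_jobs_available:
            # The VM starts, finds no payload, and the BOINC task errors out.
            failed += fetched_per_round
        new_jobs = refill(unsent, target)
        created += new_jobs
        unsent += new_jobs
    return {"unsent": unsent, "created": created, "failed": failed}


print(simulate(rounds=3, fetched_per_round=200, copilot_jobs_available=False))
```

In this model the unsent queue is always back at the target after each round, no matter whether the fetched tasks succeeded or failed, which is why faulty tasks keep being distributed.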

Erich56
Joined: 18 Dec 15
Posts: 1691
Credit: 104,605,205
RAC: 105,351
Message 48687 - Posted: 29 Sep 2023, 12:06:56 UTC - in response to Message 48651.  

Federica wrote on September 24:
Hi Erich56 and all,
thanks for your feedback.
We submitted two new workflows for CMS@home yesterday, configured as usual, but I see that no volunteers have been connected to the CMS batch pool. It might be a problem at the level of the system "health" check, which is executed before the VM is allowed to connect to the CMS pool.
It seems to be a problem with the required token, but we have to investigate.
Please leave your machines connected to CMS@home.
We will let you know as soon as the problem is discovered (and hopefully resolved).
Thanks,
cheers
Federica for CMS@home support
Hi Federica, any news yet on the current problem?

maeax
Joined: 2 May 07
Posts: 2120
Credit: 159,926,969
RAC: 70,085
Message 48715 - Posted: 1 Oct 2023, 2:19:31 UTC

2023-10-01 04:15:31 (24016): Guest Log: [DEBUG]
2023-10-01 04:15:31 (24016): Guest Log: ERROR: Couldn't read proxy from: /tmp/x509up_u0
2023-10-01 04:15:31 (24016): Guest Log: globus_credential: Error reading proxy credential
2023-10-01 04:15:31 (24016): Guest Log: globus_credential: Error reading proxy credential: Couldn't read PEM from bio
2023-10-01 04:15:31 (24016): Guest Log: OpenSSL Error: pem_lib.c:707: in library: PEM routines, function PEM_read_bio: no start line
2023-10-01 04:15:31 (24016): Guest Log: Use -debug for further information.
2023-10-01 04:15:31 (24016): Guest Log: [ERROR] Could not get an x509 credential
2023-10-01 04:15:31 (24016): Guest Log: [ERROR] The x509 proxy creation failed.
2023-10-01 04:15:31 (24016): Guest Log: [DEBUG] Volunteer: maeax (75468)
2023-10-01 04:15:31 (24016): Guest Log: [INFO] Shutting Down.
2023-10-01 04:16:01 (24016): VM Completion File Detected.
2023-10-01 04:16:01 (24016): VM Completion Message: The x509 proxy creation failed.
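
The OpenSSL message `no start line` generally means that the file it was asked to read contains no PEM header line (`-----BEGIN ...-----`), for example because the file is empty. The following is a rough sketch of that check, not OpenSSL's actual parser; `has_pem_start_line` is a made-up helper name, and the certificate body in the example is a placeholder.

```python
def has_pem_start_line(data: bytes) -> bool:
    """True if any line starts with the PEM armor prefix '-----BEGIN '."""
    return any(
        line.strip().startswith(b"-----BEGIN ")
        for line in data.splitlines()
    )


# An empty file, as one possible reason for the error above.
empty_proxy = b""

# Placeholder content with a PEM header; the body is not a real certificate.
pem_like = b"-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n"

print(has_pem_start_line(empty_proxy))
print(has_pem_start_line(pem_like))
```

Inside the VM the same check could be applied to the bytes of /tmp/x509up_u0, the path named in the log. Whether that file was empty or contained something else is not visible from this output.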

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,428,032
RAC: 130,040
Message 48732 - Posted: 3 Oct 2023, 9:49:06 UTC

Since yesterday afternoon CMS tasks are running fine.

maeax
Joined: 2 May 07
Posts: 2120
Credit: 159,926,969
RAC: 70,085
Message 48733 - Posted: 3 Oct 2023, 10:39:27 UTC - in response to Message 48715.  
Last modified: 3 Oct 2023, 11:17:44 UTC

12:27:09 +0200 2023-10-03 [INFO] Requesting an idtoken from LHC@Home
X509_USER_PROXY = /tmp/x509up_u1000

Waiting, waiting... for a quarter of an hour so far.
Half an hour now; I will cancel after one hour of waiting...

2023-10-03 12:26:53 (28104): Guest Log: [INFO] Reading volunteer information
2023-10-03 12:27:08 (28104): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2023-10-03 12:27:09 (28104): Guest Log: [INFO] CMS application starting. Check log files.
2023-10-03 12:27:10 (28104): Guest Log: [INFO] Requesting an idtoken from LHC@home
2023-10-03 13:14:39 (28104): Powering off VM.
2023-10-03 13:14:40 (28104): Successfully stopped VM.
2023-10-03 13:14:40 (28104): Deregistering VM. (boinc_a31803bb699ff414, slot#33)
2023-10-03 13:14:40 (28104): Removing network bandwidth throttle group from VM.
2023-10-03 13:14:40 (28104): Removing VM from VirtualBox.

Hypervisor System Log:

313:12:21.054578 Saving settings file "S:\ProgramData\BOINC\slots\17\boinc_9ebd457c4ec9cc22\boinc_9ebd457c4ec9cc22.vbox" with version "1.19-windows"

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,710
RAC: 2,852
Message 48734 - Posted: 3 Oct 2023, 12:50:28 UTC - in response to Message 48732.  
Last modified: 3 Oct 2023, 13:06:07 UTC

Since yesterday afternoon CMS tasks are running fine.
The x509 error is gone, but it seems no subtasks are available: https://lhcathome.cern.ch/lhcathome/result.php?resultid=400015008
Edit: On 2nd try I got a job. The cmsRun started after about 8 minutes uptime.

Pascal
Joined: 13 May 20
Posts: 33
Credit: 1,161,776
RAC: 4,966
Message 48736 - Posted: 3 Oct 2023, 13:18:18 UTC - in response to Message 48734.  

It still does not work for me.

name CMS_4087407_1696332748.085478
application CMS Simulation
created 3 Oct 2023, 11:32:28 UTC
minimum quorum 1
initial replication 1
max # of error/total/success tasks 1, 1, 1
errors Too many total results

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,710
RAC: 2,852
Message 48737 - Posted: 3 Oct 2023, 13:26:34 UTC - in response to Message 48736.  
Last modified: 3 Oct 2023, 13:56:32 UTC

Do you have 'native' selected in your preferences? https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project
For the Windows VBox tasks it should not have been selected.