Missing heartbeat file errors
Author | Message |
---|---|
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
Perfect! Now take that very same VM, eject the CDROM image and replace the hard disk image with the Theory_2016_11_02.vdi from this project. Start the VM and see what happens. Note that to use the image directly in this way you need to provide the shared directory with the init_data.xml file. The VM with your proposed config started well. After 1 minute of uptime a new heartbeat file was created in the shared directory, and 1 minute later a new job started. I'll let it run overnight. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,355 |
I let a second task run overnight with the first .vdi. It finished at 04:29 UTC today with ten finished Condor jobs. This is from the output of the new Theory_.vdi. Last lines in the VM: Starting libvirtd daemon: [ok] /etc/rc3.d/S99local: line 1: /cvmfs/grid.cern.ch/vc/sbin/bootstrap: No such file or directory bootlogd: no process killed Must context.iso be removed from Storage? Storage in VM: IDE Primary Master: Theory_2016_11_02.vdi (Normal 20,00 GB) IDE Primary Slave: [Optical Drive]context.iso(356,00 KB) IDE Secondary Master: [Optical Drive]VBoxGuestAdditions.iso (56,66 MB) I also saw hardening errors: 00:06:16.189045 supR3HardenedErrorV: supR3HardenedScreenImage/LdrLoadDll: rc=VERR_SUP_VP_NOT_OWNED_BY_TRUSTED_INSTALLER fImage=1 fProtect=0x0 fAccess=0x0 \Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll: supHardenedWinVerifyImageByHandle: TrustedInstaller is not the owner of '\Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll'.
00:06:16.189206 supR3HardenedErrorV: supR3HardenedMonitor_LdrLoadDll: rejecting 'C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll' (C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll): rcNt=0xc0000190 00:06:16.190659 supR3HardenedErrorV: supR3HardenedScreenImage/LdrLoadDll: cached rc=VERR_SUP_VP_NOT_OWNED_BY_TRUSTED_INSTALLER fImage=1 fProtect=0x0 fAccess=0x0 cHits=1 \Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll 00:06:16.190739 supR3HardenedErrorV: supR3HardenedMonitor_LdrLoadDll: rejecting 'C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll' (C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll): rcNt=0xc0000190 00:06:16.192151 supR3HardenedErrorV: supR3HardenedScreenImage/LdrLoadDll: cached rc=VERR_SUP_VP_NOT_OWNED_BY_TRUSTED_INSTALLER fImage=1 fProtect=0x0 fAccess=0x0 cHits=2 \Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll 00:06:16.192175 supR3HardenedErrorV: supR3HardenedMonitor_LdrLoadDll: rejecting 'C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll' (C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll): rcNt=0xc0000190 00:06:16.193638 supR3HardenedErrorV: supR3HardenedScreenImage/LdrLoadDll: cached rc=VERR_SUP_VP_NOT_OWNED_BY_TRUSTED_INSTALLER fImage=1 fProtect=0x0 fAccess=0x0 cHits=3 \Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll 00:06:16.193709 supR3HardenedErrorV: supR3HardenedMonitor_LdrLoadDll: rejecting 'C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll' (C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll): rcNt=0xc0000190 00:06:16.195147 supR3HardenedErrorV: supR3HardenedScreenImage/LdrLoadDll: cached 
rc=VERR_SUP_VP_NOT_OWNED_BY_TRUSTED_INSTALLER fImage=1 fProtect=0x0 fAccess=0x0 cHits=4 \Device\HarddiskVolume4\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll 00:06:16.195220 supR3HardenedErrorV: supR3HardenedMonitor_LdrLoadDll: rejecting 'C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll' (C:\Users\x\AppData\Local\Microsoft\OneDrive\17.3.6720.1207\amd64\FileSyncShell64.dll): rcNt=0xc0000190 |
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
Must context.iso be removed from Storage? Yes |
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
Perfect! Now take that very same VM, eject the CDROM image and replace the hard disk image with the Theory_2016_11_02.vdi from this project. Start the VM and see what happens. Note that to use the image directly in this way you need to provide the shared directory with the init_data.xml file. The VM finished after >12 hours of runtime with the Theory_2016_11_02.vdi image: 00:02:17.840314 VMMDev: Guest Log: [INFO] New Job Starting in slot1 00:02:17.903079 VMMDev: Guest Log: [INFO] Condor JobID: 849938. 0 in slot1 00:02:23.087032 VMMDev: Guest Log: [INFO] MCPlots JobID: 34607506 in slot1 00:59:57.017772 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0. 01:00:00.343660 VMMDev: Guest Log: [INFO] New Job Starting in slot1 01:00:00.637763 VMMDev: Guest Log: [INFO] Condor JobID: 829749. 0 in slot1 01:00:05.946158 VMMDev: Guest Log: [INFO] MCPlots JobID: 34587424 in slot1 03:39:29.935414 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0. 03:39:33.953839 VMMDev: Guest Log: [INFO] New Job Starting in slot1 03:39:34.263818 VMMDev: Guest Log: [INFO] Condor JobID: 853036. 0 in slot1 03:39:39.798883 VMMDev: Guest Log: [INFO] MCPlots JobID: 34610676 in slot1 12:25:44.599676 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0. 12:36:09.987195 VMMDev: Guest Log: [INFO] Condor exited with return value N/A. 12:36:10.047502 VMMDev: Guest Log: [INFO] Shutting Down. Not that many jobs, because I sometimes paused the VM for other duties or ran it with a 20% execution cap. |
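For anyone who wants to compare run times across images, the guest-log lines above can be turned into per-job durations. This is only a rough sketch in Python: it assumes the log text keeps the "New Job Starting in slotN" and "Job finished in slotN" wording shown in the post, and the function name `job_durations` is made up for illustration.

```python
import re
from datetime import timedelta

# Matches the uptime stamp and the two guest-log events of interest, e.g.
# "00:59:57.017772 VMMDev: Guest Log: [INFO] Job finished in slot1 with 0."
ENTRY = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d+) VMMDev: Guest Log: \[INFO\] "
    r"(New Job Starting|Job finished) in (slot\d+)"
)

def job_durations(log_text):
    """Return a list of (slot, duration) for each job that started and finished."""
    started = {}
    durations = []
    for h, m, s, frac, event, slot in ENTRY.findall(log_text):
        # Normalise the fractional part to microseconds (pad or truncate to 6 digits).
        stamp = timedelta(hours=int(h), minutes=int(m), seconds=int(s),
                          microseconds=int(frac[:6].ljust(6, "0")))
        if event == "New Job Starting":
            started[slot] = stamp
        elif slot in started:
            durations.append((slot, stamp - started.pop(slot)))
    return durations
```

The durations are VM uptime, so any time the VM spent paused or throttled by the execution cap is not separated out.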
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
So it looks like there is an issue with the Theory_2016_11_02.vdi image that we are using. As it is working for CP (Windows 7) but not for maeax and Jesse (Windows 10), this suggests a compatibility problem. My guess would be that the virtual hardware in the VM differs between the VM where the image was built and the VMs where it is failing. Please could those of you who have been testing email me the .vbox file for the VM. EDIT: If anyone wants to investigate, here is an example of a .vbox file used for the build. |
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
Please could those of you who have been testing email me the .vbox file for the VM. I suppose you're only interested in the *VM*.vbox file if it is not working correctly? |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,355 |
I have an SSD for storage; is this a problem for .vbox? EDIT: Theory_.vdi has Linux_64Bits_Generic from 16/10/24, 4.1.34-22; uc_.vdi has Linux_64Bits_Generic from 16/11/7, 4.1.35-25; both have 4.3.28 r100309. |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
I managed to reproduce the error on my machine by overwriting the network block in the vbox XML file of a test VM with the respective content from a vbox XML file of a VM which was not working. So it looks like that error is generated when it can't access the network. Still trying to understand what exactly is making it fail. |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
From what I can see this doesn't work: <Adapter slot="0" enabled="true" MACAddress="080027F1E677" type="82540EM"> but this does work: <Adapter slot="0" enabled="true" MACAddress="080027F1E677" type="82540EM" cable="true"> What cable="true" means, why it is required and why it is not there still needs to be understood. |
Send message Joined: 6 Sep 08 Posts: 118 Credit: 12,578,481 RAC: 1,605 |
As I remember, we've been here before in the days of T4T. It may be that "cable" refers to the simulated network cable seen by the VM, i.e. cable="false" simulates the network cable being unplugged. This is the VBoxManage modifyvm option that controls this, from an old manual: --cableconnected<1-N> on|off: This allows you to temporarily disconnect a virtual network interface, as if a network cable had been pulled from a real network card. This might be useful for resetting certain software components in the VM. |
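As a small illustration of the option quoted above, here is a Python sketch that only builds the command line; it does not run anything. The helper name `cable_command` and the VM name are placeholders, and the mapping from slot="0" in the .vbox file to NIC number 1 on the command line is my assumption about the numbering.

```python
def cable_command(vm_name, adapter_slot, connected):
    """Build the argv for: VBoxManage modifyvm <vm> --cableconnected<N> on|off."""
    # Assumption: the .vbox file counts adapters from slot 0,
    # while the --cableconnected<1-N> option counts from 1.
    nic_number = adapter_slot + 1
    state = "on" if connected else "off"
    return ["VBoxManage", "modifyvm", vm_name,
            f"--cableconnected{nic_number}", state]
```

The returned list could be passed to `subprocess.run(..., check=True)` on a machine where VBoxManage is on the PATH and the VM is powered off.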
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,355 |
In an old T4T forum thread I saw this example of the parameter in VBox.log: [/Devices/e1000/0/Config/] (level 4) 00:00:01.528 AdapterType <integer> = 0x0000000000000000 (0) 00:00:01.528 cableConnected <integer> = 0x0000000000000001 (1) 00:00:01.528 LineSpeed <integer> = 0x0000000000000000 (0) 00:00:01.528 MAC <bytes> = "08 00 27 fc 44 b2" (cb=6) |
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
From what I can see this doesn't work Sorry to say Laurence, but my VM was working well and doesn't have the cable="true" at the end of the Adapter slot="0" line. |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
Sorry to say Laurence, but my VM was working well and doesn't have the cable="true" at the end of the Adapter slot="0" line. It could be that the value it defaults to when not set differs. What happens if you explicitly set cable="false"? Is the machine you are testing it on connected via WiFi or an Ethernet cable? If WiFi, does it at least have an Ethernet port? |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,355 |
Hello CP, do you have Windows 7? Laurence wrote that the failures are on Windows 10. I started ucernvm-prod_.vdi with context.iso and found this in vbox.log: 00:00:02.656643 [/Devices/e1000/] (level 2) 00:00:02.656644 00:00:02.656645 [/Devices/e1000/0/] (level 3) 00:00:02.656646 PCIBusNo <integer> = 0x0000000000000000 (0) 00:00:02.656648 PCIDeviceNo <integer> = 0x0000000000000003 (3) 00:00:02.656649 PCIFunctionNo <integer> = 0x0000000000000000 (0) 00:00:02.656650 Trusted <integer> = 0x0000000000000001 (1) 00:00:02.656651 00:00:02.656652 [/Devices/e1000/0/Config/] (level 4) 00:00:02.656653 AdapterType <integer> = 0x0000000000000000 (0) 00:00:02.656655 CableConnected <integer> = 0x0000000000000001 (1) 00:00:02.656656 LineSpeed <integer> = 0x0000000000000000 (0) 00:00:02.656657 MAC <bytes> = "08 00 27 a7 21 cb" (cb=6) 00:00:02.656659 00:00:02.656660 [/Devices/e1000/0/LUN#0/] (level 4) 00:00:02.656662 Driver <string> = "NAT" (cb=4) 00:00:02.656663 00:00:02.656663 [/Devices/e1000/0/LUN#0/Config/] (level 5) 00:00:02.656666 AliasMode <integer> = 0x0000000000000000 (0) 00:00:02.656668 BootFile <string> = "L_vdi.pxe" (cb=10) 00:00:02.656669 DNSProxy <integer> = 0x0000000000000000 (0) 00:00:02.656670 Network <string> = "10.0.2.0/24" (cb=12) 00:00:02.656671 PassDomain <integer> = 0x0000000000000001 (1) 00:00:02.656672 TFTPPrefix <string> = "C:\Users\x\.VirtualBox\TFTP" (cb=36) 00:00:02.656674 UseHostResolver <integer> = 0x0000000000000000 (0) 00:00:02.656675 00:00:02.656675 [/Devices/e1000/0/LUN#999/] (level 4) 00:00:02.656677 Driver <string> = "MainStatus" (cb=11) 00:00:02.656678 00:00:02.656679 [/Devices/e1000/0/LUN#999/Config/] (level 5) 00:00:02.656681 First <integer> = 0x0000000000000000 (0) 00:00:02.656682 Last <integer> = 0x0000000000000000 (0) 00:00:02.656683 papLeds <integer> = 0x0000000001942ac8 (26 487 496) The first Condor task is running at the moment! |
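To check what a VM actually booted with, rather than what the .vbox file says, the CableConnected entry can be pulled out of the config dump in the log. A minimal Python sketch, assuming the dump keeps the "<integer> = 0x... (n)" layout shown in the two posts; `cable_states` is a made-up helper name.

```python
import re

# The key is spelled "cableConnected" in the older dump and "CableConnected"
# in the newer one, so the match is case-insensitive.
CABLE = re.compile(
    r"cableconnected\s+<integer>\s+=\s+0x[0-9a-f]+\s+\((\d+)\)",
    re.IGNORECASE,
)

def cable_states(log_text):
    """Return one bool per CableConnected entry found in the log text."""
    return [match.group(1) != "0" for match in CABLE.finditer(log_text)]
```

A result of `[True]` for a failing VM would suggest the cable setting is not what distinguishes it from a working one.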
Send message Joined: 6 Sep 08 Posts: 118 Credit: 12,578,481 RAC: 1,605 |
This appears in the VM Manager GUI under "settings/network/advanced". |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
Please could you start a new Theory task with BOINC and then exit BOINC. The VM files should be in C:\ProgramData\BOINC\slots\0\boinc_xxx\boinc_xxx.vbox. You can then edit that file to set cable="true" and then open VirtualBox to start the VM manually. |
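If hand-editing the file is error-prone, the same change can be sketched in Python. This is an illustration under assumptions, not a tested procedure for BOINC slots: it works on the XML text, the function name `set_cable` is made up, and the round trip through ElementTree drops comments and the XML declaration, so keep a backup of the original .vbox file.

```python
import xml.etree.ElementTree as ET

def set_cable(vbox_xml, slot="0", value="true"):
    """Return the XML text with cable=value set on the Adapter in the given slot."""
    root = ET.fromstring(vbox_xml)
    if root.tag.startswith("{"):
        # Keep the document's default namespace instead of an ns0: prefix.
        ET.register_namespace("", root.tag[1:].split("}")[0])
    for elem in root.iter():
        # Compare the local name so this works with or without a namespace.
        if elem.tag.split("}")[-1] == "Adapter" and elem.get("slot") == slot:
            elem.set("cable", value)
    return ET.tostring(root, encoding="unicode")
```

VirtualBox should be fully closed while the file is changed, as in the instructions above, since it may otherwise overwrite the edit.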
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 1,355 |
Please could you start a new Theory task with BOINC and then exit BOINC. The VM files should be in C:\ProgramData\BOINC\slots\0\boinc_xxx\boinc_xxx.vbox. You can then edit that file to set cable="true" and then open VirtualBox to start the VM manually. Under Scientific Linux or Windows 10? In Windows 10, "Cable connected" is on in boinc_xxx.vbox. The task finished after 11 min. https://lhcathome.cern.ch/lhcathome/result.php?resultid=110942801 When BOINC is closed, the boinc_xxx.vbox VM keeps running and doesn't stop. |
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
Sorry to say Laurence, but my VM was working well and doesn't have the cable="true" at the end of the Adapter slot="0" line. I ran a freshly booted VM with <Adapter slot="0" enabled="true" MACAddress="0000000000" type="82540EM" cable="false"> (MAC changed) in CernVM.vbox, and 20 minutes later it still had no job. The heartbeat file in the shared folder is refreshed frequently. The machine is not even capable of WiFi, so it is connected directly to the LAN. |
Send message Joined: 12 Feb 14 Posts: 72 Credit: 4,639,155 RAC: 0 |
I have tried that. Adding cable="true" did not allow the VM to work. I even gave it a hard power-down, edited the .vbox file, and manually restarted the VM. That did not allow the VM to work either. Have you tried using a utility like diff or WinMerge on some of the .vbox files for the VMs that do work and the VMs that do not work? I also noticed that ATLAS@home uses the same network configuration in its .vbox files, and they still work. |
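Along the lines of the diff/WinMerge suggestion, a narrower comparison of just the network adapters can be sketched in Python. The helper names are invented for illustration, the MAC address is ignored by default because it is expected to differ between VMs, and only the attributes on the Adapter element itself are compared, not its child elements.

```python
import xml.etree.ElementTree as ET

def adapter_attrs(vbox_xml):
    """Map each Adapter slot to its attribute dict."""
    root = ET.fromstring(vbox_xml)
    return {e.get("slot", "?"): dict(e.attrib)
            for e in root.iter() if e.tag.split("}")[-1] == "Adapter"}

def adapter_diff(working_xml, failing_xml, ignore=("MACAddress",)):
    """Return {slot: {attr: (working, failing)}} for attributes that differ."""
    good, bad = adapter_attrs(working_xml), adapter_attrs(failing_xml)
    diff = {}
    for slot in sorted(set(good) | set(bad)):
        a, b = good.get(slot, {}), bad.get(slot, {})
        changed = {key: (a.get(key), b.get(key))
                   for key in sorted(set(a) | set(b))
                   if key not in ignore and a.get(key) != b.get(key)}
        if changed:
            diff[slot] = changed
    return diff
```

An empty result would mean the adapter attributes match, which would point the search away from the Adapter line and toward other parts of the file or the host.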
Send message Joined: 14 Jan 10 Posts: 1419 Credit: 9,476,653 RAC: 2,833 |
After cable="false" was removed, and also after editing it to cable="true", the VM is still working, but I don't get a job in any scenario. I'll try a fresh start later with the original *.vdi. Except for a few timestamps there are no differences between the vbox files. |
©2024 CERN