ATLAS vbox v2.02

Author	Message
[VENETO] boboviz Send message Joined: 7 May 08 Posts: 248 Credit: 1,890,834 RAC: 10,764	Message 47083 - Posted: 5 Aug 2022, 8:18:36 UTC - in response to Message 47082. Last modified: 5 Aug 2022, 8:20:09 UTC On the same computer using the same VirtualBox instance and the same user account? Yes. Lhc@Home and Lhc-Dev (and Rosetta and QChemPedia, etc) are on the same pc On Windows it's most likely Hyper-V or an AV software that crashes VirtualBox and/or makes the BOINC client think VT-x/AMD-V is disabled. No Hyper-v installed No messages on antivirus Windows 11 task manager says that virtualization is enabled ID: 47083 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2681 Credit: 286,838,618 RAC: 64,689	Message 47084 - Posted: 5 Aug 2022, 8:57:24 UTC - in response to Message 47083. You may check this: https://forums.virtualbox.org/viewtopic.php?f=1&t=62339 Your computer at LHC dev reports this CPU: AuthenticAMD AMD Ryzen 5 5500U with Radeon Graphics The fauty tasks from your previous post are from a computer reporting this CPU: AuthenticAMD AMD Ryzen 5 3600 6-Core Processor Your computers here, at Rosetta and QChemPedia are hidden which makes it impossible to check anything. ID: 47084 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47085 - Posted: 5 Aug 2022, 9:19:22 UTC - in response to Message 47078. The Virtual Machine is created and booted (that's fine}, but in all your aborted low-cpu using tasks, there is never coming "CVMFS is ok" after "Checking CVMFS...". - Without connection to CVMFS the job will not start. Waiting for a new task with this Problem, because vbox.log and hardening.log had some trouble when copy+Paste. @CP will check this in the next days, when it coming again. Within the stderr.txt there's again the hint to look through the hardening log. See: https://forums.virtualbox.org/viewtopic.php?f=25&t=82106 @computezrmle no Info of VBox.log or VboxHardening.log here! ID: 47085 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1461 Credit: 9,855,461 RAC: 2,778	Message 47086 - Posted: 5 Aug 2022, 10:07:27 UTC - in response to Message 47085. Last modified: 5 Aug 2022, 17:38:57 UTC The Virtual Machine is created and booted (that's fine}, but in all your aborted low-cpu using tasks, there is never coming "CVMFS is ok" after "Checking CVMFS...". - Without connection to CVMFS the job will not start. Waiting for a new task with this Problem, because vbox.log and hardening.log had some trouble when copy+Paste. @CP will check this in the next days, when it coming again. When you have such a running VM combined with low cpu-usage, you could try to revive such a task. If something goes wrong. Nevermind, you would abort such a task anyway. How trying to revive? Suspend all tasks not yet started. Suspend the evil task with "Leave applications in memory" not selected. The task will be saved to disk. Use VirtualBox Manager to disgard the saved state. Start the VM using VirtualBox Manager. It will boot from scratch. You may use the ALT-keys for monitoring cpu and event processing. When it starts event processing, close the VM from the menu (not the red cross) with option save to disk. When it is saved, you may resume the task in BOINC Manager. ID: 47086 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47087 - Posted: 5 Aug 2022, 11:27:32 UTC - in response to Message 47086. Ok Crystal. ID: 47087 · Reply Quote

[VENETO] boboviz Send message Joined: 7 May 08 Posts: 248 Credit: 1,890,834 RAC: 10,764	Message 47088 - Posted: 5 Aug 2022, 12:26:58 UTC - in response to Message 47084. You may check this: https://forums.virtualbox.org/viewtopic.php?f=1&t=62339 Done! Your computer at LHC dev reports this CPU: AuthenticAMD AMD Ryzen 5 5500U with Radeon Graphics The fauty tasks from your previous post are from a computer reporting this CPU: AuthenticAMD AMD Ryzen 5 3600 6-Core Processor Sorry, my fault. BOTH of my pcs fault on this new Atlas app. And BOTH has no problems with others VM projects Tonight i'll retry with both Your computers here, at Rosetta and QChemPedia are hidden which makes it impossible to check anything. I'll change this ID: 47088 · Reply Quote

Ray Murray Volunteer moderator Send message Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 1,235	Message 47089 - Posted: 5 Aug 2022, 17:57:35 UTC - in response to Message 47083. Last modified: 5 Aug 2022, 21:12:41 UTC In February, I had a problem with Theory, where I had made sure that VT-x was enabled but Boinc wasn't seeing it and was reporting as disabled. An AVAST (AVG is similar, I believe) update had added, and checked by default, a feature in Menu - Settings - Troubleshooting called Enable hardware-assisted virtualisation which is checked by default but needs to be UNCHECKED https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5797&postid=46366 ID: 47089 · Reply Quote

Harri Liljeroos Send message Joined: 28 Sep 04 Posts: 780 Credit: 59,881,889 RAC: 47,661	Message 47090 - Posted: 5 Aug 2022, 21:38:11 UTC Last modified: 5 Aug 2022, 21:39:10 UTC Question about the server side settings: on Host computer details page, in Application details page, the Atlas vbox 2.00 data has been removed and now showing data for only v2.02 (number of completed tasks, APR etc.). This is not the case for other subprojects, they still show the data for the old versions of applications and not just for the current versions. I wonder why Atlas is different? ID: 47090 · Reply Quote

tullio Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0	Message 47091 - Posted: 6 Aug 2022, 7:27:09 UTC 2.02 seems OK on my Windows 11 host. Tulio ID: 47091 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47092 - Posted: 6 Aug 2022, 7:28:56 UTC - in response to Message 47090. On this page is a Button: Show all versions. ID: 47092 · Reply Quote

Harri Liljeroos Send message Joined: 28 Sep 04 Posts: 780 Credit: 59,881,889 RAC: 47,661	Message 47093 - Posted: 6 Aug 2022, 13:11:24 UTC - in response to Message 47092. On this page is a Button: Show all versions. OK, I didn't see that. Thanks for pointing that out. ID: 47093 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47094 - Posted: 6 Aug 2022, 13:50:07 UTC - in response to Message 47081. Last modified: 6 Aug 2022, 14:11:44 UTC The Virtual Machine is created and booted (that's fine}, but in all your aborted low-cpu using tasks, there is never coming "CVMFS is ok" after "Checking CVMFS...". - Without connection to CVMFS the job will not start. This is the reason for this handful tasks every day. 200 tasks connect to CVMFS, but this small number of tasks not. Have no idea why. Is it possible to exit this task during running? Because atm you must stop this task yourself and control it. btw: our best Volunteer (Toby Broom) have also this never ending Atlas-Tasks. ID: 47094 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1461 Credit: 9,855,461 RAC: 2,778	Message 47095 - Posted: 6 Aug 2022, 17:58:48 UTC - in response to Message 47094. Is it possible to exit this task during running? Because atm you must stop this task yourself and control it. In principle you could create a workaround. I tested it on the development system. At least such a task would not have to run until the user's eye is catching such a task. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3106173 It's best of course to avoid this happening or let a watchdog inside the VM doing the job. ID: 47095 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2681 Credit: 286,838,618 RAC: 64,689	Message 47102 - Posted: 8 Aug 2022, 7:14:56 UTC - in response to Message 47074. Regarding the children a multiattach disk can have: I didn't find an official limit. Own tests ran fine with up to 14 per BOINC client and 2 clients per host (different usernames), hence 28 per host. Meanwhile I'm aware of another volunteer's Windows computer running (at least) 39 ATLAS tasks concurrently. A rough estimate points out that on this computer ATLAS v2.02 avoids ~160 GB per day that would have been written to disk by the previous version. ID: 47102 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47108 - Posted: 8 Aug 2022, 12:21:43 UTC - in response to Message 47102. Last modified: 8 Aug 2022, 12:25:41 UTC A rough estimate points out that on this computer ATLAS v2.02 avoids ~160 GB per day that would have been written to disk by the previous version. 100 Atlas-Tasks per day and PC (for two Computer 2x100) and 250 GByte from ISP. 9 TByte including Atlas last month. ID: 47108 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2277 Credit: 178,627,948 RAC: 114,040	Message 47111 - Posted: 9 Aug 2022, 4:44:29 UTC - in response to Message 47108. Last modified: 9 Aug 2022, 4:45:40 UTC 2022-08-08 19:07:23 (21468): Guest Log: 00:00:00.001517 main 5.2.32 r132073 started. Verbose level = 0 2022-08-08 19:07:33 (21468): Guest Log: 00:00:10.007488 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 189 278 340 000ns (GuestNow=1 659 978 452 271 131 000 ns GuestLast=1 659 985 641 549 471 000 ns fSetTimeLastLoop=true ) 2022-08-08 20:47:22 (21468): Status Report: Elapsed Time: '6000.000000' 2022-08-08 20:47:22 (21468): Status Report: CPU Time: '52.718750' 2022-08-08 22:27:30 (21468): Status Report: Elapsed Time: '12000.000000' 2022-08-08 22:27:30 (21468): Status Report: CPU Time: '78.906250' 2022-08-09 00:07:39 (21468): Status Report: Elapsed Time: '18000.000000' 2022-08-09 00:07:39 (21468): Status Report: CPU Time: '106.875000' 2022-08-09 01:47:49 (21468): Status Report: Elapsed Time: '24000.000000' 2022-08-09 01:47:49 (21468): Status Report: CPU Time: '130.171875' 2022-08-09 03:27:57 (21468): Status Report: Elapsed Time: '30000.000000' 2022-08-09 03:27:57 (21468): Status Report: CPU Time: '157.125000' 2022-08-09 05:08:06 (21468): Status Report: Elapsed Time: '36000.000000' 2022-08-09 05:08:06 (21468): Status Report: CPU Time: '178.687500' We need a Atlas-stop for this. CVMFS connect problem! 11 hour for nothing(10 CPU)! ID: 47111 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1461 Credit: 9,855,461 RAC: 2,778	Message 47112 - Posted: 9 Aug 2022, 7:04:39 UTC - in response to Message 47111. We need a Atlas-stop for this. CVMFS connect problem! Are there other users with so many CVMFS-connect problems? I have not so many ATLAS-tasks running, but no one failed on my side. All CVMFS-response times here are between 3 and at the most 8 seconds. I did not view all your results, but from your valid tasks the response times are between 3 and 81 seconds. Maybe there is somewhere a limit (90 sec.?) to get a response, else you will never get one or is rejected by the network, because too late? To me it seems to be a network issue on your side or CERN's side. If on CERN's side (max # connections?) more users would suffer from this. ID: 47112 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2681 Credit: 286,838,618 RAC: 64,689	Message 47113 - Posted: 9 Aug 2022, 7:28:51 UTC - in response to Message 47112. Might be worth to tune the TCP settings (on the heavy load workers and the computer running the Squid proxy): https://support.solarwinds.com/SuccessCenter/s/article/NETSTAT-A-command-displays-too-many-TCP-IP-connections?language=en_US Sections: "Increase the maximum simultaneous connections" "Reduce the duration of the Reserved State" ID: 47113 · Reply Quote

entity Send message Joined: 7 May 17 Posts: 6 Credit: 695,132 RAC: 0	Message 47115 - Posted: 9 Aug 2022, 17:54:02 UTC - in response to Message 47113. Started getting immediate errors with ATLAS multi-attach tasks. Prior to today they ran fine. A snippet of the output is included. 2022-08-09 12:45:41 (3632470): Command: VBoxManage -q showhdinfo "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: 0 Output: UUID: 6f08958e-7bfd-4804-8dd7-c7b4408cb126 Parent UUID: base State: created Type: multiattach Location: /var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi Storage format: VDI Format variant: dynamic default Capacity: 20480 MBytes Size on disk: 2645 MBytes Encryption: disabled Property: AllocationBlockSize=1048576 Child UUIDs: 80093e87-fcef-479f-801f-dd2cc020d954 2022-08-09 12:45:42 (3632470): Command: VBoxManage -q storageattach "boinc_a50fb3ffb7fe0a5b" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228409 Output: VBoxManage: error: Cannot attach medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 772 of file VBoxManageStorageController.cpp 2022-08-09 12:45:42 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp 2022-08-09 12:45:43 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp 2022-08-09 12:45:45 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp 2022-08-09 12:45:46 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp 2022-08-09 12:45:47 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp 2022-08-09 12:45:49 (3632470): Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi" Exit Code: -2135228404 Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp ID: 47115 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2681 Credit: 286,838,618 RAC: 64,689	Message 47116 - Posted: 9 Aug 2022, 18:17:28 UTC - in response to Message 47115. Your computer list is empty. Very unusual. Since you wrote "...Prior to today they ran fine..." there should be a computer entry and at least 1 (more likely more) tasks sent out to that computer. ID: 47116 · Reply Quote

LHC@home