Message boards : ATLAS application : ATLAS vbox v2.02

[VENETO] boboviz
Joined: 7 May 08
Posts: 190
Credit: 1,499,854
RAC: 200
Message 47083 - Posted: 5 Aug 2022, 8:18:36 UTC - in response to Message 47082.  
Last modified: 5 Aug 2022, 8:20:09 UTC

On the same computer using the same VirtualBox instance and the same user account?

Yes. LHC@home and LHC-dev (and Rosetta, QChemPedia, etc.) are on the same PC.

On Windows it's most likely Hyper-V or an AV software that crashes VirtualBox and/or makes the BOINC client think VT-x/AMD-V is disabled.

Hyper-V is not installed.
The antivirus shows no messages.
Windows 11 Task Manager says that virtualization is enabled.
ID: 47083
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,019,169
RAC: 136,189
Message 47084 - Posted: 5 Aug 2022, 8:57:24 UTC - in response to Message 47083.  

You may check this:
https://forums.virtualbox.org/viewtopic.php?f=1&t=62339


Your computer at LHC dev reports this CPU:
AuthenticAMD AMD Ryzen 5 5500U with Radeon Graphics

The faulty tasks from your previous post are from a computer reporting this CPU:
AuthenticAMD AMD Ryzen 5 3600 6-Core Processor


Your computers here, at Rosetta and QChemPedia are hidden which makes it impossible to check anything.
ID: 47084
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47085 - Posted: 5 Aug 2022, 9:19:22 UTC - in response to Message 47078.  

The virtual machine is created and booted (that's fine), but in all your aborted low-CPU tasks
"CVMFS is ok" never appears after "Checking CVMFS...". Without a connection to CVMFS the job will not start.

I'm waiting for a new task with this problem, because copying and pasting vbox.log and hardening.log caused some trouble.


@CP: I will check this in the next few days, when it happens again.

Within the stderr.txt there's again the hint to look through the hardening log.
See:
https://forums.virtualbox.org/viewtopic.php?f=25&t=82106

@computezrmle: there is no info from VBox.log or VBoxHardening.log here!
ID: 47085
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,800
RAC: 1,930
Message 47086 - Posted: 5 Aug 2022, 10:07:27 UTC - in response to Message 47085.  
Last modified: 5 Aug 2022, 17:38:57 UTC

The virtual machine is created and booted (that's fine), but in all your aborted low-CPU tasks
"CVMFS is ok" never appears after "Checking CVMFS...". Without a connection to CVMFS the job will not start.
I'm waiting for a new task with this problem, because copying and pasting vbox.log and hardening.log caused some trouble.
@CP: I will check this in the next few days, when it happens again.
When you have such a running VM with low CPU usage, you could try to revive the task.
If something goes wrong, never mind: you would have aborted the task anyway. How to try a revival:

Suspend all tasks not yet started.
Suspend the evil task with "Leave applications in memory" not selected. The task will be saved to disk.
Use VirtualBox Manager to discard the saved state.
Start the VM using VirtualBox Manager. It will boot from scratch.
You may use the ALT keys to monitor CPU usage and event processing.
When it starts event processing, close the VM from the menu (not the red cross) with the option to save to disk.
When it is saved, you may resume the task in BOINC Manager.
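
The VirtualBox Manager steps above have rough command-line counterparts. The sketch below only builds the VBoxManage command lines for the three VirtualBox steps (discard the saved state, boot with a visible window, save the state again); it does not run them. It assumes VBoxManage is on the PATH and that the VM is registered for the user running BOINC. The function name is made up for illustration, and suspending and resuming the task in BOINC Manager stays a manual step.

```python
from typing import List


def revive_commands(vm_name: str) -> List[List[str]]:
    """Build the VBoxManage calls for the three VirtualBox steps of the revival.

    Assumption: the VM is registered under vm_name for the user that runs BOINC.
    """
    return [
        # Discard the saved state so that the VM boots from scratch.
        ["VBoxManage", "discardstate", vm_name],
        # Start the VM with a visible window so the consoles can be watched.
        ["VBoxManage", "startvm", vm_name, "--type", "gui"],
        # Once event processing has started: save the machine state to disk.
        ["VBoxManage", "controlvm", vm_name, "savestate"],
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) one at a time, waiting between the second and third command until event processing is visible. The VM name has to be taken from VirtualBox Manager or from the task's stderr.txt.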
ID: 47086
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47087 - Posted: 5 Aug 2022, 11:27:32 UTC - in response to Message 47086.  

Ok Crystal.
ID: 47087
[VENETO] boboviz
Joined: 7 May 08
Posts: 190
Credit: 1,499,854
RAC: 200
Message 47088 - Posted: 5 Aug 2022, 12:26:58 UTC - in response to Message 47084.  

You may check this:
https://forums.virtualbox.org/viewtopic.php?f=1&t=62339

Done!


Your computer at LHC dev reports this CPU:
AuthenticAMD AMD Ryzen 5 5500U with Radeon Graphics
The faulty tasks from your previous post are from a computer reporting this CPU:
AuthenticAMD AMD Ryzen 5 3600 6-Core Processor

Sorry, my fault.
BOTH of my PCs fail on this new ATLAS app.
And BOTH have no problems with other VM projects.
Tonight I'll retry with both.

Your computers here, at Rosetta and QChemPedia are hidden which makes it impossible to check anything.

I'll change this
ID: 47088
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 47089 - Posted: 5 Aug 2022, 17:57:35 UTC - in response to Message 47083.  
Last modified: 5 Aug 2022, 21:12:41 UTC

In February, I had a problem with Theory, where I had made sure that VT-x was enabled but BOINC wasn't seeing it and was reporting it as disabled. An AVAST update (AVG is similar, I believe) had added a feature under
Menu - Settings - Troubleshooting called "Enable hardware-assisted virtualisation", which is checked by default but needs to be UNCHECKED.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5797&postid=46366
ID: 47089
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,167,342
RAC: 16,168
Message 47090 - Posted: 5 Aug 2022, 21:38:11 UTC
Last modified: 5 Aug 2022, 21:39:10 UTC

Question about the server-side settings: on the host's computer details page, under Application details, the ATLAS vbox 2.00 data has been removed and it now shows data only for v2.02 (number of completed tasks, APR, etc.). This is not the case for the other subprojects: they still show the data for the old application versions, not just the current ones. I wonder why ATLAS is different?
ID: 47090
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 47091 - Posted: 6 Aug 2022, 7:27:09 UTC

2.02 seems OK on my Windows 11 host.
Tulio
ID: 47091
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47092 - Posted: 6 Aug 2022, 7:28:56 UTC - in response to Message 47090.  

There is a button on this page: Show all versions.
ID: 47092
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,167,342
RAC: 16,168
Message 47093 - Posted: 6 Aug 2022, 13:11:24 UTC - in response to Message 47092.  

There is a button on this page: Show all versions.

OK, I didn't see that. Thanks for pointing that out.
ID: 47093
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47094 - Posted: 6 Aug 2022, 13:50:07 UTC - in response to Message 47081.  
Last modified: 6 Aug 2022, 14:11:44 UTC

The virtual machine is created and booted (that's fine), but in all your aborted low-CPU tasks
"CVMFS is ok" never appears after "Checking CVMFS...". Without a connection to CVMFS the job will not start.

This is the reason for this handful of tasks every day. 200 tasks connect to CVMFS, but this small number of tasks do not.
I have no idea why.
Is it possible to make such a task exit while it is running?
At the moment you have to watch for these tasks and stop them yourself.
By the way, our best volunteer (Toby Broom) also has these never-ending ATLAS tasks.
ID: 47094
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,800
RAC: 1,930
Message 47095 - Posted: 6 Aug 2022, 17:58:48 UTC - in response to Message 47094.  

Is it possible to make such a task exit while it is running?
At the moment you have to watch for these tasks and stop them yourself.
In principle you could create a workaround. I tested it on the development system.
At least such a task would then not keep running until the user happens to notice it.
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3106173

Of course, it's best to avoid this happening, or to let a watchdog inside the VM do the job.
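
A minimal sketch of what such a watchdog check might look like, under loud assumptions: it only looks for the two messages quoted in this thread ("Checking CVMFS..." and "CVMFS is ok"), and the 600-second grace period is a made-up default, not a project setting. How the log text is obtained inside the VM, and what happens after a stall is detected, is left open.

```python
def cvmfs_check_stalled(log_text: str, seconds_since_check: float,
                        grace_seconds: float = 600.0) -> bool:
    """Return True if the CVMFS check started but never reported success.

    grace_seconds is a hypothetical limit chosen only for illustration.
    """
    started = "Checking CVMFS..." in log_text
    succeeded = "CVMFS is ok" in log_text
    return started and not succeeded and seconds_since_check > grace_seconds
```

A task for which this returns True could then be ended with an error instead of idling for hours.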
ID: 47095
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,019,169
RAC: 136,189
Message 47102 - Posted: 8 Aug 2022, 7:14:56 UTC - in response to Message 47074.  

Regarding the children a multiattach disk can have:
I didn't find an official limit.
My own tests ran fine with up to 14 per BOINC client and 2 clients per host (different usernames), hence 28 per host.

Meanwhile I'm aware of another volunteer's Windows computer running (at least) 39 ATLAS tasks concurrently.
A rough estimate suggests that on this computer ATLAS v2.02 avoids ~160 GB per day that would have been written to disk by the previous version.
ID: 47102
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47108 - Posted: 8 Aug 2022, 12:21:43 UTC - in response to Message 47102.  
Last modified: 8 Aug 2022, 12:25:41 UTC

A rough estimate suggests that on this computer ATLAS v2.02 avoids ~160 GB per day that would have been written to disk by the previous version.

100 ATLAS tasks per day and PC (2 x 100 for two computers) and 250 GByte from the ISP. 9 TByte last month, including ATLAS.
ID: 47108
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 47111 - Posted: 9 Aug 2022, 4:44:29 UTC - in response to Message 47108.  
Last modified: 9 Aug 2022, 4:45:40 UTC

2022-08-08 19:07:23 (21468): Guest Log: 00:00:00.001517 main 5.2.32 r132073 started. Verbose level = 0
2022-08-08 19:07:33 (21468): Guest Log: 00:00:10.007488 timesync vgsvcTimeSyncWorker: Radical guest time change: -7 189 278 340 000ns (GuestNow=1 659 978 452 271 131 000 ns GuestLast=1 659 985 641 549 471 000 ns fSetTimeLastLoop=true )
2022-08-08 20:47:22 (21468): Status Report: Elapsed Time: '6000.000000'
2022-08-08 20:47:22 (21468): Status Report: CPU Time: '52.718750'
2022-08-08 22:27:30 (21468): Status Report: Elapsed Time: '12000.000000'
2022-08-08 22:27:30 (21468): Status Report: CPU Time: '78.906250'
2022-08-09 00:07:39 (21468): Status Report: Elapsed Time: '18000.000000'
2022-08-09 00:07:39 (21468): Status Report: CPU Time: '106.875000'
2022-08-09 01:47:49 (21468): Status Report: Elapsed Time: '24000.000000'
2022-08-09 01:47:49 (21468): Status Report: CPU Time: '130.171875'
2022-08-09 03:27:57 (21468): Status Report: Elapsed Time: '30000.000000'
2022-08-09 03:27:57 (21468): Status Report: CPU Time: '157.125000'
2022-08-09 05:08:06 (21468): Status Report: Elapsed Time: '36000.000000'
2022-08-09 05:08:06 (21468): Status Report: CPU Time: '178.687500'
We need an ATLAS stop for this. CVMFS connect problem!
11 hours for nothing (10 CPUs)!
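
Stuck tasks like this one could be spotted from the log alone. The sketch below assumes the "Status Report" line format shown above; it takes the latest elapsed time and CPU time and returns their ratio. The function name and the threshold mentioned afterwards are arbitrary illustrations, not project values.

```python
import re
from typing import Optional

# Matches lines such as: Status Report: CPU Time: '178.687500'
_STATUS = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_to_elapsed_ratio(log_text: str) -> Optional[float]:
    """Ratio of the last reported CPU time to the last reported elapsed time."""
    latest = {}
    for label, value in _STATUS.findall(log_text):
        latest[label] = float(value)  # later lines overwrite earlier ones
    elapsed = latest.get("Elapsed Time")
    cpu = latest.get("CPU Time")
    if not elapsed or cpu is None:
        return None  # no complete status report found
    return cpu / elapsed
```

For the last report above (178.6875 s of CPU time in 36000 s elapsed) the ratio is about 0.005, so a check such as ratio < 0.05 would flag the task.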
ID: 47111
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,800
RAC: 1,930
Message 47112 - Posted: 9 Aug 2022, 7:04:39 UTC - in response to Message 47111.  

We need an ATLAS stop for this. CVMFS connect problem!
Are there other users with so many CVMFS connect problems?
I don't have that many ATLAS tasks running, but none failed on my side.
All CVMFS response times here are between 3 and at most 8 seconds.
I did not view all your results, but in your valid tasks the response times are between 3 and 81 seconds.
Maybe there is a limit somewhere (90 sec.?) for getting a response, beyond which you never get one, or it is rejected by the network because it arrives too late?
To me it seems to be a network issue on your side or CERN's side. If it were on CERN's side (max # connections?), more users would suffer from this.
ID: 47112
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,019,169
RAC: 136,189
Message 47113 - Posted: 9 Aug 2022, 7:28:51 UTC - in response to Message 47112.  

It might be worth tuning the TCP settings (on the heavily loaded workers and the computer running the Squid proxy):
https://support.solarwinds.com/SuccessCenter/s/article/NETSTAT-A-command-displays-too-many-TCP-IP-connections?language=en_US

Sections:
"Increase the maximum simultaneous connections"
"Reduce the duration of the Reserved State"
ID: 47113
entity

Joined: 7 May 17
Posts: 6
Credit: 695,132
RAC: 0
Message 47115 - Posted: 9 Aug 2022, 17:54:02 UTC - in response to Message 47113.  

Started getting immediate errors with ATLAS multi-attach tasks. Prior to today they ran fine. A snippet of the output is included.

2022-08-09 12:45:41 (3632470):
Command: VBoxManage -q showhdinfo "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: 0
Output:
UUID: 6f08958e-7bfd-4804-8dd7-c7b4408cb126
Parent UUID: base
State: created
Type: multiattach
Location: /var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 20480 MBytes
Size on disk: 2645 MBytes
Encryption: disabled
Property: AllocationBlockSize=1048576
Child UUIDs: 80093e87-fcef-479f-801f-dd2cc020d954

2022-08-09 12:45:42 (3632470):
Command: VBoxManage -q storageattach "boinc_a50fb3ffb7fe0a5b" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228409
Output:
VBoxManage: error: Cannot attach medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 772 of file VBoxManageStorageController.cpp

2022-08-09 12:45:42 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:43 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:45 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:46 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:47 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp

2022-08-09 12:45:49 (3632470):
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi"
Exit Code: -2135228404
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_2.02_image.vdi' because it has 1 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1736 of file VBoxManageDisk.cpp
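
The negative exit codes and the hexadecimal codes in the error details above are the same values: the exit code is the 32-bit result code read as a signed integer. A small sketch of the conversion (the function name is only illustrative):

```python
def exit_code_to_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex result code."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"
```

exit_code_to_hex(-2135228409) gives "0x80bb0007" (VBOX_E_INVALID_OBJECT_STATE, the failed storageattach) and exit_code_to_hex(-2135228404) gives "0x80bb000c" (VBOX_E_OBJECT_IN_USE, the repeated closemedium failures).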
ID: 47115
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,019,169
RAC: 136,189
Message 47116 - Posted: 9 Aug 2022, 18:17:28 UTC - in response to Message 47115.  

Your computer list is empty.
Very unusual.

Since you wrote "...Prior to today they ran fine..." there should be a computer entry and at least 1 (more likely more) tasks sent out to that computer.
ID: 47116