Message boards : ATLAS application : "No storage device attached ..."

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47376 - Posted: 14 Oct 2022, 13:35:21 UTC - in response to Message 47374.  

There are always 32 tasks (6 active and 26 waiting).
I have no idea why one or two tasks a day show this phenomenon.
ID: 47376
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 47382 - Posted: 17 Oct 2022, 17:40:42 UTC - in response to Message 47376.  

I started 5 ATLAS tasks at almost the same time and got one with your phenomenon: CVMFS was being checked, but there was no response.
I revived the task https://lhcathome.cern.ch/lhcathome/result.php?resultid=366977052 by using the method mentioned in the link from my previous post.
ID: 47382
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47383 - Posted: 18 Oct 2022, 1:11:41 UTC - in response to Message 47382.  

That is a good way to correct it, Crystal.
But doing it for hundreds of tasks per day on those Threadrippers?
No, thank you. I also run no Squid because of this phenomenon.
ID: 47383
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 47387 - Posted: 20 Oct 2022, 8:13:23 UTC

last night, I had another case:

ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 2 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0

the CPU was running only for 1 min 37 secs, and unfortunately I found out only this morning after a total task runtime of almost 10 hours :-(

https://lhcathome.cern.ch/lhcathome/result.php?resultid=367034222

Again, this happened on the host where a defective SSD was replaced about 1 month ago. This type of failure happens only on this host, not on any other one - but only once in a while.
Hence, I am suspecting more and more that there is some problem with the new SSD.
ID: 47387
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47388 - Posted: 20 Oct 2022, 8:19:45 UTC - in response to Message 47387.  
Last modified: 20 Oct 2022, 8:23:16 UTC

I also had a task with a 10-hour run time last night.
There is a test in -dev; maybe this will fix it.
Computer ID 10797673
Run time 10 hours 8 min 47 sec
CPU time 14 sec
ID: 47388
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 47390 - Posted: 20 Oct 2022, 10:59:20 UTC - in response to Message 47387.  

same problem a few minutes ago:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=367052136

the task directly before the above cited one finished okay.
ID: 47390
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47391 - Posted: 20 Oct 2022, 12:09:08 UTC - in response to Message 47390.  

We have to wait and hope for the new version for Windows.
ID: 47391
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 47394 - Posted: 21 Oct 2022, 5:40:43 UTC - in response to Message 47391.  

this morning, I detected the same problem with a task on another host:

ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={No storage device attached to device slot 0 on port 2 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0

https://lhcathome.cern.ch/lhcathome/result.php?resultid=367058721

So, I guess I can revise my assumption that this problem has to do with a replaced SSD on the other host.

No idea though what the problem is caused by :-(
ID: 47394
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47395 - Posted: 21 Oct 2022, 6:00:25 UTC - in response to Message 47394.  

No, Erich,
this is the CVMFS conflict, and we are waiting for a solution in production.
One Threadripper crashed yesterday evening with a PCI bus error (Lenovo).
I have had problems with the new Windows 22H2 for 16 days, so you are not the only one with problems.
ID: 47395
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47434 - Posted: 28 Oct 2022, 1:19:39 UTC - in response to Message 47395.  
Last modified: 28 Oct 2022, 1:21:25 UTC

Run time 7 hours 25 min 47 sec
CPU time 31 sec

14:13:31.742187 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
14:13:32.241380 Saving settings file "C:\ProgramData\BOINC\slots\3\boinc_b95149f276154f28\boinc_b95149f276154f28.vbox" with version "1.16-windows"
14:13:32.855578 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_f42e141ac575b363'}, preserve=false aResultDetail=0
14:13:33.286633 Saving settings file "C:\Users\mae_a\.VirtualBox\VirtualBox.xml" with version "1.12-windows"
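
For anyone who wants to pull these entries out of a log automatically, here is a minimal Python sketch. It assumes the `ERROR [COM]` lines always have the field layout seen in this thread (`aRC=... (0x...) ... aComponent={...} aText={...}, preserve=...`); the function name `parse_com_error` is made up for illustration and is not part of VirtualBox or BOINC.

```python
import re
from typing import Dict, Optional

# Pattern for the "ERROR [COM]" lines quoted in this thread.
# Assumption: the layout is stable across VirtualBox versions.
_COM_ERROR = re.compile(
    r"ERROR \[COM\]: aRC=(?P<rc_name>\w+) \((?P<rc_code>0x[0-9a-fA-F]+)\)"
    r".*?aComponent=\{(?P<component>[^}]*)\}"
    r".*?aText=\{(?P<text>.*)\}, preserve="
)


def parse_com_error(line: str) -> Optional[Dict[str, str]]:
    """Return the error fields of a COM error line, or None for other lines."""
    match = _COM_ERROR.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "14:13:32.855578 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) "
        "aIID={d0a0163f-e254-4e5b-a1f2-011cf991c38d} aComponent={VirtualBoxWrap} "
        "aText={Could not find a registered machine named "
        "'boinc_f42e141ac575b363'}, preserve=false aResultDetail=0"
    )
    print(parse_com_error(sample))
```

Lines that are not COM errors, such as the "Saving settings file" entries, simply return `None`, so the function can be run over a whole log.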
ID: 47434
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 47439 - Posted: 30 Oct 2022, 6:17:42 UTC

this morning, I discovered the same problem

VBoxManage.exe: error: Could not find a registered machine named 'boinc_f63b71f1735a8cad'

on a different host than the one where the problem occurred many times in the recent past:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=368247704

this is really annoying since the task does not stop, but keeps running ... running ... running. More than 13 hours in this case, with CPU usage about 2 minutes :-(
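
Since these tasks never stop on their own, a rough check of CPU time against elapsed run time could flag them early. The Python sketch below is only an illustration: the function names and both thresholds (one hour minimum run time, CPU time below 1% of run time) are assumptions chosen to fit the cases reported in this thread, not values used by BOINC or the ATLAS application.

```python
def hms(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert hours/minutes/seconds to a number of seconds."""
    return hours * 3600 + minutes * 60 + seconds


def looks_stalled(elapsed_s: float, cpu_s: float,
                  min_elapsed_s: float = 3600.0,
                  max_cpu_fraction: float = 0.01) -> bool:
    """Return True when a task has run for a long time but used almost no CPU.

    Both thresholds are assumptions for illustration only.
    """
    if elapsed_s < min_elapsed_s:
        return False  # too early to judge
    return cpu_s / elapsed_s < max_cpu_fraction


if __name__ == "__main__":
    # Figures reported in this thread: about 13 hours run time, about 2 minutes CPU.
    print(looks_stalled(hms(hours=13), hms(minutes=2)))
    # A hypothetical healthy task: 2 hours run time, 90 minutes CPU.
    print(looks_stalled(hms(hours=2), hms(minutes=90)))
```

How to obtain the two times for a running task is left open here; the check itself only needs the two numbers shown on the task's result page.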
ID: 47439
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47440 - Posted: 30 Oct 2022, 6:41:59 UTC - in response to Message 47395.  

This is the CVMFS conflict, and we are waiting for a solution in production.

You can reproduce it when an ATLAS task starts on Windows while CVMFS is not ready
(for example, by disconnecting the LAN cable).
ID: 47440
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 47469 - Posted: 2 Nov 2022, 11:54:39 UTC - in response to Message 47440.  
Last modified: 2 Nov 2022, 11:56:54 UTC

This morning, after the change to Squid 5.5, I saw one more of these tasks.
Computer ID 10795955
Run time 1 hour 8 min 31 sec
CPU time 7 sec
I have disconnected Squid.
ID: 47469


©2024 CERN