Message boards : Number crunching : Computation Errors

StarCastle
Joined: 15 Nov 09
Posts: 4
Credit: 1,559,624
RAC: 395
Message 49361 - Posted: 2 Feb 2024, 11:57:13 UTC

This has recently started happening (within the last few days).

The tasks all end after about 25 seconds, and the log shows the following in each case:

2024-02-02 6:51:06 AM | LHC@home | Output file Theory_2687-2563509-84_0_r635125659_result for task Theory_2687-2563509-84_0 absent

Any ideas what is happening?

Thanks
ID: 49361

maeax
Joined: 2 May 07
Posts: 2122
Credit: 159,926,969
RAC: 54,657
Message 49362 - Posted: 2 Feb 2024, 12:13:55 UTC - in response to Message 49361.  
Last modified: 2 Feb 2024, 12:16:43 UTC

Please set a limit for Theory tasks in your preferences until the cause is found.
You had successful tasks in the last month.
Are there any yellow triangles in VirtualBox Manager?
ID: 49362

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 49364 - Posted: 2 Feb 2024, 12:17:49 UTC - in response to Message 49361.  

A similar VirtualBox error has been described here together with steps to solve it:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6079&postid=49046

Instead of the CMS vdi, you need to clean up the Theory vdi.
ID: 49364

StarCastle
Joined: 15 Nov 09
Posts: 4
Credit: 1,559,624
RAC: 395
Message 49365 - Posted: 2 Feb 2024, 18:36:55 UTC - in response to Message 49364.  

Thanks for the info, will give that a try.
ID: 49365

StarCastle
Joined: 15 Nov 09
Posts: 4
Credit: 1,559,624
RAC: 395
Message 49393 - Posted: 4 Feb 2024, 18:51:03 UTC - in response to Message 49362.  

I set my preferences to CMS only, and those tasks download and run correctly, so the issue seems to be with the Theory tasks.

I monitored the startup in VirtualBox and noticed that the instance would begin to start, then fail, after which a cleanup would remove the files.

I captured the vbox trace file to see what was happening, and the following errors appear every time an instance tries to start:

Command: VBoxManage -q showvminfo "boinc_23c1e4f58830b1c0" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage.exe: error: Could not find a registered machine named 'boinc_23c1e4f58830b1c0'
VBoxManage.exe: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee IUnknown
VBoxManage.exe: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 3139 of file VBoxManageInfo.cpp

2024-02-03 21:54:36 (12296):
Command: VBoxManage -q showhdinfo "D:\Program Data\BOINC\slots\0/vm_image.vdi"
Exit Code: -2135228412
Output:
VBoxManage.exe: error: Could not find file for the medium 'D:\Program Data\BOINC\slots\0\vm_image.vdi' (VERR_FILE_NOT_FOUND)
VBoxManage.exe: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "OpenMedium(Bstr(pszFilenameOrUuid).raw(), enmDevType, enmAccessMode, fForceNewUuidOnOpen, pMedium.asOutParam())" at line 205 of file VBoxManageDisk.cpp

2024-02-03 21:54:36 (12296):
Command: VBoxManage -q createvm --name "boinc_23c1e4f58830b1c0" --basefolder "D:\Program Data\BOINC\slots\0" --ostype "Linux26_64" --register
Exit Code: 0

And later in the log:

2024-02-03 21:54:40 (12296):
Command: VBoxManage -q storageattach "boinc_23c1e4f58830b1c0" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "D:\Program Data\BOINC/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi"
Exit Code: -2135228409
Output:
VBoxManage.exe: error: Cannot attach medium 'D:\Program Data\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2023_12_13.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee IUnknown
VBoxManage.exe: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 785 of file VBoxManageStorageController.cpp

2024-02-03 21:54:41 (12296):
Command: VBoxManage -q closemedium "D:\Program Data\BOINC/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi"
Exit Code: -2135228404
Output:
VBoxManage.exe: error: Cannot close medium 'D:\Program Data\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2023_12_13.vdi' because it has 1 child media
VBoxManage.exe: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "Close()" at line 1878 of file VBoxManageDisk.cpp
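
A note for anyone reading the trace above: the negative exit codes appear to be the same values as the hex codes in the "Details" lines, printed as signed 32-bit integers. A minimal Python sketch of the conversion follows; the function name is made up for illustration and is not part of VBoxManage or vboxwrapper.

```python
def exit_code_to_hresult(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    # Masking with 0xFFFFFFFF maps negative values to their unsigned form.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Exit codes copied from the trace in this post.
for code in (-2135228415, -2135228412, -2135228409, -2135228404):
    print(code, "->", exit_code_to_hresult(code))
```

Matching the converted value against the code in the "Details" line (for example VBOX_E_OBJECT_IN_USE) makes it easier to search for the same failure in other logs.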

Since I have run Theory tasks successfully in the past, something must have changed, but I have not made any changes to the host.

Any ideas on where to look would be appreciated.

Thanks
ID: 49393

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1289
Credit: 8,520,133
RAC: 2,378
Message 49394 - Posted: 4 Feb 2024, 20:56:53 UTC - in response to Message 49393.  

In your logs: "VBoxManage.exe: error: Cannot close medium 'D:\Program Data\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2023_12_13.vdi' because it has 1 child media"

Use VirtualBox Manager. Go to Media and remove the Theory_2023_12_13.vdi entry from the list, but keep the file itself.
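
For those who prefer to check from a script before touching the media list, here is a rough Python sketch that finds the child media hanging off a given parent image. It assumes the text comes from `VBoxManage list hdds` and that each medium is printed as a block of "Key: value" lines with "UUID", "Parent UUID" and "Location" fields. The sample text, the UUIDs, the child's location and both function names are invented for illustration.

```python
import re


def parse_hdds(text: str) -> list[dict[str, str]]:
    """Split 'Key: value' blocks (one per medium) into dictionaries."""
    media = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so Windows paths stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            media.append(entry)
    return media


def children_of(media: list[dict[str, str]], parent_file: str) -> list[dict[str, str]]:
    """Return media whose parent is the image whose location ends with parent_file."""
    parents = {m["UUID"] for m in media
               if "UUID" in m and m.get("Location", "").endswith(parent_file)}
    return [m for m in media if m.get("Parent UUID") in parents]


# Invented sample data, shaped like the assumed output format.
SAMPLE = r"""
UUID:           11111111-1111-1111-1111-111111111111
Parent UUID:    base
Location:       D:\Program Data\BOINC\projects\lhcathome.cern.ch_lhcathome\Theory_2023_12_13.vdi

UUID:           22222222-2222-2222-2222-222222222222
Parent UUID:    11111111-1111-1111-1111-111111111111
Location:       D:\Program Data\BOINC\slots\0\hypothetical_child.vdi
"""

for child in children_of(parse_hdds(SAMPLE), "Theory_2023_12_13.vdi"):
    print("child medium:", child["UUID"], child["Location"])
```

On a real host the text could be obtained with `subprocess.run(["VBoxManage", "list", "hdds"], capture_output=True, text=True).stdout`. The sketch only reports entries; removing them is still done in VirtualBox Manager as described above.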
ID: 49394

StarCastle
Joined: 15 Nov 09
Posts: 4
Credit: 1,559,624
RAC: 395
Message 49445 - Posted: 7 Feb 2024, 23:38:46 UTC - in response to Message 49394.  

Thank you for the direction.

That worked perfectly and I can now process Theory Tasks.

Something else to add to the Troubleshooting wallboard lol.
ID: 49445

Gary Wyckoff
Joined: 23 Aug 21
Posts: 2
Credit: 2,509,700
RAC: 374
Message 50092 - Posted: 30 Apr 2024, 19:29:55 UTC

Every LHC task is failing at 00:00:08 with a computation error:
Tue Apr 30 15:27:54 2024 | LHC@home | Starting task CMS_860485_1714503592.613793_0
Tue Apr 30 15:28:05 2024 | LHC@home | Computation for task CMS_860485_1714503592.613793_0 finished
Any idea why this is happening?
ID: 50092

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1289
Credit: 8,520,133
RAC: 2,378
Message 50093 - Posted: 30 Apr 2024, 19:39:12 UTC - in response to Message 50092.  

"Any idea why this is happening?"

You have your machine(s) hidden, so we can't see the results.
ID: 50093

Gary Wyckoff
Joined: 23 Aug 21
Posts: 2
Credit: 2,509,700
RAC: 374
Message 50094 - Posted: 30 Apr 2024, 20:41:39 UTC - in response to Message 50093.  

Visible now?
ID: 50094

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1289
Credit: 8,520,133
RAC: 2,378
Message 50095 - Posted: 1 May 2024, 6:54:18 UTC - in response to Message 50094.  
Last modified: 1 May 2024, 6:58:51 UTC

Yeah, visible now! Thanks.
On one machine I see child media. Those are remnants of virtual machines that should have been cleaned up by BOINC's wrapper but were not.
On the other machine you probably started several CMS tasks at once, and there I see an error I have not seen before, which may be specific to Darwin:
2024-04-30 21:47:48 (27867): Could not set race mitigation lock.
2024-04-30 21:47:48 (27867): Lockname: '/boinc_vboxwrapper_lock_c94b628801c684e7'
2024-04-30 21:47:48 (27867): Error: 63, File name too long
2024-04-30 21:47:48 (27867): Attempts: 1

The easiest way is probably to reset the LHC@home project on both machines to start clean.
After the reset, don't request tasks immediately; first remove the VM remnants using VirtualBox Manager.
To the right of Tools you will see a pinned button. Select Media, remove all LHC-related media, and delete the related files when asked.
To start with a first task, set your project preferences to Theory only and 1 job only.

For the second problem, it's best to start VM tasks at 1-minute intervals.
That problem is solved for Windows and Linux, but it looks like Darwin has a problem with longer file names there.
ID: 50095

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 50096 - Posted: 1 May 2024, 7:21:53 UTC - in response to Message 50095.  

Looks like on Darwin the lock name must not exceed 31 characters while Linux and Windows allow >250.
This can't be solved without a new vboxwrapper.
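
To illustrate the length problem, here is a small Python sketch that checks the lock name from the log against the 31-character limit mentioned above. The limit is taken from this thread rather than from Darwin's headers, and the helper name is hypothetical; this is not how vboxwrapper itself performs the check.

```python
# Limit as reported in this thread; treat it as an assumption.
DARWIN_LOCK_NAME_LIMIT = 31


def lock_name_fits(name: str, limit: int = DARWIN_LOCK_NAME_LIMIT) -> bool:
    """Return True if the lock name is no longer than the given limit."""
    return len(name) <= limit


# Lock name copied from the failing task's log.
lock_name = "/boinc_vboxwrapper_lock_c94b628801c684e7"
print(len(lock_name), lock_name_fits(lock_name))
```

The name from the log is longer than the limit, which would be consistent with the "File name too long" error reported on the Darwin host.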
ID: 50096

Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 50108 - Posted: 3 May 2024, 8:51:33 UTC - in response to Message 50096.  
Last modified: 3 May 2024, 8:51:47 UTC

New versions are available for Theory and CMS with the new wrapper.
ID: 50108

maeax
Joined: 2 May 07
Posts: 2122
Credit: 159,926,969
RAC: 54,657
Message 50109 - Posted: 3 May 2024, 9:22:08 UTC - in response to Message 50108.  

Thank you, Laurence.
For me (Win11 Pro) there have been no problems so far.
Good work.
ID: 50109

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 50110 - Posted: 3 May 2024, 9:41:03 UTC - in response to Message 50109.  

Last change was for Apple only.
;-)
ID: 50110

maeax
Joined: 2 May 07
Posts: 2122
Credit: 159,926,969
RAC: 54,657
Message 50111 - Posted: 3 May 2024, 11:35:07 UTC

70.30 (vbox64_mt_mcore_cms) 29 Apr 2024, 12:37:49 UTC
300.30 (vbox64_theory) 29 Apr 2024, 19:43:57 UTC
It's the sight of Virtualbox.
ID: 50111

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 50112 - Posted: 3 May 2024, 11:54:11 UTC - in response to Message 50111.  

Apps for Windows and Linux from earlier this week are running fine.
The general announcement can be found here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6149

This discussion is about Computation Errors.
The recent ones affect Apple hosts, and even the app published this morning didn't solve them.

Taken from https://lhcathome.cern.ch/lhcathome/apps.php:
Intel 64-bit Mac OS 10.5 or later 	300.40 (vbox64_theory) 	3 May 2024, 8:42:28 UTC

Most likely it still uses, by accident, a vboxwrapper that does not include the required patch.
ID: 50112

Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 50113 - Posted: 3 May 2024, 13:19:30 UTC - in response to Message 50112.  

I have bumped the Apple version again, using a more recent build of vboxwrapper.
ID: 50113

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 50114 - Posted: 3 May 2024, 15:31:51 UTC - in response to Message 50113.  

At least some of the Theory tasks have succeeded now:
Intel 64-bit Mac OS 10.5 or later 300.50 (vbox64_theory) 3 May 2024, 13:08:28 UTC 5 GigaFLOPS
ID: 50114

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2430
Credit: 227,718,404
RAC: 124,965
Message 50115 - Posted: 3 May 2024, 17:17:43 UTC - in response to Message 50114.  

Meanwhile, CMS also receives valid results from Apple hosts:
Intel 64-bit Mac OS 10.5 or later 70.50 (vbox64_mt_mcore_cms) 3 May 2024, 13:12:17 UTC 12 GigaFLOPS
ID: 50115

©2024 CERN