Message boards : ATLAS application : atlas error "error while computing"

doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52103 - Posted: 21 Aug 2025, 22:40:34 UTC

All of my "ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64" tasks have failed since Aug. 18 with "error while computing". This seems different from the error(s) people were reporting at the end of June, because as recently as Aug. 16, these tasks were finishing fine.

Does anyone have any idea why this suddenly started happening?

Thanks.

Doug
ID: 52103
Glohr

Joined: 13 Jan 24
Posts: 39
Credit: 5,951,056
RAC: 19,508
Message 52104 - Posted: 21 Aug 2025, 23:52:02 UTC - in response to Message 52103.  

One of your recent tasks shows errors in VirtualBox:
2025-08-20 11:33:56 (20148): Adding virtual disk drive to VM. (ATLAS_vbox_3.01_image.vdi)
2025-08-20 11:34:04 (20148): Error in deregister parent vdi for VM: -2135228404
Command:
VBoxManage -q closemedium "D:\BOINC_Data\BOINC/projects/lhcathome.cern.ch_lhcathome/ATLAS_vbox_3.01_image.vdi"
Output:
VBoxManage.exe: error: Cannot close medium 'D:\BOINC_Data\BOINC\projects\lhcathome.cern.ch_lhcathome\ATLAS_vbox_3.01_image.vdi' because it has 1 child media
VBoxManage.exe: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "Close()" at line 1875 of file VBoxManageDisk.cpp

Look for something to clean up in VirtualBox: Open VirtualBox Manager >> File >> Tools >> Virtual Media Manager. Look for yellow triangles or anything else that looks abnormal.
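
For readers who prefer a command line to the GUI, the same check can be sketched in Python. This is only a rough sketch: it assumes that `VBoxManage list hdds` prints one blank-line-separated block of `Key: value` lines per disk, including `State` and `Location` fields. The helper names (`parse_hdds`, `inaccessible_media`, `list_hdds`) and the sample paths are made up for illustration.

```python
import subprocess


def parse_hdds(text):
    """Split 'VBoxManage list hdds' style output into one dict per medium."""
    media = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current medium block.
            if current:
                media.append(current)
                current = {}
            continue
        # Split on the first colon only, so Windows paths stay intact.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        media.append(current)
    return media


def inaccessible_media(text):
    """Return the locations of media whose State is 'inaccessible'."""
    return [m.get("Location", "?") for m in parse_hdds(text)
            if m.get("State", "").lower() == "inaccessible"]


def list_hdds():
    """Run VBoxManage (assumed to be on PATH) and return its raw output."""
    return subprocess.run(["VBoxManage", "list", "hdds"],
                          capture_output=True, text=True, check=True).stdout


# Hypothetical sample output, for illustration only.
SAMPLE = """\
UUID:           11111111-1111-1111-1111-111111111111
Parent UUID:    base
State:          created
Location:       C:\\example\\good_image.vdi

UUID:           22222222-2222-2222-2222-222222222222
Parent UUID:    base
State:          inaccessible
Location:       C:\\example\\missing_image.vdi
"""

print(inaccessible_media(SAMPLE))
```

On a real machine, `inaccessible_media(list_hdds())` would be the call to try. Entries it reports should correspond to the ones the Virtual Media Manager flags, but that mapping is an assumption, not something confirmed in this thread.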
ID: 52104
doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52107 - Posted: 23 Aug 2025, 23:55:11 UTC - in response to Message 52104.  

Hi,

Thanks for your reply.

I don't have a Virtual Media Manager option on that menu, but I do have "VM Activity Overview". It doesn't show much, but has an entry for each VM. There is no indication of errors in the task that is currently running, no yellow triangles or abnormalities.

For my curiosity, where on the LHC BOINC site did you find that task error you copied?

Thanks again.

doug
ID: 52107
Glohr

Joined: 13 Jan 24
Posts: 39
Credit: 5,951,056
RAC: 19,508
Message 52108 - Posted: 24 Aug 2025, 8:37:49 UTC - in response to Message 52107.  

No worries.

If File >> Tools doesn't show an entry for Virtual Media Manager, try the small menu at the right side of the top entry (labeled Tools) in the right-hand column of Oracle VirtualBox Manager VM Activity Overview. There should be an entry for Media. Click that to get to the target screen. You should be able to see and clean up any problems there. If you are still having errors after that, it may be necessary to reinstall VirtualBox and/or reset the project.

You are running a fairly old version of VirtualBox, 7.1.4. The current version of 7.1 is 7.1.12. You might want to upgrade to that, although it is not strictly necessary. Version 7.2 is available but is a major upgrade and still quite new.

On any page where you can see your task list you can click on a completed task's ID to see some log information. The path that I usually use for my own is https://lhcathome.cern.ch/lhcathome/index.php >> Login >> Project >> Account >> Tasks View

To see your task log, I clicked on your name on the original message, then clicked on View next to Computers at the lower left of the resulting page. That brought up a page showing any computers that you have made visible. On that page, I clicked Tasks for a computer. The page that displays shows the available tasks run on that computer. I picked one showing an error, clicked the task ID to bring up the task information page, and looked for anything out of the ordinary.
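
If the task ID is already known, the log page can also be reached directly. Here is a minimal sketch, assuming the `result.php?resultid=` URL pattern seen in a link posted further down this thread still applies. `result_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Base URL taken from a result link posted later in this thread.
BASE = "https://lhcathome.cern.ch/lhcathome/result.php"


def result_url(task_id):
    """Build the task information URL for a numeric task ID."""
    return f"{BASE}?{urlencode({'resultid': int(task_id)})}"


print(result_url(425980340))
```

Whether the page is visible without logging in depends on the owner's privacy settings, so treat this as a shortcut rather than a guaranteed route.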
ID: 52108
Glohr

Joined: 13 Jan 24
Posts: 39
Credit: 5,951,056
RAC: 19,508
Message 52109 - Posted: 24 Aug 2025, 9:05:21 UTC - in response to Message 52107.  

Note that something has changed. The last couple of failed tasks look completely different from the earlier ones and used much more wall-clock and CPU time.
2025-08-23 18:22:39 (9700): Guest Log: *** Starting ATLAS job. (PandaID=6776308225 taskID=45828917) ***
2025-08-23 19:21:02 (9700): VM state change detected. (old = 'running', new = 'paused')
2025-08-23 19:38:47 (9700): VM state change detected. (old = 'paused', new = 'saving')
2025-08-23 19:38:54 (9700): Error in pause VM for VM: -182
Command:
VBoxManage -q controlvm "boinc_802ecc6542b62047" pause
Output:
VBoxManage.exe: error: Invalid machine state: Saving
VBoxManage.exe: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component ConsoleWrap, interface IConsole, callee IUnknown
VBoxManage.exe: error: Context: "Pause()" at line 388 of file VBoxManageControlVM.cpp

2025-08-23 19:39:01 (9700): Error in pause VM for VM: -182

It still looks like a VirtualBox problem to me. There are some reports in the forums that long pauses/suspensions cause ATLAS and CMS tasks to fail but that doesn't seem to fit your situation.
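
To put a number on how long the VM sat paused in the excerpt above, the state-change lines can be parsed. This is a sketch that assumes the log line format is exactly as pasted here. The names `state_changes` and `paused_spans` are invented for this example.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2025-08-23 19:21:02 (9700): VM state change detected. (old = 'running', new = 'paused')
STATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def state_changes(log_text):
    """Return (timestamp, old_state, new_state) for each state-change line."""
    changes = []
    for line in log_text.splitlines():
        m = STATE_RE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            changes.append((when, m.group(2), m.group(3)))
    return changes


def paused_spans(changes):
    """Return the seconds spent in 'paused' before each transition out of it."""
    spans = []
    paused_at = None
    for when, old, new in changes:
        if new == "paused":
            paused_at = when
        elif old == "paused" and paused_at is not None:
            spans.append((when - paused_at).total_seconds())
            paused_at = None
    return spans


LOG = """\
2025-08-23 18:22:39 (9700): Guest Log: *** Starting ATLAS job. (PandaID=6776308225 taskID=45828917) ***
2025-08-23 19:21:02 (9700): VM state change detected. (old = 'running', new = 'paused')
2025-08-23 19:38:47 (9700): VM state change detected. (old = 'paused', new = 'saving')
2025-08-23 19:38:54 (9700): Error in pause VM for VM: -182
"""

print(paused_spans(state_changes(LOG)))  # [1065.0], i.e. 17 min 45 s
```

For the lines quoted above, the VM was paused for a little under 18 minutes before it moved to 'saving'.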
ID: 52109
doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52113 - Posted: 25 Aug 2025, 4:11:54 UTC - in response to Message 52109.  

Thanks again SO much for all your help and suggestions!

I know my VB is older, but I have always understood that one should only upgrade VB when upgrading BOINC, to make sure they match. My BOINC is 8.2.4 (x64), and a quick check shows no newer version. Looking at my old install files, the last version of BOINC with VB I have was 8.0.2, from June 2024, so that presumably is when I got my current VB. My 8.2.4 install file, from exactly a month ago, doesn't indicate having any associated VB. Maybe I somehow got the wrong install file?

Am I wrong about needing to have a match for BOINC version and VB version? It seems I've read multiple times on various BOINC forums about issues caused exactly by such a mismatch, but maybe I've misunderstood.

In any case, I also can't find that Media entry on the little right-hand popup menu on the blue Tools item on the left hand side. The entries I have are: Welcome, Extensions, Cloud and Activities.

Sorry to be a pain.

doug
ID: 52113
Glohr

Joined: 13 Jan 24
Posts: 39
Credit: 5,951,056
RAC: 19,508
Message 52116 - Posted: 25 Aug 2025, 13:48:21 UTC - in response to Message 52113.  
Last modified: 25 Aug 2025, 13:51:13 UTC

As far as I know, BOINC stopped distributing VirtualBox. I've been using VirtualBox from https://www.virtualbox.org/ for a couple of years without any problem, updating every few months. I usually don't let it get more than one or two minor releases behind. I also install the Extension Pack, although I don't think it's needed for BOINC. The version that you are running should be adequate.

It sounds like your VirtualBox preferences are set to Basic. The Media and Network tools are there if you switch to Expert. In VB Manager, File >> Preferences >> click Expert near the top of the popup window that opens, then the OK button to close the window. Then you should see the missing tools in the menus.

Personally, I'd do a project reset as well so as to restart with a clean slate. You'll lose any tasks in your queue when you do a project reset, but since they are all failing anyway that shouldn't be a concern. The project support files will need to download again after a reset. The ATLAS VB image is more than 4 GB. You won't lose any credit for tasks already completed and reported.

LHC is the only BOINC project that I run that uses VirtualBox now. I'm not aware of any recent requirement for version matching, although I have a vague recollection about needing to have the correct major release a few years ago.
ID: 52116
doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52117 - Posted: 25 Aug 2025, 15:38:00 UTC - in response to Message 52116.  

You were right that my VB Manager was in Basic mode. I switched to Expert, found the screen you originally directed me to, and found lots of yellow triangles - 4, to be exact, with filenames like: Theory_2023_12_13.vdi.

Just as a test, I selected one of those and clicked the Remove button. I get a confirmation dialog, which is fine and expected, but the last sentence on that dialog is: "As this hard disk is inaccessible its image file cannot be deleted." I'm not sure, then, what the Remove button would even do.

However, I think your recommendation is correct. I'm going to reset the project and start over. I've nothing to lose, and I'm definitely not going to keep running at least Atlas tasks if they are going to continue to fail. I would just like to know what happened on Aug. 18 between 13:29:52 UTC and 13:52:12 UTC that caused processing to go from all tasks succeeding to all tasks failing. That just seems bizarre.

Thanks again for all your very detailed help, and I will at least let you know the results.
ID: 52117
Glohr

Joined: 13 Jan 24
Posts: 39
Credit: 5,951,056
RAC: 19,508
Message 52119 - Posted: 25 Aug 2025, 21:05:22 UTC - in response to Message 52117.  
Last modified: 25 Aug 2025, 21:07:04 UTC

It's best to clean up all the yellow triangles. Doing so cleans up VirtualBox's internal structures. Part of the problem is that VB has been trying to access a file that no longer exists.

If you continue to experience issues with VB, you might want to uninstall and reinstall it. The version to use is up to you. Any minor release of 7.1 will do. You've been using 7.1.4 and can reinstall that if you prefer. The latest is 7.1.12. I would not recommend going to version 7.2 yet.

In all probability we will never know exactly what went wrong.
ID: 52119
doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52126 - Posted: 28 Aug 2025, 2:59:28 UTC - in response to Message 52119.  

Well... I reset the project, and checked the LHC projects folder on disk to make sure it was empty. I cleaned up every error in the VB Manager, as you suggested, then "updated" the project in the BOINC Manager and let things run. So far, the 3 tasks that have finished have all succeeded! So, tentatively, I'm going to say you got me all fixed up! Thanks SO much!

doug
ID: 52126
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2679
Credit: 286,800,357
RAC: 75,109
Message 52127 - Posted: 28 Aug 2025, 6:05:34 UTC - in response to Message 52126.  

So far all your ATLAS task logs report this:
Guest Log: No HITS file was produced

This means they didn't return any scientific result.


Possible reasons

Your computer reports 16 GB RAM and 4 cores, which is the minimum recommended for ATLAS, but only if the computer runs nothing else.
Your ATLAS VMs are configured to use 2 virtual CPUs, but you throttle BOINC down to 40% (=1.6 CPUs).
This can cause timing issues in VirtualBox.
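
The arithmetic behind the 1.6 figure can be written out as a small sketch. It follows the interpretation given in this post (throttle percentage applied to the core count). The function names are made up, and how BOINC actually schedules throttled time is not modelled.

```python
def effective_cpus(cores, throttle_percent):
    """CPU capacity left for BOINC when it is throttled to a percentage."""
    return cores * throttle_percent / 100


def shortfall(vm_vcpus, cores, throttle_percent):
    """How many CPUs the VM is configured for beyond what the throttle allows."""
    return max(0.0, vm_vcpus - effective_cpus(cores, throttle_percent))


# Values from this post: 4 cores, 40% throttle, 2 virtual CPUs per ATLAS VM.
print(effective_cpus(4, 40))           # 1.6
print(round(shortfall(2, 4, 40), 2))   # 0.4
```

At 100% the same machine gives `effective_cpus(4, 100) == 4.0`, so the shortfall disappears.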

Your logs show series of this:
VM state change detected. (old = 'running', new = 'paused')
VM state change detected. (old = 'paused', new = 'running')
VM state change detected. (old = 'running', new = 'paused')
VM state change detected. (old = 'paused', new = 'running')

This may cause open network connections to time out (especially during task setup) as well as other VM issues that can't be recovered.
Each pause/resume cycle requires lots of RAM or lots of time (if swap is used) to get the VM state written to disk.


If you want to run ATLAS on this computer you may need to adjust your BOINC settings to allow ATLAS to run without interruption.
If you still find 'No HITS file was produced' in your logs it is not recommended to run ATLAS.
You may then try Theory which is singlecore and needs much less RAM.
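
Checking a saved task log for that message is easy to script. Here is a minimal sketch, assuming the log text has been copied from the task page into a string. `missing_hits_lines` is a hypothetical helper, and the second sample line is invented filler.

```python
MARKER = "No HITS file was produced"


def missing_hits_lines(log_text):
    """Return every log line that reports a missing HITS file."""
    return [line for line in log_text.splitlines() if MARKER in line]


# The first line is the message quoted in this post; the second is invented.
SAMPLE_LOG = """\
Guest Log: No HITS file was produced
Guest Log: some other message
"""

hits = missing_hits_lines(SAMPLE_LOG)
print(len(hits), "line(s) report a missing HITS file")
```

An empty result only means the marker was not found. It does not prove that a HITS file was produced.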
ID: 52127
doug

Joined: 28 Mar 20
Posts: 33
Credit: 217,772
RAC: 132
Message 52137 - Posted: 29 Aug 2025, 3:30:20 UTC - in response to Message 52127.  

Ahhh, so the results of my recent fixes were too good to be true. Alas.

I do run other stuff besides BOINC on the machine sometimes, and other projects besides LHC, and other LHC applications besides Atlas. The 40% BOINC throttle and 2 CPUs limit is only when I'm also using the computer directly. If I'm not at the keyboard, then BOINC gets 100% of the CPUs and 100% of the CPU time. But apparently that is the bare minimum.

I'm not willing to devote the whole machine to Atlas all the time. So it looks like I should take your advice to stop trying to run Atlas unless and until I get a more substantial machine. I'm embarrassed now to have wasted all the time and effort Glohr devoted to helping me get Atlas running again. Sorry, man.

It just seems a bit weird to me that, for example, my Atlas task 425862756 ran for 73,969.14 seconds, had no errors and finished with a status of "Completed and validated", but apparently didn't do anything useful whatsoever. Seems like it would be nice to get some indication of problems in this situation.

Anyway, thank you very much for all your efforts and help and explanation!
ID: 52137
Ryan Munro

Joined: 17 Aug 17
Posts: 124
Credit: 10,803,088
RAC: 10,663
Message 52141 - Posted: 29 Aug 2025, 9:48:56 UTC
Last modified: 29 Aug 2025, 10:40:44 UTC

Mine are failing with this:

"VBoxManage: error: VirtualBox can't enable the AMD-V extension. Please disable the KVM kernel extension, recompile your kernel and reboot (VERR_SVM_IN_USE)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component ConsoleWrap, interface IConsole"

Any ideas?

https://lhcathome.cern.ch/lhcathome/result.php?resultid=425980340

Edit: I have checked Docker --version and KVM --version, and both report not being installed.
ID: 52141
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Avatar

Joined: 15 Jun 08
Posts: 2679
Credit: 286,800,357
RAC: 75,109
Message 52144 - Posted: 29 Aug 2025, 10:46:02 UTC - in response to Message 52141.  

This has already been discussed a couple of times.
I suggest running a forum search for "modprobe" and extending the time frame to 1 year.

I wonder why it took nearly 1 day and >70 failed tasks to notice something is wrong.
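
The "modprobe" hint points at loaded kernel modules rather than installed packages, so a `--version` check would not reveal them. Below is a small sketch, assuming a Linux host (the error text mentions the kernel) where `/proc/modules` lists one loaded module per line with the module name first. The helper names and the sample contents are illustrative only.

```python
KVM_MODULES = {"kvm", "kvm_amd", "kvm_intel"}


def loaded_kvm_modules(proc_modules_text):
    """Return the KVM-related module names found in /proc/modules text."""
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] in KVM_MODULES:
            loaded.add(fields[0])
    return sorted(loaded)


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of loaded modules (Linux only)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Hypothetical sample contents, for illustration only.
SAMPLE = """\
kvm_amd 155648 0 - Live 0x0000000000000000
kvm 1142784 1 kvm_amd, Live 0x0000000000000000
vboxdrv 696320 2 vboxnetadp,vboxnetflt, Live 0x0000000000000000
"""

print(loaded_kvm_modules(SAMPLE))  # ['kvm', 'kvm_amd']
```

On the affected host, `loaded_kvm_modules(read_proc_modules())` would show whether any of these modules is loaded. If one is, the earlier forum threads on unloading it with modprobe are the place to look for the actual remedy.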
ID: 52144
Ryan Munro

Joined: 17 Aug 17
Posts: 124
Credit: 10,803,088
RAC: 10,663
Message 52147 - Posted: 29 Aug 2025, 12:04:27 UTC - in response to Message 52144.  

"I wonder why it took nearly 1 day and >70 failed tasks to notice something is wrong."

For me to notice?
ID: 52147
Saturn911

Joined: 3 Nov 12
Posts: 75
Credit: 170,688,265
RAC: 79,947
Message 52161 - Posted: 29 Aug 2025, 18:43:56 UTC - in response to Message 52144.  


"I wonder why it took nearly 1 day and >70 failed tasks to notice something is wrong."


What about the TRIUMF-LCG2?
They run about 90 ATLAS clients but have returned no valid results for weeks.
ID: 52161

©2025 CERN