atlas error "error while computing"
Author | Message |
---|---|
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
All of my "ATLAS Simulation v3.01 (vbox64_mt_mcore_atlas) windows_x86_64" tasks have failed since Aug. 18 with "error while computing". This seems different from the error(s) people were reporting at the end of June, because as recently as Aug. 16, these tasks were finishing fine. Does anyone have any idea why this suddenly started happening? Thanks. Doug |
Joined: 13 Jan 24 Posts: 39 Credit: 5,950,180 RAC: 19,560 |
One of your recent tasks shows errors in VirtualBox: 2025-08-20 11:33:56 (20148): Adding virtual disk drive to VM. (ATLAS_vbox_3.01_image.vdi) Look for something to clean up in VirtualBox: open VirtualBox Manager >> File >> Tools >> Virtual Media Manager. Look for yellow triangles or anything else that looks abnormal. |
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
Hi, Thanks for your reply. I don't have a Virtual Media Manager option on that menu, but I do have "VM Activity Overview". It doesn't show much, but has an entry for each VM. There is no indication of errors in the task that is currently running, no yellow triangles or abnormalities. Out of curiosity, where on the LHC BOINC site did you find that task error you copied? Thanks again. doug |
Joined: 13 Jan 24 Posts: 39 Credit: 5,950,180 RAC: 19,560 |
No worries. If File >> Tools doesn't show an entry for Virtual Media Manager, try the small menu at the right side of the top entry (labeled Tools) in the right-hand column of Oracle VirtualBox Manager VM Activity Overview. There should be an entry for Media. Click that to get to the target screen. You should be able to see and clean up any problems there. If you are still having errors after that, it may be necessary to reinstall VirtualBox and/or reset the project. You are running a fairly old version of VirtualBox, 7.1.4. The current version of 7.1 is 7.1.12. You might want to upgrade to that, although it is not strictly necessary. Version 7.2 is available but is a major upgrade and still quite new. On any page where you can see your task list, you can click on a completed task's ID to see some log information. The path that I usually use for my own is https://lhcathome.cern.ch/lhcathome/index.php >> Login >> Project >> Account >> Tasks View. To see your task log, I clicked on your name on the original message, then clicked on View next to Computers at the lower left of the resulting page. That brought up a page showing any computers that you have made visible. On that page, I clicked Tasks for a computer. The page that displays shows the available tasks run on that computer. I picked one showing an error, clicked the task ID to bring up the task information page, and looked for anything out of the ordinary. |
Joined: 13 Jan 24 Posts: 39 Credit: 5,950,180 RAC: 19,560 |
Note that something has changed. The last couple of error tasks are completely different from the earlier ones and used much more wallclock and CPU time: 2025-08-23 18:22:39 (9700): Guest Log: *** Starting ATLAS job. (PandaID=6776308225 taskID=45828917) *** It still looks like a VirtualBox problem to me. There are some reports in the forums that long pauses/suspensions cause ATLAS and CMS tasks to fail, but that doesn't seem to fit your situation. |
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
Thanks again SO much for all your help and suggestions! I know my VB is older, but I have always understood that one should only upgrade VB when upgrading BOINC, to make sure they match. My BOINC is 8.2.4 (x64), and a quick check shows no newer version. Looking at my old install files, the last version of BOINC with VB I have was 8.0.2, from June 2024, so that presumably is when I got my current VB. My 8.2.4 install file, from exactly a month ago, doesn't indicate having any associated VB. Maybe I somehow got the wrong install file? Am I wrong about needing to have a match for BOINC version and VB version? It seems I've read multiple times on various BOINC forums about issues caused exactly by such a mismatch, but maybe I've misunderstood. In any case, I also can't find that Media entry on the little right-hand popup menu on the blue Tools item on the left-hand side. The entries I have are: Welcome, Extensions, Cloud and Activities. Sorry to be a pain. doug |
Joined: 13 Jan 24 Posts: 39 Credit: 5,950,180 RAC: 19,560 |
As far as I know, BOINC stopped distributing VirtualBox. I've been using VirtualBox from https://www.virtualbox.org/ for a couple of years without any problem, updating every few months. I usually don't let it get more than one or two minor releases behind. I also install the Extension Pack, although I don't think it's needed for BOINC. The version that you are running should be adequate. It sounds like your VirtualBox preferences are set to Basic. The Media and Network tools are there if you switch to Expert. In VB Manager, File >> Preferences >> click Expert near the top of the popup window that opens, then the OK button to close the window. Then you should see the missing tools in the menus. Personally, I'd do a project reset as well so as to restart with a clean slate. You'll lose any tasks in your queue when you do a project reset, but since they are all failing anyway that shouldn't be a concern. The project support files will need to download again after a reset. The ATLAS VB image is more than 4 GB. You won't lose any credit for tasks already completed and reported. LHC is the only BOINC project that I run that uses VirtualBox now. I'm not aware of any recent requirement for version matching, although I have a vague recollection about needing to have the correct major release a few years ago. |
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
You were right that my VB Manager was in Basic mode. I switched to Expert, found the screen you originally directed me to, and found lots of yellow triangles - 4, to be exact, with filenames like: Theory_2023_12_13.vdi. Just as a test, I selected one of those and clicked the Remove button. I get a confirmation dialog, which is fine and expected, but the last sentence on that dialog is: "As this hard disk is inaccessible its image file cannot be deleted." I'm not sure, then, what the Remove button would even do. However, I think your recommendation is correct. I'm going to reset the project and start over. I've nothing to lose, and I'm definitely not going to keep running Atlas tasks, at least, if they are going to continue to fail. I would just like to know what happened on Aug. 18 between 13:29:52 UTC and 13:52:12 UTC that caused processing to go from all tasks succeeding to all tasks failing. That just seems bizarre. Thanks again for all your very detailed help, and I will at least let you know the results. |
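For readers who prefer the command line, the yellow-triangle entries described above can probably also be found by inspecting the output of `VBoxManage list hdds`. The following is only a minimal sketch: the field names (`UUID`, `State`, `Location`) are assumed to match what VirtualBox prints, and the sample listing, its UUIDs, and its paths are made up for illustration. The sketch only builds `VBoxManage closemedium disk <uuid>` command strings, which as far as I know is the counterpart of the Remove button (it drops the registry entry without touching any file); it does not run them.

```python
# Hypothetical sample of `VBoxManage list hdds` output; UUIDs and paths are invented.
SAMPLE_LISTING = r"""
UUID:           11111111-2222-3333-4444-555555555555
Parent UUID:    base
State:          inaccessible
Type:           normal (base)
Location:       C:\example\Theory_2023_12_13.vdi
Storage format: VDI
Capacity:       0 MBytes

UUID:           66666666-7777-8888-9999-000000000000
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\example\ATLAS_vbox_3.01_image.vdi
Storage format: VDI
Capacity:       20480 MBytes
"""


def parse_hdds(listing: str) -> list[dict[str, str]]:
    """Split the listing into one dict per disk; blank lines separate disks."""
    disks: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in listing.splitlines():
        if not line.strip():
            if current:
                disks.append(current)
                current = {}
            continue
        # Split on the first colon only, so Windows paths like C:\... survive.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        disks.append(current)
    return disks


def inaccessible(disks: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the disks whose State field reads 'inaccessible'."""
    return [d for d in disks if d.get("State", "").lower() == "inaccessible"]


def closemedium_commands(disks: list[dict[str, str]]) -> list[str]:
    """Build (but do not run) one closemedium command per inaccessible disk."""
    return [
        f"VBoxManage closemedium disk {d['UUID']}"
        for d in inaccessible(disks)
        if "UUID" in d
    ]


if __name__ == "__main__":
    for command in closemedium_commands(parse_hdds(SAMPLE_LISTING)):
        print(command)
```

On a real machine the listing would come from running `VBoxManage list hdds` yourself; review each proposed command before executing it.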
Joined: 13 Jan 24 Posts: 39 Credit: 5,950,180 RAC: 19,560 |
It's best to clean up all the yellow triangles. Doing so cleans up VirtualBox's internal structures. Part of the problem is that VB has been trying to access a file that no longer exists. If you continue to experience issues with VB, you might want to uninstall and reinstall it. The version to use is up to you. Any minor release of 7.1 will do. You've been using 7.1.4 and can reinstall that if you prefer. The latest is 7.1.12. I would not recommend going to version 7.2 yet. In all probability we will never know exactly what went wrong. |
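A small aside on the version numbers above: 7.1.12 is newer than 7.1.4 even though a plain string comparison says the opposite. The sketch below compares versions as integer tuples and checks whether two releases belong to the same 7.1 series. The helper names `parse_version` and `same_series` are made up for this illustration.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '7.1.12' into (7, 1, 12)."""
    return tuple(int(part) for part in text.split("."))


def same_series(a: str, b: str) -> bool:
    """True when both versions share the same major.minor pair, e.g. 7.1."""
    return parse_version(a)[:2] == parse_version(b)[:2]


if __name__ == "__main__":
    installed, latest = "7.1.4", "7.1.12"
    # Tuple comparison is numeric per component, unlike string comparison.
    print(parse_version(installed) < parse_version(latest))
    print(same_series(installed, latest))
    print(same_series(latest, "7.2"))
```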
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
Well... I reset the project, and checked the LHC projects folder on disk to make sure it was empty. I cleaned up every error in the VB Manager, as you suggested, then "updated" the project in the BOINC Manager and let things run. So far, the 3 tasks that have finished have all succeeded! So, tentatively, I'm going to say you got me all fixed up! Thanks SO much! doug |
Joined: 15 Jun 08 Posts: 2679 Credit: 286,792,712 RAC: 74,925 |
So far all your ATLAS task logs report this: Guest Log: No HITS file was produced. This means they didn't return any scientific result. Possible reasons: Your computer reports 16 GB RAM and 4 cores, which is the minimum recommended for ATLAS, but only if the computer runs nothing else. Your ATLAS VMs are configured to use 2 virtual CPUs, but you throttle BOINC down to 40% (=1.6 CPUs). This can cause timing issues in VirtualBox. Your logs show series of this: VM state change detected. (old = 'running', new = 'paused') VM state change detected. (old = 'paused', new = 'running') VM state change detected. (old = 'running', new = 'paused') VM state change detected. (old = 'paused', new = 'running') This may cause open network connections to time out (especially during task setup) as well as other VM issues that can't be recovered. Each pause/resume cycle requires lots of RAM or lots of time (if swap is used) to get the VM state written to disk. If you want to run ATLAS on this computer, you may need to adjust your BOINC settings to allow ATLAS to run without interruption. If you still find 'No HITS file was produced' in your logs, it is not recommended to run ATLAS. You may then try Theory, which is single-core and needs much less RAM. |
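The two symptoms described above can be checked mechanically in a saved task log. This is a rough sketch, not an official tool: it counts the pause and resume transitions and looks for the "No HITS file was produced" marker, using the exact line wording quoted above. It also reproduces the 1.6 CPU figure, assuming it comes from 4 cores at a 40% throttle. The function names and the sample log are illustrative only.

```python
import re

# Matches the state-change lines quoted in the post above.
STATE_CHANGE = re.compile(
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)
NO_HITS_MARKER = "No HITS file was produced"

# Sample assembled from the log lines quoted in this thread.
SAMPLE_LOG = """\
VM state change detected. (old = 'running', new = 'paused')
VM state change detected. (old = 'paused', new = 'running')
VM state change detected. (old = 'running', new = 'paused')
VM state change detected. (old = 'paused', new = 'running')
Guest Log: No HITS file was produced
"""


def summarize_log(text: str) -> dict[str, object]:
    """Count pause/resume transitions and flag a missing HITS file."""
    transitions = STATE_CHANGE.findall(text)
    return {
        "pauses": sum(1 for _old, new in transitions if new == "paused"),
        "resumes": sum(1 for old, new in transitions
                       if old == "paused" and new == "running"),
        "no_hits": NO_HITS_MARKER in text,
    }


def throttled_cpus(cores: int, throttle_percent: float) -> float:
    """CPU capacity left after throttling, e.g. 4 cores at 40% gives 1.6."""
    return cores * throttle_percent / 100


if __name__ == "__main__":
    print(summarize_log(SAMPLE_LOG))
    budget = throttled_cpus(4, 40)
    print(f"{budget} CPUs available for a VM configured with 2 virtual CPUs")
```

A log with many pauses and `no_hits` set to true matches the pattern described in the post; a clean log would show neither.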
Joined: 28 Mar 20 Posts: 33 Credit: 217,772 RAC: 132 |
Ahhh - so, the results of my recent fixes were too good to be true. Alas. I do run other stuff besides BOINC on the machine sometimes, and other projects besides LHC, and other LHC applications besides Atlas. The 40% BOINC throttle and 2 CPUs limit apply only when I'm also using the computer directly. If I'm not at the keyboard, then BOINC gets 100% of the CPUs and 100% of the CPU time. But apparently that is the bare minimum. I'm not willing to devote the whole machine to Atlas all the time. So it looks like I should take your advice to stop trying to run Atlas unless and until I get a more substantial machine. I'm embarrassed now to have wasted all the time and effort Glohr devoted to helping me get Atlas running again. Sorry, man. It just seems a bit weird to me that, for example, my Atlas task 425862756 ran for 73,969.14 seconds, had no errors and finished with a status of "Completed and validated", but apparently didn't do anything useful whatsoever. Seems like it would be nice to get some indication of problems in this situation. Anyway, thank you very much for all your efforts and help and explanation! |
Joined: 17 Aug 17 Posts: 124 Credit: 10,803,014 RAC: 10,753 |
Mine are failing with this: "VBoxManage: error: VirtualBox can't enable the AMD-V extension. Please disable the KVM kernel extension, recompile your kernel and reboot (VERR_SVM_IN_USE) VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component ConsoleWrap, interface IConsole" Any ideas? https://lhcathome.cern.ch/lhcathome/result.php?resultid=425980340 Edit: I have checked Docker --version and KVM --version, and both report not being installed. |
Joined: 15 Jun 08 Posts: 2679 Credit: 286,792,712 RAC: 74,925 |
This has already been discussed a couple of times. I suggest running a forum search for "modprobe" and extending the time frame to 1 year. I wonder why it took nearly 1 day and >70 failed tasks to notice something is wrong. |
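Since the error message points at the KVM kernel extension, one quick check on a Linux host is whether any KVM modules are currently loaded, regardless of whether user-space tools such as Docker or a `kvm` command are installed. The sketch below parses text in the format of `/proc/modules` (the file `lsmod` reads) and reports the usual module names `kvm`, `kvm_amd`, and `kvm_intel`. The sample content, including sizes and addresses, is invented for illustration.

```python
# Kernel module names normally used by KVM on x86 hosts.
KVM_MODULES = {"kvm", "kvm_amd", "kvm_intel"}

# Hypothetical /proc/modules content; sizes and addresses are made up.
SAMPLE_PROC_MODULES = """\
kvm_amd 208896 0 - Live 0x0000000000000000
kvm 1409024 1 kvm_amd, Live 0x0000000000000000
vboxdrv 696320 0 - Live 0x0000000000000000
"""


def loaded_kvm_modules(proc_modules_text: str) -> list[str]:
    """Return the KVM-related module names found, in file order."""
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] in KVM_MODULES:
            found.append(fields[0])
    return found


if __name__ == "__main__":
    modules = loaded_kvm_modules(SAMPLE_PROC_MODULES)
    if modules:
        print("KVM modules loaded:", ", ".join(modules))
    else:
        print("No KVM modules loaded")
```

On a real host the input would be the contents of `/proc/modules`. If KVM modules show up, the forum search for "modprobe" suggested above is the place to find how others unloaded or disabled them; the exact steps depend on the distribution and kernel, so treat any `modprobe -r` command as something to verify first.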
Joined: 17 Aug 17 Posts: 124 Credit: 10,803,014 RAC: 10,753 |
"I wonder why it took nearly 1 day and >70 failed tasks to notice something is wrong." For me to notice? |
Joined: 3 Nov 12 Posts: 75 Credit: 170,683,089 RAC: 80,124 |
What about TRIUMF-LCG2? They run about 90 ATLAS clients but have returned no valid results for weeks. |
©2025 CERN