Atlas Simulation 1.01 (Vbox64) will not finish
Author | Message |
---|---|
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
Guys, I am now on my second task of this series. I aborted the first one with 30 seconds or less to go because it stayed stuck at that remaining time for more than 8 hrs. Now I have to abort #2 because it will not finish either. Remaining time is 00:00:00. It has taken 1 day and 18 hrs to run, since I share my system with a bunch of other projects. I upgraded VBox last night to version 6.0.4 r128413 (Qt5.6.2) with the latest extension pack. I exited BOINC to do the upgrade and then restarted it, so I have no idea what the problem is now. I cleaned my system as well. I am puzzled as to why these tasks do not finish. This is a Win 10 64-bit machine with BOINC 7.4.12 x64. |
Send message Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
Hi! Since there is no detailed information about your aborted tasks, you should check Yeti's thread here in this forum: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161 Did you abort all LHC tasks before updating VirtualBox? Greetings! Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
I know Yeti's checklist. This system has handled ATLAS sims for quite some time. Just recently, though, they have started to hang. For the update, tasks were suspended, the BOINC client was shut down (file-shutdown) and the program closed. VBox and extensions were updated, then BOINC was restarted. As far as task details go, there is nothing to say. Task was aborted by GUI is all it says. Nothing about heartbeat or any of that stuff. ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64 - my machine (aborted due to stagnation); ATLAS Simulation v2.57 (native_mt) x86_64-pc-linux-gnu - another person with Linux (completed). Same task: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=108559489 |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
Yeti's list does not cover hung/stalled work units and what causes them. It only covers the problems of getting started with the VM. As I said before, I can normally run these work units fine, but something has come up that causes them to hang at 99.9999% of the way through. They won't terminate. |
Send message Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
You would have to look at the logs for the task in the projects folder. There isn't any info listed on your returned tasks except the stderr output about the abort. Try looking at the virtual machine while it's running to make sure progress is happening. Do you have any other VirtualBox-related tasks working? The task you linked above appears to have been cancelled by the server and sent to a Linux native machine? The reported time or the deadline looks odd. |
Send message Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
My go-to reply in these cases is that ATLAS cannot run alongside other tasks. The tasks generally just don't like being paused or interrupted. I have never found a solution. You can mitigate the issue by increasing the time between switching tasks, but eventually some tasks will fail. My best guess at what happens is this: ATLAS task starts ... is interrupted or suspended ... task can't restart, but doesn't fail either ... time elapses ... server deadline is not met, hence "Cancelled by server". Sorry, but definitely let us know if you find a solution! |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
I did not look at that task in detail, but I thought the Linux machine got it first. VM tasks: yeah, I've got a couple of projects that use it. But right now everything is standard GPU/CPU. The other VM projects work just fine. (In reply to: "You would have to look at the logs for the task in the projects folder. There isn't any info listed on your returned tasks except the stderr output about the abort. Try looking at the virtual machine while it's running to make sure progress is happening. Do you have any other VirtualBox-related tasks working? The task you linked above appears to have been cancelled by the server and sent to a Linux native machine? The reported time or the deadline looks odd.") |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
Moved time between projects to 2 hrs. Checkpoint has always been every minute. It's just weird that it gets to 99.9999% and hangs. Rebooting the computer does nothing. Exiting and restarting BOINC does nothing. In the past, when tasks hung, it was usually because the VM was outdated. Not this time. The update did nothing. (In reply to: "My go-to reply in these cases is that ATLAS cannot run alongside other tasks. The tasks generally just don't like being paused or interrupted. I have never found a solution. You can mitigate the issue by increasing the time between switching tasks, but eventually some tasks will fail.") |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
Right now it will be a while before ATLAS comes back. My system is too busy with other work. Oh... all projects are at an equal %. Not sure if increasing the resource share for ATLAS/LHC would do anything. |
Send message Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0 |
"Moved time between projects to 2hrs." Not enough. You need to complete the task BEFORE switching. Depending on your configuration (I ran single-threaded tasks for better efficiency), you need to allow for 8+ hours. But that only completes one task, and sooner or later a task will fail due to switching. "Oh....all projects are equal %." Again, not possible. You can limit the effects, but ultimately you need to stick to "ATLAS only" for best results. Originally I didn't want to limit myself to ATLAS, but I am glad I did. Pursuing ATLAS led me down a rabbit hole that made me switch to Linux and set up a whole slew of improvements, and it helped me learn how these projects are set up. I definitely recommend it. Other observations: We, as volunteers, cannot change the way this project runs. There's a bigger scientific goal that takes precedence. Yeti's guide is the place to start and the best source of information. This forum contains tons of advice, although it can be hard to find. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"I am puzzled as to why these tasks do not finish." When you are running VBox on Windows, it is usually the antivirus that did it. |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
I've seen that mentioned before. Guess I'll just stick with Windows Defender. No extras. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"Guess I just stick with Windows Defender. No extras." That is what I do whenever I have problems. But that is on Windows 7, where it handles spyware only. Good luck. |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
"Guess I just stick with Windows Defender. No extras." I had Kaspersky Free antivirus, but removed that and just kept Secure Connection. Now I have to work through all the other tasks from the other projects first and then see what happens. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"I had Kaspersky Free antivirus, but removed that and just kept Secure Connection." Kaspersky is probably the best AV, but that makes it the worst with VBox. It monitors everything and blocks anything suspicious, and you don't necessarily get a notice about it. Also, the exclusions don't necessarily stop the real-time monitoring, which is what causes the problems. I am down on AVs in general, and use the most minimal ones I can. |
Send message Joined: 25 Sep 17 Posts: 99 Credit: 3,425,566 RAC: 0 |
greg_be, I just glanced at your listed tasks, and it is odd that almost all of them were marked as 'cancelled by server'. It almost looks like your tasks are getting sent out to you, later sent to a native_mt machine, the results returned, and then yours are cancelled. I hope someone else may chime in. I was looking at the work unit tabs. Link to greg_be's task listing: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10556945 |
Send message Joined: 30 Oct 18 Posts: 16 Credit: 192,743,156 RAC: 0 |
Hi greg_be, I had the same problem with my Linux computers using VirtualBox, and I did the following steps to check for problems: 1. Check the event log for BOINC; there may be some clues in it. 2. Open the VM from BOINC: select the running task, then click the Show VM Console button (it appears in 30 seconds or less; you need the VirtualBox Extension Pack). 3. In the new window, wait a few minutes until the login menu appears. 4. Press ALT+F2 and check whether any events show up there; you should wait 20-30 minutes to verify this. When you do this, you will probably see an error, and you will need to create an app_config.xml file to solve the problem. But first I want to make sure that you do get an error, so I will reply once you have done all of this. |
Send message Joined: 28 Dec 08 Posts: 341 Credit: 4,865,275 RAC: 71 |
UAM, you are talking about stuff I have no idea how to do or where to find. VM and VM extensions are up to date. That's all I know about, other than opening the VM console on its own. I have never seen anything about using BOINC to open the VM. Jonathon, weird, right? I don't get it either. Jim, I took out Kaspersky AV and just kept Secure Connection. Windows Defender is the only thing now. To all the rest, a general comment: I just got 7 new tasks. Run time is set to 3 hrs between tasks, as I see they are supposed to complete in roughly 3:25. VM is up to date, and AV is nonexistent with the exception of Windows Defender. From what I see, ATLAS is the only project using the VM at the moment. Sometimes WCG kicks stuff out with VM requirements. All other projects are raw CPU and raw GPU usage. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"To all the rest a general comment: I just got 7 new tasks. Run time is set for 3hrs between tasks as I see they are supposed to complete in 3:25 roughly." If you got the 3:25 from the Remaining time in BOINC Manager, then it's likely not very accurate. Yeah, I know, it shouldn't be that way, but that's the way it is. The % complete figure is pretty much useless too. I strongly suggest boosting the switch-between-tasks time to 10 hours or more until you get a better idea of how long the tasks actually take; otherwise you're setting yourself up for more failed tasks. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 699 |
"UAM, you are talking about stuff I have no idea how to do or where to find it." You can access the consoles and the graphics/log files with BOINC Manager. Highlight a running task and press one of the buttons in the "Commands" column on the left: Show VM Console or Show graphics. |
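
The switch-interval advice in this thread comes down to simple arithmetic, sketched here in Python. This is a simplified model and not how the BOINC scheduler actually decides: it assumes a task runs without interruption from its start, and that the client may switch to another project at every multiple of the "switch between tasks" interval. The function name `switch_opportunities` is hypothetical; the runtimes are the figures quoted in the thread (3:25 estimated by BOINC Manager, 8+ hours reported for single-threaded tasks).

```python
from datetime import timedelta


def switch_opportunities(runtime: timedelta, switch_interval: timedelta) -> int:
    """Count interval boundaries that fall strictly before the task finishes.

    Simplified model (an assumption, not BOINC's real scheduler): the task
    starts at time zero and the client may switch away at every multiple of
    switch_interval.
    """
    if switch_interval <= timedelta(0):
        raise ValueError("switch_interval must be positive")
    if runtime <= timedelta(0):
        return 0
    whole, remainder = divmod(runtime, switch_interval)
    # A boundary that coincides exactly with the end of the task does not count.
    return whole if remainder else whole - 1


if __name__ == "__main__":
    estimate = timedelta(hours=3, minutes=25)  # BOINC Manager estimate quoted in the thread
    long_run = timedelta(hours=8)              # single-threaded runtime mentioned in the thread
    for hours in (2, 3, 10):
        interval = timedelta(hours=hours)
        print(
            f"interval {hours:>2} h: "
            f"{switch_opportunities(estimate, interval)} switch point(s) for a 3:25 task, "
            f"{switch_opportunities(long_run, interval)} for an 8 h task"
        )
```

Under this model, only an interval longer than the task's real runtime leaves no switch points, which is consistent with the suggestion above to use 10 hours or more until the actual runtimes are known.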
©2025 CERN