Message boards : ATLAS application : Atlas Simulation 1.01 (Vbox64) will not finish

greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38096 - Posted: 4 Mar 2019, 18:27:19 UTC

Guys,

I am now on my second task of this series. I aborted the first one with 30 seconds or less to go because it had been stuck at that remaining time for more than 8 hrs.

Now I have to abort #2 because it will not release. Remaining time is 00:00:00.
It has taken 1 day and 18 hrs to run since I share my system with a bunch of other projects.

I upgraded Vbox last night to Version 6.0.4 r128413 (Qt5.6.2) with the latest extension pack. I had exited BOINC to do the upgrades and then restarted it. So I have no idea what the problem is now.
I cleaned my system as well.

I am puzzled as to why these tasks do not finish.
This is a Win 10 64bit machine with BOINC 7.4.12 x64
ID: 38096
djoser

Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 38098 - Posted: 4 Mar 2019, 20:12:47 UTC - in response to Message 38096.  
Last modified: 4 Mar 2019, 20:20:14 UTC

Hi!

Since there is no detailed information regarding your aborted tasks, you should check Yeti's thread here in this forum:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161

Did you abort all LHC tasks before updating Virtualbox?

Greetings!
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
ID: 38098
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38100 - Posted: 4 Mar 2019, 23:28:21 UTC - in response to Message 38098.  

I know Yeti's checklist.

This system has handled Atlas sims for quite some time.
Just recently though they hang up.

Tasks were suspended, the BOINC client was shut down (file-shutdown), and the program was closed.
Vbox and extensions updated, BOINC restarted.

As far as task details go, there is nothing to say.
Task was aborted by GUI is all it says.
Nothing about heartbeat or any of that stuff.


ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas)
windows_x86_64 - My machine (aborted due to stagnation)
ATLAS Simulation v2.57 (native_mt)
x86_64-pc-linux-gnu - another person with Linux (completed)
Same task: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=108559489
ID: 38100
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38101 - Posted: 4 Mar 2019, 23:32:18 UTC - in response to Message 38096.  

Yeti's list does not cover hung/stalled work units and what causes them.
It just covers the problems of getting started with the VM.

As I said before, I can normally run these work units fine, but something came up that is causing them to hang at 99.9999% of the way through. They won't terminate.
ID: 38101
Jonathan

Joined: 25 Sep 17
Posts: 93
Credit: 3,079,618
RAC: 2,671
Message 38102 - Posted: 5 Mar 2019, 7:11:49 UTC - in response to Message 38101.  

You would have to look at the logs for the task in the projects folder. There isn't any info listed on your returned tasks except the stderr output about the abort. Try looking at the virtual machine while it's running to make sure progress is happening. Do you have any other VirtualBox-related tasks working? The task you linked above appears to have been cancelled by the server and sent to a Linux native machine? The reported time or the deadline looks odd.
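
One way to check from the command line whether a VM is actually up is to ask VirtualBox which machines are running. This is only a minimal Python sketch: it assumes `VBoxManage` is on the PATH (on Windows it normally sits in the VirtualBox install folder), and the two function names are made up for illustration. Depending on how BOINC was installed, its VMs may be registered under a different user account and not show up here.

```python
import re
import subprocess

# Each line of `VBoxManage list runningvms` looks like: "vm name" {uuid}
_VM_LINE = re.compile(r'^"(?P<name>.*)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}\s*$')


def parse_running_vms(output):
    """Return (name, uuid) pairs parsed from `VBoxManage list runningvms` output."""
    vms = []
    for line in output.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def list_running_vms(vboxmanage="VBoxManage"):
    """Run VBoxManage and return the running VMs (needs VirtualBox installed)."""
    result = subprocess.run(
        [vboxmanage, "list", "runningvms"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_running_vms(result.stdout)
```

Calling `list_running_vms()` while an ATLAS task is supposedly running and getting an empty list back would suggest the VM never came up. A VM that is listed can still be stalled inside, so this complements the console check rather than replacing it.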
ID: 38102
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 38104 - Posted: 5 Mar 2019, 11:03:11 UTC - in response to Message 38101.  

My go-to reply in these cases is that ATLAS cannot run alongside other tasks. The tasks generally just don't like being paused or interrupted. I have never found a solution. You can mitigate the issue by increasing the time between switching tasks, but eventually some tasks will fail.

My best guess at what happens is this: an ATLAS task starts ... is interrupted or suspended ... the task can't restart, but doesn't fail either ... time elapses ... the server deadline is not met, hence "Cancelled by server".

Sorry, but definitely let us know if you find a solution!
ID: 38104
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38130 - Posted: 6 Mar 2019, 20:08:08 UTC - in response to Message 38102.  
Last modified: 6 Mar 2019, 20:09:36 UTC

I did not look in detail at that task, but I thought the Linux machine got it first.
VM tasks: yeah, I've got a couple of projects that use it. But right now everything is standard GPU/CPU.
The other VM projects work just fine.




You would have to look at the logs for the task in the projects folder. There isn't any info listed on your returned tasks except the stderr output about the abort. Try looking at the virtual machine while it's running to make sure progress is happening. Do you have any other VirtualBox-related tasks working? The task you linked above appears to have been cancelled by the server and sent to a Linux native machine? The reported time or the deadline looks odd.
ID: 38130
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38131 - Posted: 6 Mar 2019, 20:11:49 UTC - in response to Message 38104.  

Moved time between projects to 2hrs.
Checkpoint has always been every minute.
It's just weird that it gets to 99.9999% and hangs.
Rebooting the computer does nothing. Exiting and restarting BOINC does nothing.
In the past, when tasks hung, it was usually because the VM was outdated. Not this time.
The update did nothing.


My go-to reply in these cases is that ATLAS cannot run alongside other tasks. The tasks generally just don't like being paused or interrupted. I have never found a solution. You can mitigate the issue by increasing the time between switching tasks, but eventually some tasks will fail.

My best guess at what happens is this: an ATLAS task starts ... is interrupted or suspended ... the task can't restart, but doesn't fail either ... time elapses ... the server deadline is not met, hence "Cancelled by server".

Sorry, but definitely let us know if you find a solution!
ID: 38131
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38132 - Posted: 6 Mar 2019, 20:13:31 UTC

Right now it will be a while before ATLAS comes back. My system is too busy with other work.
Oh....all projects are equal %.
Not sure if increasing resource share for ATLAS/LHC would do anything.
ID: 38132
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 38136 - Posted: 7 Mar 2019, 8:22:53 UTC - in response to Message 38132.  

Moved time between projects to 2hrs.

Not enough. You need to complete the task BEFORE switching. Depending on your configuration (I ran single threaded tasks for better efficiency), you need to allow for 8+ hours. But that only completes one task and sooner or later a task will fail due to switching.

Oh....all projects are equal %.

Again, not possible. You can limit the effects, but ultimately you need to stick to "ATLAS only" for best results. Originally I didn't want to limit myself to ATLAS, but I am glad I did. Pursuing ATLAS led me down a rabbit hole that made me switch to Linux, set up a whole slew of improvements, and helped me learn how these projects are set up. I definitely recommend it.

Other observations:
We, as volunteers, cannot change the way this project runs. There's a bigger scientific goal that takes precedence.
Yeti's guide is the place to start and the best source of information.
This forum contains tons of advice, although it can be hard to find.
ID: 38136
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 38138 - Posted: 7 Mar 2019, 9:54:55 UTC - in response to Message 38096.  

I am puzzled as to why these tasks do not finish.
This is a Win 10 64bit machine with BOINC 7.4.12 x64

When you are running VBox on Windows, usually the antivirus did it.
ID: 38138
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38139 - Posted: 7 Mar 2019, 11:09:24 UTC - in response to Message 38138.  

I've seen that mentioned before. Guess I just stick with Windows defender. No extras.
ID: 38139
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 38142 - Posted: 7 Mar 2019, 13:50:32 UTC - in response to Message 38139.  

Guess I just stick with Windows defender. No extras.

That is what I do whenever I have problems. But that is on Windows 7, where it is spyware only.
Good luck.
ID: 38142
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38147 - Posted: 7 Mar 2019, 19:22:04 UTC - in response to Message 38142.  

Guess I just stick with Windows defender. No extras.

That is what I do whenever I have problems. But that is on Windows 7, where it is spyware only.
Good luck.



I had Kaspersky free antivirus, but removed that and just kept secure connection.
Now I have to move through all the other tasks from the other projects first and then see what happens.
ID: 38147
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 38151 - Posted: 7 Mar 2019, 21:15:03 UTC - in response to Message 38147.  

I had Kaspersky free antivirus, but removed that and just kept secure connection.

Kaspersky is probably the best AV, but that makes it the worst with VBox. It monitors everything, and blocks anything suspicious. And you don't necessarily get a notice about it.
Also, the exclusions don't necessarily stop the real-time monitoring, which is what causes the problems.

I am down on AVs in general, and use the most minimal ones I can.
ID: 38151
Jonathan

Joined: 25 Sep 17
Posts: 93
Credit: 3,079,618
RAC: 2,671
Message 38153 - Posted: 7 Mar 2019, 22:15:42 UTC - in response to Message 38151.  

greg_be, I just glanced at your listed tasks and it is odd that almost all of your tasks were marked as 'cancelled by server'. It almost looks like your tasks are getting sent out to you, later sent to a native_mt machine, the results returned, and then yours are cancelled. I hope someone else may chime in. I was looking at the work unit tabs.

Link to greg_be task listing
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10556945
ID: 38153
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 38160 - Posted: 8 Mar 2019, 11:25:02 UTC

Hi greg_be;

I had the same problem with my Linux computers using VirtualBox, and I took the following steps to track it down:
1. Check the BOINC event log; there may be some clues in it.
2. Open the VM from BOINC: select the running task --> click the Show VM Console button (appears in 30 seconds or less) [you need the VirtualBox Extension Pack].
3. In the new window, wait a few minutes until the login prompt appears.
4. Press ALT+F2 and check whether any events show up there; wait 20-30 minutes to verify this.

When you do this, you will probably get an error, and you will need to create an app_config.xml file to solve the problem. But first I want to make sure that you do get an error, so I will reply once you have done all of this.
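
For step 1, the event log can get long when several projects share the machine, so it may help to filter it. A minimal Python sketch, assuming the log has been saved as a text file (the client normally writes one called stdoutdae.txt in the BOINC data directory); the keyword list and function names are only illustrative guesses, so adjust them to whatever appears in your own log.

```python
from pathlib import Path


def filter_log_lines(lines, keywords=("LHC@home", "ATLAS", "vbox")):
    """Keep lines that mention any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def read_event_log(path):
    """Read a saved BOINC event log and return only the matching lines."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return filter_log_lines(handle)
```

Something like `read_event_log(r"C:\ProgramData\BOINC\stdoutdae.txt")` would then return only the lines worth reading; that path is the usual Windows default and may differ on your install.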
ID: 38160
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 38171 - Posted: 8 Mar 2019, 23:49:52 UTC

UAM, you are talking about stuff I have no idea how to do or where to find it.
VM and VM extensions are up to date. That's all I know about other than opening VM console on its own.
Never have seen anything about using BOINC to open VM.


Jonathan, weird, right? I don't get it either.

Jim, I took out Kaspersky AV and just kept secure connection.
Windows Defender is the only thing now.

To all the rest, a general comment: I just got 7 new tasks. Run time is set for 3hrs between tasks as I see they are supposed to complete in 3:25 roughly. VM is up to date, AV is nonexistent with the exception of Windows Defender.

From what I see at the moment, ATLAS is the only project using VM at the moment.
Sometimes WCG kicks stuff out with VM requirements.
All other projects are raw CPU and raw GPU usage.
ID: 38171
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38172 - Posted: 9 Mar 2019, 0:28:51 UTC - in response to Message 38171.  

To all the rest a general comment: I just got 7 new tasks. Run time is set for 3hrs between tasks as I see they are supposed to complete in 3:25 roughly.
If you got the 3:25 from the Remaining time in BOINC Manager then it's likely not very accurate. Yeah, I know, it shouldn't be that way but that's the way it is. The % complete figure is pretty much useless too. I strongly suggest boosting the switch between tasks time to 10 hours or more until you get a better idea of how long the tasks actually take, otherwise you're setting yourself up for more failed tasks.
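
If you would rather set this in a file than in the Manager, the "switch between tasks" interval corresponds, as far as I know, to `cpu_scheduling_period_minutes` in BOINC's `global_prefs_override.xml`. Here is a rough Python sketch that picks an interval from run times you have actually observed and builds the override. The 10-hour floor follows the suggestion above; the 25% margin and both function names are assumptions for illustration only.

```python
import math
import xml.etree.ElementTree as ET


def switch_interval_minutes(observed_hours, floor_hours=10.0, margin=1.25):
    """Pick a switch interval (minutes) longer than the slowest task seen so far."""
    longest = max(observed_hours, default=0.0)
    return math.ceil(max(floor_hours, longest * margin) * 60)


def override_xml(minutes):
    """Build a minimal global_prefs_override.xml body with only the switch interval."""
    root = ET.Element("global_preferences")
    period = ET.SubElement(root, "cpu_scheduling_period_minutes")
    period.text = str(minutes)
    return ET.tostring(root, encoding="unicode")
```

For example, `override_xml(switch_interval_minutes([3.5, 12.0]))` gives a file body you can save as `global_prefs_override.xml` in the BOINC data directory and have the client re-read, for instance with `boinccmd --read_global_prefs_override`. Feed it run times from tasks that actually completed, not the Remaining time estimate.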
ID: 38172
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 38173 - Posted: 9 Mar 2019, 7:51:06 UTC - in response to Message 38171.  

UAM, you are talking about stuff I have no idea how to do or where to find it.
VM and VM extensions are up to date. That's all I know about other than opening VM console on its own.
Never have seen anything about using BOINC to open VM.

You can access the Consoles and the graphics/ log-files with BOINC Manager.
Highlight a running task and, in the "Commands" column on the left, press the Show VM Console or Show graphics button.
ID: 38173