Message boards : ATLAS application : Atlas tasks "Postponed: VM job unmanageable, restarting later."

dicksog
Joined: 9 Oct 11
Posts: 1
Credit: 736,195
RAC: 0
Message 34495 - Posted: 27 Feb 2018, 18:57:42 UTC

Keep getting the above issue.

Using Oracle VM VirtualBox Manager v5.2.6 r120293 and BOINC Manager v7.8.3 (wxWidgets 3.0.1).

dicksog
ID: 34495 · Report as offensive     Reply Quote
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 34496 - Posted: 27 Feb 2018, 19:31:52 UTC - in response to Message 34495.  

One task restarted with CPU usage zero, so I aborted it.
Tullio
ID: 34496 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 34498 - Posted: 27 Feb 2018, 20:36:39 UTC - in response to Message 34495.  

Keep getting the above issue.

Using Oracle VM VirtualBox Manager v5.2.6 r120293 and BOINC Manager v7.8.3 (wxWidgets 3.0.1).

dicksog

Suspend all tasks, restart BOINC client (not only BOINC Manager) and resume 1 single ATLAS-task and see whether that will run.
ID: 34498 · Report as offensive     Reply Quote
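
The procedure suggested above (suspend everything, restart the client, resume a single ATLAS task) can also be driven from the command line. Below is a minimal Python sketch, assuming `boinccmd` is installed and on the PATH. The project URL and the task names are placeholders, not values taken from this thread, and the client restart itself is OS-specific and not covered. The sketch only builds and prints the command lines.

```python
# Assumed project URL; confirm it with `boinccmd --get_project_status`.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def task_command(task_name, action, project_url=PROJECT_URL):
    """Build a `boinccmd --task` command line; action is suspend, resume or abort."""
    if action not in ("suspend", "resume", "abort"):
        raise ValueError(f"unsupported action: {action}")
    return ["boinccmd", "--task", project_url, task_name, action]


def suspend_all_but_one(task_names, keep):
    """Return commands that suspend every task, then resume only `keep`."""
    commands = [task_command(name, "suspend") for name in task_names]
    commands.append(task_command(keep, "resume"))
    return commands


if __name__ == "__main__":
    # Hypothetical task names; list the real ones with `boinccmd --get_tasks`.
    tasks = ["atlas_task_A", "atlas_task_B"]
    for cmd in suspend_all_but_one(tasks, keep="atlas_task_A"):
        # Printed only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```

In practice the suspend commands would be run first, the BOINC client restarted, and the final resume command run afterwards.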
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 803
Credit: 649,968,608
RAC: 240,334
Message 34504 - Posted: 28 Feb 2018, 7:30:50 UTC
Last modified: 28 Feb 2018, 7:31:39 UTC

I just upgraded from 5.1.30 to 5.2.6 and see the same. It was working well before, so I can't recommend 5.2.6 for reliable operation.

I will give it another 24 hours to see if it's just a kink, but will downgrade if I don't see improvements.
ID: 34504 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,720,797
RAC: 14,371
Message 34505 - Posted: 28 Feb 2018, 8:50:51 UTC

I ran a few about a week ago with 5.2.6 and had no problems.
ID: 34505 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,426,967
RAC: 123,618
Message 34506 - Posted: 28 Feb 2018, 10:51:18 UTC

I guess it is not caused by a specific VirtualBox version.

A few weeks ago I noticed the same error.
It was most likely caused by my BOINC client hitting the max RAM setting and, as a consequence, the max swap setting.
I solved the issue by tuning the RAM and swap settings in the client configuration and the OS: RAM as high as possible, swap nearly switched off via swappiness=0.

A disadvantage of this tuning is that it is now necessary to keep an eye on the overall RAM usage (not only BOINC).

BTW: VirtualBox 5.2.6 (Linux) runs without obvious issues.
ID: 34506 · Report as offensive     Reply Quote
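
For readers who want to check the two settings mentioned above, here is a small Python sketch. It assumes a Linux host, where the kernel exposes `vm.swappiness` at `/proc/sys/vm/swappiness`, and it assumes that the "max RAM setting" corresponds to BOINC's "use at most N % of memory" preference. The function names and the example numbers are illustrative only.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's vm.swappiness as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, file missing, or unexpected content.
        return None


def boinc_ram_budget_gb(total_ram_gb, max_used_pct):
    """RAM (in GB) that BOINC may use under a 'use at most N %' preference."""
    return total_ram_gb * max_used_pct / 100.0


if __name__ == "__main__":
    print("vm.swappiness:", read_swappiness())
    # Illustrative numbers only: a 32 GB host with a 90 % limit.
    print("BOINC RAM budget (GB):", boinc_ram_budget_gb(32, 90))
```

If the sum of the memory that all running VM tasks request comes close to this budget, the behaviour described in the post above becomes more likely.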
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 803
Credit: 649,968,608
RAC: 240,334
Message 34511 - Posted: 28 Feb 2018, 18:09:46 UTC - in response to Message 34506.  

Maybe on Windows it's different?

I have 128GB of RAM, with the setting allowing up to 98%; the swap setting is 50% at most, but I don't expect to use any swap with 128GB. Task Manager shows that I still have 45GB of unused memory and 65GB in standby.

I didn't see this issue often with the older version of VirtualBox, hence my comment.
ID: 34511 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,426,967
RAC: 123,618
Message 34512 - Posted: 28 Feb 2018, 19:23:08 UTC - in response to Message 34511.  

Maybe on Windows it's different?

I have 128GB of RAM, with the setting allowing up to 98%; the swap setting is 50% at most, but I don't expect to use any swap with 128GB. Task Manager shows that I still have 45GB of unused memory and 65GB in standby.

I didn't see this issue often with the older version of VirtualBox, hence my comment.

Well, it occurred while I was running some tests, and I didn't expect to force THIS error.
But I could verify it. And yes, it may be different on Windows or with different vbox versions.
ID: 34512 · Report as offensive     Reply Quote
Juha
Joined: 22 Mar 17
Posts: 30
Credit: 360,676
RAC: 0
Message 34513 - Posted: 28 Feb 2018, 20:41:30 UTC - in response to Message 34511.  

Atlas uses an older version of vboxwrapper that doesn't support VirtualBox 5.2 through the VirtualBox COM API. So vboxwrapper falls back to using VBoxManage. But this old version doesn't work correctly with VBoxManage either.

Solution: If you are on Windows and want to run Atlas, stick to VirtualBox 5.1.
ID: 34513 · Report as offensive     Reply Quote
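
A quick way to check which VirtualBox series is installed is to parse the version string. The following Python sketch assumes strings in the form printed by `VBoxManage --version` (for example `5.2.6r120293`) or quoted earlier in this thread (`v5.2.6 r120293`). The helper names are hypothetical, and the "5.1 is the safe series" rule simply restates the advice in the post above rather than any official compatibility list.

```python
import re


def parse_vbox_version(text):
    """Parse strings such as '5.2.6r120293' or 'v5.1.30' into (major, minor, patch)."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a VirtualBox version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_5_1_series(text):
    """True if the version belongs to the 5.1.x series recommended in the thread."""
    major, minor, _patch = parse_vbox_version(text)
    return (major, minor) == (5, 1)


if __name__ == "__main__":
    for version in ("5.1.30", "5.2.6r120293"):
        print(version, "->", "5.1 series" if is_5_1_series(version) else "other series")
```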
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 803
Credit: 649,968,608
RAC: 240,334
Message 34514 - Posted: 28 Feb 2018, 20:53:50 UTC - in response to Message 34512.  

Interesting, different errors on different OSes?!

I went back to 5.1 and it seems to be back to normal. I assume we have to wait for the project to move to a newer wrapper.
ID: 34514 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1115
Credit: 49,720,797
RAC: 14,371
Message 34515 - Posted: 28 Feb 2018, 21:25:53 UTC

ALL of my 9 running computers are on Windows 10 or 7 and are running the newest versions of BOINC and VirtualBox.

(and yes I also still have the XP Pro *T4T legend* able to run those X86 Sixtracks........of course not these)
Volunteer Mad Scientist For Life
ID: 34515 · Report as offensive     Reply Quote
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34521 - Posted: 1 Mar 2018, 19:41:17 UTC - in response to Message 34506.  
Last modified: 1 Mar 2018, 20:20:22 UTC

A few weeks ago I noticed the same error.
It was most likely caused by my BOINC client hitting the max RAM setting and, as a consequence, the max swap setting.
I solved the issue by tuning the RAM and swap settings in the client configuration and the OS: RAM as high as possible, swap nearly switched off via swappiness=0.

I think that is probably it. I just got that error on LHCb, and could not download any more work units until I rebooted. But in the last couple of weeks I have had work units suspend themselves with a "low memory" warning, and I increased the amount of memory as much as I could using BOINC settings. That fixed it for a while. But when you get the wrong mix of jobs running, it may not be enough. I am now reducing the amount of RAM that I devote to a write cache, and that may fix it.

PS - I just ordered 32 GB more memory. That is the real fix.
ID: 34521 · Report as offensive     Reply Quote
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,802,474
RAC: 127,446
Message 35612 - Posted: 22 Jun 2018, 8:41:50 UTC

After upgrading all PCs to BOINC 7.10.2 and VirtualBox 5.2.8:
The AMD A8 PC with 24 GByte (DDR3) RAM (Win10 Pro) has been showing this message ever since.
After closing BOINC and killing the BOINC system tray process, the message comes back after a while.
The message is seen after an ATLAS task has been running for up to one hour.

I will go back to 5.1.x on this PC to see if this helps.

All other PCs run in this configuration, BUT always with 32 GByte (DDR3 or DDR4) RAM.
ID: 35612 · Report as offensive     Reply Quote
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 35626 - Posted: 22 Jun 2018, 20:32:22 UTC - in response to Message 35612.  

The message is seen after an ATLAS task has been running for up to one hour.

You could lengthen BOINC-Switch-Timing to 4 hours or more.


Supporting BOINC, a great concept !
ID: 35626 · Report as offensive     Reply Quote
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,802,474
RAC: 127,446
Message 35627 - Posted: 23 Jun 2018, 4:44:49 UTC - in response to Message 35626.  
Last modified: 23 Jun 2018, 4:47:25 UTC

Switch-timing is 7500.
With 5.1.32 it had been running fine for the last few months.
ID: 35627 · Report as offensive     Reply Quote
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,802,474
RAC: 127,446
Message 35629 - Posted: 23 Jun 2018, 7:01:10 UTC - in response to Message 35627.  

VirtualBox 5.1.x is no longer supported by Oracle.... :-(
ID: 35629 · Report as offensive     Reply Quote
adrianxw
Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 35698 - Posted: 29 Jun 2018, 10:08:35 UTC

Is it just the ATLAS project that runs into this? I've removed that project from the list, but am a little wary now of allowing new tasks for the whole bucket project.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 35698 · Report as offensive     Reply Quote
adrianxw
Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 35859 - Posted: 11 Jul 2018, 13:16:22 UTC - in response to Message 35698.  

No reply, close to dropping the project completely now.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 35859 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,426,967
RAC: 123,618
Message 35860 - Posted: 11 Jul 2018, 13:40:24 UTC - in response to Message 35859.  

The project is running fine at the moment, except for some download/upload problems with ATLAS last week, which seem to be solved.
Just try to get fresh work.
ID: 35860 · Report as offensive     Reply Quote
adrianxw
Joined: 29 Sep 04
Posts: 187
Credit: 705,487
RAC: 0
Message 35861 - Posted: 11 Jul 2018, 15:55:55 UTC

The reason I suspended the project was that I hit the "Postponed: VM job unmanageable, restarting later." problem, not uploads and downloads, which can generally be ignored as they sort themselves out. I have allowed new tasks (with ATLAS disabled), but if I see any more like that from the other sub-projects, I'll dump the project.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 35861 · Report as offensive     Reply Quote