Message boards : Number crunching : Postponed: VM job unmanageble, restarting later ?????

San-Fernando-Valley

Joined: 26 Mar 16
Posts: 30
Credit: 1,245,747
RAC: 0
Message 35726 - Posted: 1 Jul 2018, 8:52:38 UTC - in response to Message 35719.  


But what used to work is not working so well now, as everything is falling apart to various degrees. I think the fact of life is that LHC is the most advanced physics project in the world, and it has the most complicated computer and network structure as a necessary part of it. Furthermore, it was not developed for home users running BOINC, as most BOINC projects are, but for the advanced computing capabilities of similar large institutions (Fermilab, etc.) around the world. They probably don't use VBox at all. So we are sort of an afterthought. It is not that they don't appreciate our efforts, but we are the tail of the dog, not the head of it.


NOW that is an acceptable answer !
Thank you for explaining the real thing !

Why does LHC bother with us at all ?

One should not confuse the complexity of physics (-projects) with the simplicity of VirtualBox.

Have a nice Sunday.
ID: 35726
San-Fernando-Valley

Joined: 26 Mar 16
Posts: 30
Credit: 1,245,747
RAC: 0
Message 35727 - Posted: 1 Jul 2018, 8:54:15 UTC - in response to Message 35723.  

On the Top500 list IBM has reached the top with its Summit computer, which makes large use of nVidia Tesla GPU boards at 200 petaflops. The second is a Chinese supercomputer which was first last time, the third is another American computer, Sierra. The Piz Daint Swiss computer which was third is now sixth, the best Italian is thirteenth. Things change rapidly in the supercomputer world.
Tullio

Thanks - but has nothing to do with this thread ...
Over and out.
ID: 35727
San-Fernando-Valley

Joined: 26 Mar 16
Posts: 30
Credit: 1,245,747
RAC: 0
Message 35728 - Posted: 1 Jul 2018, 9:00:13 UTC - in response to Message 35718.  


The best we can do as crunchers is accept that reality and work with it patiently. If you can't find the patience then you need to decide whether or not crunching is for you.

... OS as unstable and poorly designed as Windoze you're just asking for tons of trouble. I would consider formatting the drive and reinstalling everything from scratch and this time going with a real OS instead of Windoze. Then learn how to walk before you run. Turning on ALL the applications at this project is likely a mistake. Setting "unlimited cores" would be another mistake.



This is the type of answer we all love !

Seems like you have been personally insulted ...

But, just try to have a nice day !
Over and out.
ID: 35728
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35729 - Posted: 1 Jul 2018, 16:23:08 UTC - in response to Message 35728.  

Seems like you have been personally insulted ...


ROFLMAO.
Seems like you came here to rant over what you aren't talented enough to fix rather than learn how to fix it.
You have an even nicer day.
ID: 35729
San-Fernando-Valley

Joined: 26 Mar 16
Posts: 30
Credit: 1,245,747
RAC: 0
Message 35733 - Posted: 2 Jul 2018, 7:10:21 UTC - in response to Message 35729.  

Wise guy.
ID: 35733
JaRski-S60R

Joined: 13 Jul 05
Posts: 27
Credit: 375,133
RAC: 1
Message 36638 - Posted: 5 Sep 2018, 17:34:43 UTC - in response to Message 35527.  

I had this "unmanageable" problem too. I have been exiting BOINC and restarting it to get the LHC ATLAS jobs going again. I have 32 GB of RAM, and the 8-core ATLAS task never exceeded 1 GB before it stopped; in the end the unit failed with NO credit.
Now I have just ticked the "keep non-GPU jobs in memory" option again, and CPU usage suddenly jumped to over 95% (I never saw this with an LHC VM job before). It was at 10% while the option was unticked and the task had not yet stopped, so maybe this is it?!
ID: 36638
maeax

Joined: 2 May 07
Posts: 916
Credit: 33,704,137
RAC: 6,374
Message 36639 - Posted: 5 Sep 2018, 17:46:00 UTC - in response to Message 36638.  

196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED
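
For anyone decoding that line: the number in parentheses is simply the same exit code written in hexadecimal (196 = 0xC4), followed by the name of the code. A minimal Python sketch of that formatting follows. The lookup table and the function name are made up for illustration, and only the one code quoted above is included:

```python
# Hypothetical helper: only the exit code quoted in this thread is listed.
KNOWN_EXIT_CODES = {196: "EXIT_DISK_LIMIT_EXCEEDED"}


def describe_exit_code(code):
    """Format an exit code as decimal, zero-padded 32-bit hex, and its name if known."""
    name = KNOWN_EXIT_CODES.get(code, "unknown")
    return f"{code} (0x{code:08X}) {name}"
```

Calling `describe_exit_code(196)` reproduces the line as it appears in the task result.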
ID: 36639
MAGIC Quantum Mechanic

Joined: 24 Oct 04
Posts: 920
Credit: 39,471,289
RAC: 10,741
Message 36640 - Posted: 5 Sep 2018, 20:19:42 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=206268165

I would start by updating your version of VirtualBox: https://www.virtualbox.org/wiki/Downloads

It is also a good idea to let a single task like that run without suspending and restarting it, since ATLAS tasks don't handle that well. There are easier-to-run tasks here too.
ID: 36640
HerveUAE

Joined: 18 Dec 16
Posts: 123
Credit: 37,118,253
RAC: 794
Message 36758 - Posted: 18 Sep 2018, 15:27:24 UTC

Hi,

I had the same issue on one of my computers. I was using:
- BOINC 7.10.2.
- VirtualBox 5.2.8.
- SSD for ProgramData/BOINC
I tried various tricks to solve the problem, with no success. After reading this thread I finally decided to uninstall BOINC and VirtualBox and install the following configuration:
- BOINC 7.12.1.
- VirtualBox 5.2.18.
- Hard Disk for ProgramData/BOINC
I am not sure which of those changes actually fixed the problem of those unmanageable VMs, but the new install works fine now for ATLAS.
Hoping it helps.
Herve
We are the product of random evolution.
ID: 36758
Jesse Viviano

Joined: 12 Feb 14
Posts: 71
Credit: 1,789,447
RAC: 0
Message 36832 - Posted: 23 Sep 2018, 7:59:45 UTC

I have noticed that quitting BOINC, waiting for all of the tasks to cleanly shut down and get saved to disk, and then restarting BOINC fixes all of the work units affected by this problem. However, it does not prevent new work units from getting this problem.
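
The "wait until everything has shut down" step can be checked rather than guessed. Here is a rough Python sketch, not a tested tool: it assumes `VBoxManage` is on the PATH and is run under the same account that owns the BOINC VMs (this may not hold if BOINC runs as a service). The function names, timeout, and polling interval are my own choices for illustration.

```python
import re
import subprocess
import time

# One line of `VBoxManage list runningvms` looks like:  "vm name" {uuid}
_VM_LINE = re.compile(r'^"(?P<name>.*)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}\s*$')


def parse_running_vms(output):
    """Return the VM names found in `VBoxManage list runningvms` output."""
    names = []
    for line in output.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            names.append(match.group("name"))
    return names


def list_running_vms():
    """Ask VirtualBox which VMs are still running (assumes VBoxManage is on PATH)."""
    result = subprocess.run(
        ["VBoxManage", "list", "runningvms"],
        capture_output=True, text=True, check=True,
    )
    return parse_running_vms(result.stdout)


def wait_for_vms_to_stop(list_vms=list_running_vms, timeout_s=300, poll_s=5,
                         sleep=time.sleep):
    """Poll until no VM is running; return True on success, False on timeout."""
    waited = 0
    while waited <= timeout_s:
        if not list_vms():
            return True
        sleep(poll_s)
        waited += poll_s
    return False
```

The idea would be to quit BOINC first (for example with `boinccmd --quit`), call `wait_for_vms_to_stop()`, and only start BOINC again once it returns True. Like the manual procedure, this only recovers tasks that are already stuck.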
ID: 36832
David

Joined: 11 Apr 17
Posts: 38
Credit: 6,828,377
RAC: 4,147
Message 42096 - Posted: 8 Apr 2020, 15:40:28 UTC

Maybe a viewpoint could be:
If the LHC devs don't want to support us peons (in the LHC programmers' grand scheme), why should we spend our money, our compute time, and our electricity supporting THEM?
Volunteers, spending real money, supporting a project that doesn't support US?
Is this where we are now?
I wonder just what percentage of LHC computer needs are solved by thousands of people spending so much time and money to further science.
ID: 42096
maeax

Joined: 2 May 07
Posts: 916
Credit: 33,704,137
RAC: 6,374
Message 42097 - Posted: 8 Apr 2020, 15:51:51 UTC - in response to Message 42096.  

David,
You found this project yourself, and you can leave it yourself if you want!
ID: 42097
David

Joined: 11 Apr 17
Posts: 38
Credit: 6,828,377
RAC: 4,147
Message 42098 - Posted: 8 Apr 2020, 17:11:22 UTC
Last modified: 8 Apr 2020, 17:16:58 UTC

Everyone else and I are fully aware of that option, thank you very much. It is kind of you to remind us all.
But not really helpful.
ID: 42098

©2020 CERN