Message boards : Number crunching : VM Applications Errors

Ernst van Loon
Joined: 11 Feb 14
Posts: 2
Credit: 2,173,775
RAC: 0
Message 36136 - Posted: 30 Jul 2018, 17:21:09 UTC

On an i7-4770 CPU (4 cores, 8 threads) running Windows 10, I have been getting the consistent error "Communication with VM Hypervisor failed. (6 CPU's)" for a couple of weeks. I have tried reinstalling VirtualBox (both the latest standalone version and the version bundled with BOINC) a couple of times, but nothing helps...
What can I do further to perform ATLAS simulations?

Best regards, Ernst
ID: 36136
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 430
Credit: 117,525,067
RAC: 0
Message 36139 - Posted: 30 Jul 2018, 17:38:48 UTC - in response to Message 36136.  

What can I do further to perform ATLAS simulations?

Go through my checklist

and don't use VirtualBox 5.2.x; stay with the latest 5.1.x (I'm on 5.1.30 and it works fine)


Supporting BOINC, a great concept !
ID: 36139
Bill F
Joined: 2 Jun 07
Posts: 31
Credit: 1,442,215
RAC: 476
Message 41018 - Posted: 20 Dec 2019, 3:59:44 UTC
Last modified: 20 Dec 2019, 4:08:08 UTC

I have a Dell workstation on which, after enabling virtualization, I was able to complete an ATLAS WU. My BIOS has two settings: one for CPU virtualization and one for VM Direct I/O control. I enabled only the first (CPU) setting and not the second, because I saw a forum post saying it could cause issues with the suspend function.

Should I have enabled both?

On WU / Task 128721484 there were errors at the bottom (the WU validated and credit was issued).

Thank you in advance for any advice.

Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


ID: 41018
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1965
Credit: 139,322,088
RAC: 94,711
Message 41019 - Posted: 20 Dec 2019, 7:17:52 UTC - in response to Message 41018.  

Your WU succeeded.
This can be seen here:
2019-12-19 20:54:22 (13808): Guest Log: HITS file was successfully produced

This means you are using the correct BIOS setup.



Nonetheless there are some suggestions based on other logfile entries:
2019-12-19 10:36:56 (13808): Guest Log: BIOS: VirtualBox 5.2.8

Inside the VM, Guest Additions 5.2.32 are used.
You may want to either use the same version or the most recent VirtualBox version:
https://www.virtualbox.org/wiki/Downloads
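
A mismatch like this can be checked by comparing the two version strings numerically. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the host version appears in the log in the "BIOS: VirtualBox x.y.z" form shown above, while the Guest Additions version is supplied by hand.

```python
import re


def as_tuple(version):
    """Turn '5.2.32' into (5, 2, 32) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def host_version(log_line):
    """Return the VirtualBox version from a 'BIOS: VirtualBox x.y.z' log line, or None."""
    match = re.search(r"BIOS: VirtualBox (\d+(?:\.\d+)*)", log_line)
    return as_tuple(match.group(1)) if match else None


def versions_differ(log_line, guest_additions_version):
    """True if the host version in the log differs from the Guest Additions version."""
    host = host_version(log_line)
    return host is not None and host != as_tuple(guest_additions_version)
```

Comparing tuples rather than raw strings matters here, because as plain text "5.2.8" would sort after "5.2.32".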



2019-12-19 10:36:56 (13808): Setting CPU throttle for VM. (85%)

If this is set too low, it sometimes causes WUs to fail.
Consider setting it to a higher value.


Pausing and resuming the VM at such a high rate often results in a failed WU and should be avoided:
2019-12-19 12:30:55 (13808): VM state change detected. (old = 'Running', new = 'Paused')
2019-12-19 12:34:46 (13808): VM state change detected. (old = 'Paused', new = 'Running')
2019-12-19 12:37:54 (13808): VM state change detected. (old = 'Running', new = 'Paused')
2019-12-19 12:39:55 (13808): VM state change detected. (old = 'Paused', new = 'Running')
2019-12-19 12:42:30 (13808): VM state change detected. (old = 'Running', new = 'Paused')
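
To see how often a VM is being paused, the state-change lines can be counted directly from the stderr output. This is a minimal Python sketch, assuming the log lines have exactly the format shown above; the function name pause_summary is hypothetical and not part of BOINC or vboxwrapper.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019-12-19 12:30:55 (13808): VM state change detected. (old = 'Running', new = 'Paused')
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): "
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def pause_summary(log_text):
    """Return (transitions to 'Paused', seconds between first and last state change)."""
    stamps = []
    pauses = 0
    for line in log_text.splitlines():
        match = STATE_CHANGE.match(line.strip())
        if not match:
            continue  # ignore every other kind of log line
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
        if match.group(3) == "Paused":
            pauses += 1
    span = (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0
    return pauses, span
```

For the five lines quoted above, this reports three pauses within a window of 695 seconds, a little under twelve minutes.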
ID: 41019
benefique pour tous
Joined: 9 Sep 19
Posts: 32
Credit: 2,856,470
RAC: 0
Message 41205 - Posted: 8 Jan 2020, 18:31:28 UTC

I used an HP Omen with an RTX card, and I did my best to make this work.
Twice the tasks went wrong.
The error occurred in the last second of computing.
Now this has happened 3 times.
Something must be wrong.
I don't know on which computer you could solve it, because it failed on 2 big i7s and on a good AMD Ryzen 5.
Sorry.
Below you will find the list of errors that occurred.
Workunit: Theory_2279-784485-196
Application: Theory Simulation
Created: 19 Dec 2019, 17:43:43 UTC
Errors: Too many total results

Task 255962069, computer 10536006
Sent: 19 Dec 2019, 19:15:11 UTC
Reported: 24 Dec 2019, 9:30:26 UTC
Status: Error while computing
Run time (sec): 44,307.94
CPU time (sec): 19,950.05
Credit: ---
Application: Theory Simulation v300.02 (vbox64_theory) windows_x86_64

Task 256748413, computer 10623629
Sent: 24 Dec 2019, 9:38:14 UTC
Reported: 5 Jan 2020, 1:05:58 UTC
Status: Error while computing
Run time (sec): 357,010.95
CPU time (sec): 352,895.60
Credit: ---
Application: Theory Simulation v300.02 (vbox64_theory) x86_64-pc-linux-gnu

Task 257549284, computer 10621995
Sent: 4 Jan 2020, 9:38:20 UTC
Reported: 8 Jan 2020, 18:19:04 UTC
Status: Error while computing
Run time (sec): 376,418.99
CPU time (sec): 376,092.60
Credit: ---
Application: Theory Simulation v300.02 (vbox64_theory) windows_x86_64



ID: 41205
M0CZY
Joined: 27 Jun 06
Posts: 14
Credit: 120,460
RAC: 446
Message 43097 - Posted: 23 Jul 2020, 20:59:05 UTC - in response to Message 41205.  

I've never been able to get any VirtualBox LHC tasks to run properly.
Mostly they just idle, very slowly, using no CPU, until I abort them.
I'm using the latest version of VirtualBox (6.1.12) with the correct Extension Pack.
I just tried 2 Theory work units, and ran them for 50 minutes, while they made hardly any progress before I aborted them.
Perhaps someone can look at their Stderr outputs to see if they can diagnose what I am doing wrong?
The biggest threat to public safety and security is not terrorism, it is Government abuse of authority.
Bitcoin Donations: 1Le52kWoLz42fjfappoBmyg73oyvejKBR3
ID: 43097
Jim1348
Joined: 15 Nov 14
Posts: 589
Credit: 21,792,897
RAC: 8,256
Message 43098 - Posted: 23 Jul 2020, 21:31:02 UTC - in response to Message 43097.  
Last modified: 23 Jul 2020, 21:33:11 UTC

Don't know. But since you are on Ubuntu, just install CVMFS. This is a fairly foolproof procedure:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5342&postid=41861

Then, in your profile, set "Run native if available".
It usually works these days, and if it fails, it will do so faster.

NOTE: This works as-is for Theory, but native ATLAS is another matter.
You need singularity for that; best left for another discussion (or just use VBox).
ID: 43098
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1093
Credit: 6,823,647
RAC: 521
Message 43100 - Posted: 24 Jul 2020, 8:43:44 UTC - in response to Message 43097.  

Perhaps someone can look at their Stderr outputs to see if they can diagnose what I am doing wrong?
In your 2 results: Probing /cvmfs/sft.cern.ch... Failed!
It's a network issue. Maybe proxy or firewall related.
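
To find such lines in a long stderr output, a short Python sketch like the following can help. It assumes the probe lines look like the one quoted above and that a successful probe ends in "OK" instead of "Failed!" (an assumption, since no successful probe is quoted in this thread); the function name is hypothetical.

```python
import re

# Example of a failing line: "Probing /cvmfs/sft.cern.ch... Failed!"
PROBE = re.compile(r"Probing (/cvmfs/\S+?)\.\.\. (OK|Failed!)")


def failed_probes(stderr_text):
    """Return the CVMFS repositories whose probe was reported as failed."""
    return [repo for repo, status in PROBE.findall(stderr_text) if status == "Failed!"]
```

An empty result only means that no failed probe was logged; it does not prove that the network path, proxy or firewall is configured correctly.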
ID: 43100
Phoenix
Joined: 14 Mar 11
Posts: 9
Credit: 1,613,568
RAC: 0
Message 43126 - Posted: 29 Jul 2020, 22:10:26 UTC

I have made a couple of attempts to run jobs using VirtualBox.
All attempts run for a short time and then error out.
I have tried CMS, Theory and ATLAS; they all error out after a short time, at 1.5% or so.
I need some help.
ID: 43126
maeax
Joined: 2 May 07
Posts: 1507
Credit: 47,877,878
RAC: 107,887
Message 43127 - Posted: 29 Jul 2020, 22:27:10 UTC - in response to Message 43126.  

You have to enable VT-x in the BIOS of your Intel computer.
This setting activates hardware acceleration for virtualization.
ID: 43127
Phoenix
Joined: 14 Mar 11
Posts: 9
Credit: 1,613,568
RAC: 0
Message 43132 - Posted: 30 Jul 2020, 8:02:20 UTC - in response to Message 43127.  

Thank you for your help.
I wanted to do some work rather than just waiting for more SixTrack.
ID: 43132
Cruncher Pete
Joined: 12 Oct 07
Posts: 9
Credit: 4,105,024
RAC: 0
Message 43146 - Posted: 31 Jul 2020, 6:12:23 UTC

I am another old user who gave up on LHC some time ago because I could not set it up to run with VBox. I am disappointed that, although this problem has existed for about three years, as seen on your message board, it has not been rectified.

We are in a sprint in FB Challenge, yet I cannot download any tasks other than CMS, which errors out in less than a minute. I have tried all the remedies I am aware of to fix this problem at my end, and since you are not issuing work like SixTrack that does not require VirtualBox, I doubt I will ever return. Effectively, you have lost an old-time cruncher who supported you for years, but I can see that my 12 computers will achieve more running other projects than yours. Sorry, but that needed to be said, for you did nothing to fix it and only allow CMS work that errors out. Surely, with the capabilities of your IT techs, you can do better than this.
ID: 43146
Jim1348
Joined: 15 Nov 14
Posts: 589
Credit: 21,792,897
RAC: 8,256
Message 43147 - Posted: 31 Jul 2020, 6:24:43 UTC - in response to Message 43146.  

I am another old user who gave up on LHC some time ago because I could not set it up to run with VBox. I am disappointed that, although this problem has existed for about three years, as seen on your message board, it has not been rectified.

Don't use VBox version 6.x. Use 5.2.x. It has worked on every machine I have ever used it on, either Windows or Linux.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2
ID: 43147
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1965
Credit: 139,322,088
RAC: 94,711
Message 43148 - Posted: 31 Jul 2020, 8:14:46 UTC - in response to Message 43146.  

I post this comment as a normal volunteer, not as a moderator.

It has been known for years that LHC's VBox tasks, especially CMS and ATLAS, require lots of RAM and put immense pressure on disk I/O as well as network traffic.
Hence, it makes no sense to connect a computer like this without resource planning:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10660427
CPUs: 1056 (!!)
RAM: 24 GB (only!!)

It requested more than 440 CMS tasks within 1.5h.

The RAM would allow up to 12 tasks to run concurrently, without any RAM left for the OS or the disk cache.
12 tasks would copy 12*2.4GB = 28.8GB vdi files from the project directory to the slots directories.
12 tasks would create >20000 internet requests and ~2.5GB downloads to finish the task setup.
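
The same planning arithmetic can be written down as a small Python sketch. The 2 GB per task figure is only inferred from the numbers above (24 GB allowing 12 tasks), the 2.4 GB vdi size is taken from the post, and the function names and the reserved_gb parameter are hypothetical.

```python
def max_concurrent_tasks(total_ram_gb, ram_per_task_gb, reserved_gb=0.0):
    """How many VM tasks fit into RAM after reserving some for the OS and disk cache."""
    usable = total_ram_gb - reserved_gb
    if usable <= 0:
        return 0
    return int(usable // ram_per_task_gb)


def vdi_copy_volume_gb(tasks, vdi_size_gb=2.4):
    """Disk volume copied from the project directory to the slots directories."""
    return round(tasks * vdi_size_gb, 1)


# Numbers from the post: 24 GB RAM, roughly 2 GB per task (inferred).
tasks = max_concurrent_tasks(24, 2)      # nothing left for the OS
safer = max_concurrent_tasks(24, 2, 4)   # hypothetical 4 GB reserve
volume = vdi_copy_volume_gb(tasks)
```

With the numbers from the post, this gives 12 tasks and 28.8 GB of vdi copies; reserving a hypothetical 4 GB for the OS lowers the limit to 10 tasks.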



A result of this overload can be seen here:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10660532
https://lhcathome.cern.ch/lhcathome/result.php?resultid=280153378
2020-07-31 16:58:04 (5352): VM Heartbeat file specified, but missing.
2020-07-31 16:58:04 (5352): VM Heartbeat file specified, but missing file system status. (errno = '2')

In addition there's a misconfiguration between Windows and VirtualBox which is not caused by LHC:
00:00:00.438450          ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={5047460a-265d-4538-b23e-ddba5fb84976} aComponent={MachineWrap} aText={The object functionality is limited}, preserve=false aResultDetail=0




Another computer has a misconfigured VT-x:
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10660554
VBoxManage.exe: error: VT-x is disabled in the BIOS for all CPU modes (VERR_VMX_MSR_ALL_VMX_DISABLED)
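
The three messages quoted in this post can be searched for in any stderr output. The following Python sketch is only an illustration: the hints simply restate the diagnoses given above, the names SIGNATURES and diagnose are hypothetical, and the list is not a complete catalogue of VirtualBox errors.

```python
# Text fragments taken from the log lines quoted above, each mapped to the
# diagnosis given for it in this post.
SIGNATURES = {
    "VERR_VMX_MSR_ALL_VMX_DISABLED": "VT-x is disabled in the BIOS",
    "VM Heartbeat file specified, but missing": "heartbeat file missing (seen here on an overloaded host)",
    "E_ACCESSDENIED": "misconfiguration between Windows and VirtualBox",
}


def diagnose(stderr_text):
    """Return the hints for all known fragments, in order of first appearance."""
    found = []
    for line in stderr_text.splitlines():
        for fragment, hint in SIGNATURES.items():
            if fragment in line and hint not in found:
                found.append(hint)
    return found
```

A match is only a starting point for reading the full log; an empty result does not mean the host is configured correctly.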



Volunteers running hundreds of cores and being here for more than a decade should not blame the project for homemade failures.
The main objective of LHC@home is to help the scientists rather than to satisfy any "sprint in FB Challenge".
ID: 43148
crashtech
Joined: 10 May 17
Posts: 4
Credit: 6,719,627
RAC: 0
Message 43154 - Posted: 1 Aug 2020, 5:09:20 UTC - in response to Message 43148.  

The main objective of LHC@home is to help the scientists...

True, and if you believe that, then a helpful post would better serve that end, instead of scolding volunteers for having what you subjectively consider to be improper motivation. I've been having problems with two of my Linux hosts, which error out even when tasked with only two Theory tasks at a time across an 8C/16T CPU with 32GB RAM. So it's not always an overutilization problem. Also keep in mind that although you may have some sort of problem with contest participation, many of us who find fun and satisfaction with such things also return from time to time to fulfill personal long-term goals with projects like this, which certainly helps advance the science. I can't speak for the other volunteers, but I spend hundreds of dollars a month donating hardware and electricity to various projects. Again, I would think that if one really believes that helping the science is paramount, scolding participants cannot be an optimal way to achieve that goal.
ID: 43154
Cruncher Pete
Joined: 12 Oct 07
Posts: 9
Credit: 4,105,024
RAC: 0
Message 43155 - Posted: 1 Aug 2020, 6:23:51 UTC - in response to Message 43154.  

Thank you, Crashtech, for your input. I could not have said it better myself. Indeed, I was not even going to reply to the so-called Volunteer Moderator, Volunteer Developer and Volunteer Tester replying as a VOLUNTEER. Because of my medical condition, I am in front of my computer running BOINC as a volunteer with no special skills for at least 12 to 14 hours a day, and have been doing so since 1999, with 12 to 15 machines dedicated to science. Yes, I make a lot of mistakes, but I learn from them. I take offense at being used as an example of a person causing overload. All I wanted to do was get some work, and all I got was hundreds of CMS tasks that crashed within seconds. I am not an IT tech, but I have sufficient experience to meet BOINC project requirements, and I have never had any problems with any other project. It hurt to read that it is my fault, yet the major problem of affiliating with VM seems to be the problem and they have not done anything to rectify it after years of reporting the problem to them on the Boards.

As constructive criticism, I can say that I have learned a lot from his remarks, and I can only ask that, instead of being frustrated by so many complaints, you re-evaluate the project's needs. If you only wish to run CMS and none of the other sub-projects, then say so in the News. Since this project, by his own words, requires powerful computers with lots of hard drive space, lots of RAM and plenty of broadband capacity, I would also like to suggest that you publish your requirements on the front page: tell us the minimum hardware we should have, and tell us that your project requires IT knowledge and that normal everyday volunteers, who are considered push-button experts, should not consider running it. Otherwise, LHC is wasting our time and money. I recently purchased a 64-core/128-thread computer that cost over $6,000. It seems to be a waste of money, for I cannot set up and run LHC on Win10 with 32 GB of RAM and a 1 TB hard drive on a fast broadband cable Ethernet connection. I installed the latest BOINC and updated VM to V6.1. I set my preferences to receive all projects, yet I only got CMS work that errored out almost immediately. I am sure other projects will appreciate this machine and my dedication to science.
ID: 43155
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1965
Credit: 139,322,088
RAC: 94,711
Message 43156 - Posted: 1 Aug 2020, 6:31:45 UTC - in response to Message 43154.  

I've been having problems with two of my Linux hosts, which error out even when tasked with only two Theory tasks at a time across an 8C/16T CPU with 32GB RAM.

Since your computer list doesn't show any 8C/16T CPU, direct links to the failed tasks would be helpful.
ID: 43156
maeax
Joined: 2 May 07
Posts: 1507
Credit: 47,877,878
RAC: 107,887
Message 43157 - Posted: 1 Aug 2020, 8:17:03 UTC - in response to Message 43155.  
Last modified: 1 Aug 2020, 8:18:48 UTC

It hurt to read that it is my fault, yet the major problem of affiliating with VM seems to be the problem and they have not done anything to rectify it after years of reporting the problem to them on the Boards.

No, it isn't your fault.
In the LHC@home stats you can see many Tier-1 institutes from CERN around the world.
They also run BOINC, but on Linux.
So this way of running work is not limited to us volunteers.
https://lhcathome.cern.ch/lhcathome/top_users.php
ID: 43157
davidBAM
Joined: 21 Nov 18
Posts: 1
Credit: 7,708,443
RAC: 5,432
Message 43158 - Posted: 1 Aug 2020, 10:43:31 UTC
Last modified: 1 Aug 2020, 10:46:02 UTC

a) This isn't the only project which has problems when the Formula Boinc Circus comes to town.
b) This isn't the only project.

Everyone tries to do their best to advance the science and even competitive crunchers like myself have their place in the grand scheme of things.

If LHC don't want to be considered for FB Sprints, then I believe a simple email to Seb would accomplish that. If they do want to be considered for Sprints, they will naturally get a load of volunteer crunchers who don't have in-depth familiarity with how best to run the project on their hardware.

I have actually run LHC before, and I do have an Honours degree in Computer Science. In spite of both of those facts, it has taken me every waking hour since the challenge started to get even half of my machines crunching this project. They all run Ubuntu 20.04 with a minimum of 1 GB of RAM per thread.

I remain totally baffled as to why Linux and Windows users both need to use VBox.

I have the following suggestions :
1. Either LHC should opt-out of FB sprints, OR, preferably, try to ensure it has SixTrack available for the next one
2. As has been mentioned above, LHC need to provide minimum specs and up-to-date guidance on how to run each app
3. Pay partial credits for failed work if it is not the fault of the volunteer
ID: 43158
Henry Nebrensky
Joined: 13 Jul 05
Posts: 157
Credit: 14,665,461
RAC: 0
Message 43159 - Posted: 1 Aug 2020, 12:17:53 UTC - in response to Message 43158.  

a) This isn't the only project which has problems when the Formula Boinc Circus comes to town.
b) This isn't the only project.
...
I have the following suggestions :
1. Either LHC should opt-out of FB sprints, OR, preferably, try to ensure it has SixTrack available for the next one

Surely it's the "sprint organiser's" responsibility to ensure they choose appropriate projects with suitable work available. Sixtrack job availability has been known to be intermittent for decades.

Touch wood, my recent CMS tasks are all running and I'm getting tasks for all sub-projects selected in my preferences. Not sure why you think you're stuck with purely CMS tasks. What happened when you unticked CMS?
ID: 43159

©2022 CERN