Message boards : ATLAS application : Repeated computation errors
keputnam

Joined: 27 Sep 04
Posts: 102
Credit: 7,086,947
RAC: 1,340
Message 45787 - Posted: 6 Dec 2021, 0:19:16 UTC

I got a new computer recently, and for a few days things were great.

Then I started getting Computation Errors on every job I run.

The last time this happened, the consensus was that I had lousy internet (failure to connect to the LHC servers).



I have since upgraded my connection from 25MB to 150MB
(I can post results from several speed test sites if it will help)

Running BOINC 7.16.11
VBox 6.1.26 (+ Extension Pack)



Any assistance gratefully accepted
ID: 45787
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 45788 - Posted: 6 Dec 2021, 0:55:48 UTC - in response to Message 45787.  
Last modified: 6 Dec 2021, 0:56:19 UTC

It looks like you are using drive E: for BOINC, at least the data directory.
I have not tried that, but I expect that BOINC and especially VirtualBox are best located on the OS drive.
ID: 45788
keputnam

Joined: 27 Sep 04
Posts: 102
Credit: 7,086,947
RAC: 1,340
Message 45789 - Posted: 6 Dec 2021, 1:05:35 UTC - in response to Message 45788.  

I have used E: as my BOINC data drive for over 8 years. It started when my system HD was a little undersized. No problems at all.

BOINC and VBOX executables and VBOX VMs are all on C:
ID: 45789
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,130,430
RAC: 104,897
Message 45790 - Posted: 6 Dec 2021, 5:06:11 UTC

Having the BOINC data folder on another drive is no problem.
For me, the program folder and the data folder are on the same drive.
I saw this: 00:00:01.037605 ExtPack: Created cloud provider 'OCI' (hrc=ERROR_SUCCESS)
I have no idea what it means. The network is normally not a problem.
ID: 45790
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,948,605
RAC: 137,172
Message 45791 - Posted: 6 Dec 2021, 9:44:02 UTC - in response to Message 45789.  

There may be a couple of issues.


1. driver version mismatch

Your logfiles mention this a couple of times:
00:00:00.793696          Support driver version mismatch: DriverVersion=0x290001 ClientVersion=0x300000 rc=VERR_VERSION_MISMATCH

This usually indicates that VirtualBox is not installed correctly.
You may have used version A and upgraded to version B, but it looks like files/keys from version A were left on your computer.
The extension pack version must also be in sync with the main version.

I suggest completely removing VirtualBox and cleaning up all related keys.
Then do a fresh install and reboot.
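
To spot this condition without reading the whole log by eye, here is a minimal Python sketch that searches log text for the mismatch line quoted above and pulls out the two hex values. The function names are hypothetical. Splitting each value into an upper and a lower 16-bit half (read here as major/minor) is an assumption about how the support driver version is encoded, not something stated in the log itself.

```python
import re

# Matches the "Support driver version mismatch" line as quoted in this thread.
MISMATCH_RE = re.compile(
    r"Support driver version mismatch: "
    r"DriverVersion=(0x[0-9a-fA-F]+) ClientVersion=(0x[0-9a-fA-F]+)"
)


def split_version(value):
    """Split a 32-bit value into (upper 16 bits, lower 16 bits).

    Assumption: the upper half is the major and the lower half the minor
    version of the support driver interface.
    """
    return value >> 16, value & 0xFFFF


def find_mismatch(log_text):
    """Return ((drv_major, drv_minor), (cli_major, cli_minor)) or None."""
    match = MISMATCH_RE.search(log_text)
    if match is None:
        return None
    driver = int(match.group(1), 16)
    client = int(match.group(2), 16)
    return split_version(driver), split_version(client)


line = (
    "00:00:00.793696          Support driver version mismatch: "
    "DriverVersion=0x290001 ClientVersion=0x300000 rc=VERR_VERSION_MISMATCH"
)
print(find_mismatch(line))
```

If the two halves differ between driver and client, as they do in the quoted line, the installed driver and the VirtualBox binaries most likely come from different versions.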


2. Pointers to Oracle Cloud (OCI)

BOINC (more precisely: vboxwrapper) runs the VMs locally.
It does not configure any OCI related options.
Nonetheless, your logfiles do show OCI related entries (although incomplete!):
00:00:00.726827            server 1: 68.105.28.11
00:00:00.726828            server 2: 68.105.29.11
00:00:00.726828            server 3: 68.105.28.12
00:00:00.726829            no domain set
00:00:00.726829            no search string entries
00:00:00.793696          Support driver version mismatch: DriverVersion=0x290001 ClientVersion=0x300000 rc=VERR_VERSION_MISMATCH
00:00:00.851187          VD: VDInit finished with VINF_SUCCESS
00:00:00.866853          OCI: Local config file 'C:\Users\xxx\.VirtualBox\oci_config' does not exist
00:00:00.867039          OCI: Original config file 'C:\Users\xxx\.oci\config' does not exist
00:00:00.867040          OCI: Reading profiles finished with status ERROR_SUCCESS
00:00:00.867057          ExtPack: Created cloud provider 'OCI' (hrc=ERROR_SUCCESS)

Did you fiddle around with any OCI settings?
If you need them, you should know exactly what to do; otherwise leave these options untouched.
See:
https://www.virtualbox.org/manual/ch01.html#cloud-integration



3. VM tweaking

Some lines from your log:
2021-12-05 15:55:12 (17108): Setting Memory Size for VM. (8096MB)
2021-12-05 15:55:13 (17108): Setting CPU Count for VM. (5)

This VM was configured to use 5 cores.
The standard RAM setting would then be 7500 MB.
There's no need to configure more than that.
In the past, ATLAS VMs running in a 1-core setup (3900 MB) sometimes ran short of RAM while the EVNT file was being extracted.
This has never been seen with VMs using 3 or more cores.
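
The two figures above (3900 MB for 1 core, 7500 MB for 5 cores) are consistent with a simple linear rule of 3000 MB plus 900 MB per core. That rule is an assumption fitted to those two numbers only, not a confirmed project formula, and the function name is hypothetical. A minimal Python sketch:

```python
def atlas_ram_mb(cores, base_mb=3000, per_core_mb=900):
    """Estimate the standard RAM setting for an ATLAS VM.

    Assumption: a linear rule fitted to the two values quoted in this
    thread (1 core -> 3900 MB, 5 cores -> 7500 MB).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return base_mb + per_core_mb * cores


configured_mb = 8096  # value from the log line quoted above
standard_mb = atlas_ram_mb(5)
print(standard_mb, configured_mb - standard_mb)
```

Under that assumption, the 8096 MB from the log is more than a 5-core VM would need.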


4. Using drive "E:"

As long as the disk (whatever technology) as a whole is fast enough to process all I/O requests without timeouts it doesn't matter where the data is written to.
IIRC in the past there were other BOINC projects expecting everything to be on just 1 filesystem but this is not an issue here.
ID: 45791
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,130,430
RAC: 104,897
Message 45792 - Posted: 6 Dec 2021, 16:16:09 UTC - in response to Message 45791.  

computezrmle wrote:
IIRC in the past there were other BOINC projects expecting everything to be on just 1 filesystem but this is not an issue here.

Have you tested this on Win 7, 8, 8.1, 10 and 11?
ID: 45792
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 45793 - Posted: 6 Dec 2021, 16:36:23 UTC

The drive may not matter, but if you don't install/uninstall correctly, it could go to the wrong drive.
It is simpler on a single drive.
ID: 45793
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,948,605
RAC: 137,172
Message 45794 - Posted: 6 Dec 2021, 16:55:00 UTC - in response to Message 45792.  

Found an older post (3 years ago) where I mentioned primegrid being the project that did not work when the slots folder was on a different filesystem:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4855&postid=37068
As far as I remember this was a hardwired primegrid issue and explained in their forum.

LHC@home has worked fine for years with slots mounted on a tmpfs filesystem.
I didn't test Windows, but I know that other volunteers have run it with slots on a ramdisk, also for years.

From the OS perspective tmpfs/ramdisk are separate filesystems.
Hence, for LHC@home it's not a basic issue.
Just ensure there's enough physical RAM available.
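
To check whether two folders (for example the BOINC data directory and its slots folder) actually sit on different filesystems, here is a minimal Python sketch. It compares the device ID reported by `os.stat`; the function name is hypothetical, and treating `st_dev` as a reliable volume identifier on Windows is an assumption that depends on the Python version in use.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID.

    Assumption: st_dev identifies the filesystem (on Windows, the volume)
    that holds each path.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Example: compare the current directory with the system temp directory.
print(same_filesystem(".", tempfile.gettempdir()))
```

A result of False only means the data crosses a filesystem boundary, which according to the post above is not a problem for LHC@home.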


Jim1348 wrote:
It is simpler on a single drive.

+1
The golden rule: KISS (Keep It Simple and Stupid)
ID: 45794
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,389,079
RAC: 102,169
Message 45795 - Posted: 6 Dec 2021, 17:27:20 UTC - in response to Message 45794.  

computezrmle wrote:
Found an older post (3 years ago) where I mentioned primegrid being the project that did not work when the slots folder was on a different filesystem:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4855&postid=37068
As far as I remember this was a hardwired primegrid issue and explained in their forum.

LHC@home has worked fine for years with slots mounted on a tmpfs filesystem.
I didn't test Windows, but I know that other volunteers have run it with slots on a ramdisk, also for years.

Interesting to read.
I remember that years ago I installed a RAMdisk on one of my systems and put the slots folder there, and it did not work. That was Windows, not Linux.
ID: 45795
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,130,430
RAC: 104,897
Message 45796 - Posted: 6 Dec 2021, 17:49:18 UTC - in response to Message 45795.  

+1
ID: 45796
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,948,605
RAC: 137,172
Message 45797 - Posted: 6 Dec 2021, 19:17:32 UTC - in response to Message 45795.  

Erich56 wrote:
... years ago, on one of my systems I installed a RAMdisk and put the slots folder there, it did not work. Windows, not Linux.

You may try this tool:
https://sourceforge.net/projects/imdisk-toolkit/
it supports:
- dynamic ramdisk size
- use of a folder (e.g. somewhere\slots) instead of a drive letter as mountpoint
- automatic syncing (backup/restore) during shutdown/reboot
ID: 45797
keputnam

Joined: 27 Sep 04
Posts: 102
Credit: 7,086,947
RAC: 1,340
Message 45798 - Posted: 6 Dec 2021, 19:20:47 UTC

Thanks for the responses

No, I have not done any tweaking of VBox

I have uninstalled VBox, run the ADAware uninstall cleanup, rebooted, reinstalled VBox and the extension pack, and rebooted again.

Now waiting for the job queue to clear out to where the Scheduler requests another ATLAS job
ID: 45798
keputnam

Joined: 27 Sep 04
Posts: 102
Credit: 7,086,947
RAC: 1,340
Message 45799 - Posted: 6 Dec 2021, 21:35:22 UTC

Thanks, guys

The uninstall/re-install seems to have cured the problem

I'm about 10 minutes from completing a WU


Thinking back, I did upgrade VBox to 6.1.30. Apparently it didn't clean up the previously installed version very well.
ID: 45799

©2024 CERN