Message boards : ATLAS application : Question: Vbox 6.1.22 vs 6.1.18 effect on tasks
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44878 - Posted: 4 May 2021, 18:11:59 UTC

Anyone know if .22 is causing task failures vs .18 which seems stable?

I upgraded to .22 at the end of the month and did not have much success with tasks. I just downgraded to .18 and I think it will run fine now. (fingers crossed)

But why does .22 cause troubles and .18 runs ok?
Is this limited to my system, or have any of the rest of you had this issue?
ID: 44878
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 687
Credit: 433,607,097
RAC: 60,388
Message 44879 - Posted: 4 May 2021, 18:54:37 UTC - in response to Message 44878.  

I didn't see any change to my error rate for ATLAS
ID: 44879
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1046
Credit: 6,601,227
RAC: 256
Message 44880 - Posted: 4 May 2021, 19:31:54 UTC

I've processed ATLAS- and Theory-tasks with VBox v6.1.22 without issues. No CMS-task done yet.
ID: 44880
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44881 - Posted: 4 May 2021, 19:34:08 UTC - in response to Message 44879.  
Last modified: 4 May 2021, 19:40:22 UTC

I didn't see any change to my error rate for ATLAS

Weird... I can't figure out what my system sees differently from everyone else.
I go to .22 and get nothing but crashes.

Win 10 fully updated. Ryzen 2700 (not x) with tons of memory.
Run FAH in addition to a wide variety of other BOINC projects.
But none of this should affect ATLAS.
Using 1 core instead of 4 or 8.
Was running 4 but still it bugged out in .22. Ran 8 and same problem.
Ran 1 in .22 and still problems.
Now in .18, 1 core, 1:25 into it (11 seconds computation) according to BoincTasks.
Using .22% of the core capacity.
Advancing at .014% per 2-second update interval.
Estimated 3:20 left. That would make it a nearly 5 hour task, whereas I used to knock them out in just over 4 hours.

Stderr text: VBoxManage -q storageattach "boinc_b0a685e70dc1fb64" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "C:\boinc data\slots\37/vm_image.vdi"
Output:
VBoxManage.exe: error: Medium 'C:\boinc data\slots\37\vm_image.vdi' is not accessible. UUID {5e2342bb-76a4-44ff-81f6-2f3283cde68f} of the medium 'C:\boinc data\slots\37\vm_image.vdi' does not match the value {d7120b97-71c4-46f7-aa18-788f76bbccdf} stored in the media registry ('C:\Users\Greg\.VirtualBox\VirtualBox.xml')
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "SetIds(fSetNewUuid, bstrNewUuid.raw(), fSetNewParentUuid, bstrNewParentUuid.raw())" at line 694 of file VBoxManageStorageController.cpp
VBoxManage.exe: error: Failed to set the medium/parent medium UUID

Notes:

Another VirtualBox management application has locked the session for
this VM. BOINC cannot properly monitor this VM
and so this job will be aborted. <<---- But yet it is still running on my system!


2021-05-04 17:14:29 (36364): Could not create VM
2021-05-04 17:14:29 (36364): ERROR: VM failed to start
2021-05-04 17:14:36 (36364):
NOTE: VM session lock error encountered.
BOINC will be notified that it needs to clean up the environment.
This might be a temporary problem and so this job will be rescheduled for another time.

(weird because this is a fresh Vbox and it shows only 1 task running and nothing else in the system)

2021-05-04 20:06:32 (8856): VM state change detected. (old = 'PoweredOff', new = 'Running')
2021-05-04 20:06:32 (8856): Preference change detected
2021-05-04 20:06:32 (8856): Setting CPU throttle for VM. (100%)
2021-05-04 20:06:34 (8856): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 180 seconds) or (Vbox_job.xml: 900 seconds))


But no checkpoints
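
For anyone comparing their own stderr against the UUID mismatch error quoted above, here is a minimal Python sketch that pulls the two UUIDs out of that message. The regular expression and the function name are illustrative assumptions, not part of BOINC or VirtualBox; going by the wording of the message, the first UUID is the one found in the image file and the second is the one stored in the media registry.

```python
import re

# Error line copied from the stderr output quoted in this post.
ERROR_LINE = (
    r"VBoxManage.exe: error: Medium 'C:\boinc data\slots\37\vm_image.vdi' is not accessible. "
    r"UUID {5e2342bb-76a4-44ff-81f6-2f3283cde68f} of the medium "
    r"'C:\boinc data\slots\37\vm_image.vdi' does not match the value "
    r"{d7120b97-71c4-46f7-aa18-788f76bbccdf} stored in the media registry "
    r"('C:\Users\Greg\.VirtualBox\VirtualBox.xml')"
)

# Matches a UUID written inside curly braces, as VBoxManage prints it.
UUID_RE = re.compile(r"\{([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\}")


def extract_uuids(text):
    """Return every brace-enclosed UUID in the text, in order of appearance."""
    return UUID_RE.findall(text)


if __name__ == "__main__":
    uuid_in_image, uuid_in_registry = extract_uuids(ERROR_LINE)
    print("UUID in the image file:    ", uuid_in_image)
    print("UUID in the media registry:", uuid_in_registry)
    print("Mismatch:", uuid_in_image != uuid_in_registry)
```

This only reads the message. It does not touch VirtualBox.xml or the image file.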
ID: 44881
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 687
Credit: 433,607,097
RAC: 60,388
Message 44882 - Posted: 4 May 2021, 19:43:42 UTC - in response to Message 44881.  

In the past I have seen that after a VBox upgrade all tasks bugged out until the backlog of tasks was flushed. I think checkpointed tasks didn't like restoring their checkpoint under a different version of VBox. You could try cleaning out all the checkpoints and/or doing a full shutdown of each VM.

It could also be that there are some "zombie" VMs making VBox unhappy. You normally get an error message about them when you open VBox.
ID: 44882
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1046
Credit: 6,601,227
RAC: 256
Message 44883 - Posted: 4 May 2021, 20:23:33 UTC - in response to Message 44881.  

Weird... I can't figure out what my system sees differently from everyone else.
I go to .22 and get nothing but crashes.

I see that you had some valid results with v22 and also errors with v18. It does not seem to be related to the VirtualBox version.
ID: 44883
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44885 - Posted: 4 May 2021, 20:49:41 UTC - in response to Message 44883.  
Last modified: 4 May 2021, 20:56:47 UTC

Weird... I can't figure out what my system sees differently from everyone else.
I go to .22 and get nothing but crashes.

I see that you had some valid results with v22 and also errors with v18. It does not seem to be related to the VirtualBox version.


So then what do you guys think is going on?
Does a version change with active tasks or queued tasks cause problems?


But look...70+ errors out of how many tasks?
Many of these ran 10+ hours, or even a day or two, depending on how long it took me to get around to checking them.
Bugs in the tasks?
I only know bugs from Rosetta, not ATLAS.


The current task is chugging away at .010% per 2 seconds and is now at 50% with 2:47 to go. I'm hoping it does not bog down in the last hour of processing. That was another reason I was trying the upgrade: to try to beat the bog-down in the last hour, where progress slows to .002% per 2 seconds and CPU usage drops to something like .2%.

I don't want to jinx things...but...if I want to upgrade...then what?
Set the project to no new tasks and let everything clear out and then upgrade?

One other thing: what is the difference between running a task on 4 cores versus a single core or 8 cores? That is something else I don't understand.
ID: 44885
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1818
Credit: 122,901,984
RAC: 76,283
Message 44886 - Posted: 5 May 2021, 5:32:53 UTC - in response to Message 44885.  

greg_be wrote:
Ryzen 2700 (not x) with tons of memory
Run FAH in addition to a wide variety of other BOINC projects.

Your computer details page shows 24 GB RAM.
That's not "tons of memory", given that you configure each ATLAS VM to allocate 14 GB, even for 1-core tasks.
2021-05-04 17:07:14 (40320): Setting Memory Size for VM. (14000MB)
2021-05-04 17:07:22 (40320): Setting CPU Count for VM. (1)

The BOINC server limits the RAM setting sent along with each task to 10200 MB, and your client uses that value to estimate whether an additional task can be started.
You override the RAM allocation for the VM (NOT for BOINC!) via app_config.xml, so the VM can take more memory than the client has accounted for.
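
To make the mismatch concrete, here is a rough Python sketch of the arithmetic. The three constants come from this thread; the `other_usage_mb` figure in the example run is a purely hypothetical placeholder for everything else running on the host, and the function names are my own.

```python
HOST_RAM_MB = 24 * 1024      # "24 GB RAM" from the computer details page
BOINC_ESTIMATE_MB = 10200    # RAM value the server sends along with each task
VM_ALLOCATION_MB = 14000     # "Setting Memory Size for VM. (14000MB)"


def unaccounted_mb(vm_allocation_mb, boinc_estimate_mb):
    """RAM the VM may use beyond what the BOINC client budgets for the task."""
    return max(0, vm_allocation_mb - boinc_estimate_mb)


def headroom_mb(host_ram_mb, vm_allocation_mb, other_usage_mb):
    """RAM left on the host once the VM and all other processes are counted.

    A small or negative value suggests the host is likely to start swapping.
    """
    return host_ram_mb - vm_allocation_mb - other_usage_mb


if __name__ == "__main__":
    # 14000 - 10200 = 3800 MB that the client does not know about.
    print("Unaccounted MB:", unaccounted_mb(VM_ALLOCATION_MB, BOINC_ESTIMATE_MB))
    # other_usage_mb=8000 is a made-up number for illustration only.
    print("Headroom MB:", headroom_mb(HOST_RAM_MB, VM_ALLOCATION_MB, 8000))
```

Plug in the real RAM usage of your other projects and applications to see how close the host gets to zero headroom.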

Depending on the total RAM usage of all processes (not only BOINC), your computer starts to swap and becomes slower and slower.
At a certain point the whole system switches to an error handling mode, which can be seen in log entries like this:
2021-05-04 17:09:51 (40320): VM is no longer is a running state. It is in 'GuruMeditation'.
2021-05-04 17:09:51 (40320): VM state change detected. (old = 'Running', new = 'GuruMeditation')
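
If you want to check a whole stderr log for this, a small Python sketch along these lines would do it. The regular expression is an assumption derived only from the log lines quoted in this thread, so other wrapper versions may format them differently; the function names are hypothetical.

```python
import re

# Pattern derived from the "VM state change detected" lines quoted in this thread.
STATE_CHANGE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((?P<pid>\d+)\): "
    r"VM state change detected\. \(old = '(?P<old>[^']*)', new = '(?P<new>[^']*)'\)"
)


def find_state_changes(lines):
    """Return one dict (timestamp, pid, old, new) per state-change log line."""
    changes = []
    for line in lines:
        match = STATE_CHANGE_RE.match(line.strip())
        if match:
            changes.append(match.groupdict())
    return changes


def entered_guru_meditation(lines):
    """Return only the state changes whose new state is 'GuruMeditation'."""
    return [c for c in find_state_changes(lines) if c["new"] == "GuruMeditation"]


# Sample lines taken from the logs posted in this thread.
SAMPLE_LOG = [
    "2021-05-04 17:09:51 (40320): VM is no longer is a running state. It is in 'GuruMeditation'.",
    "2021-05-04 17:09:51 (40320): VM state change detected. (old = 'Running', new = 'GuruMeditation')",
    "2021-05-04 20:06:32 (8856): VM state change detected. (old = 'PoweredOff', new = 'Running')",
]

if __name__ == "__main__":
    for change in entered_guru_meditation(SAMPLE_LOG):
        print(change["timestamp"], "pid", change["pid"], change["old"], "->", change["new"])
```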



A VBox version change does not crash tasks that are downloaded but not yet started.
It may occasionally crash work in progress.


Looking at the BOINC client's progress information is useless as it never shows the real progress from inside the VMs.
This has often been explained throughout the forum.
ID: 44886
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44887 - Posted: 5 May 2021, 6:36:18 UTC - in response to Message 44886.  

Well, I am running only one task. So 14 GB from ATLAS (one task only) plus the other projects and web usage brings the total to only 16. That leaves 8 available if needed.
So again, if it's not RAM and not the VM, then what is going on?
ID: 44887
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44890 - Posted: 5 May 2021, 19:13:49 UTC - in response to Message 44887.  
Last modified: 5 May 2021, 19:45:16 UTC

So my conclusion, then, is to just eliminate app_config and let the project do what it needs? It seems to me that I am not helping much by using app_config.

But the cores... again, I am confused about the difference between 1, 4, and 8 cores per task. Does allocating more cores increase the computation speed or not? Do more cores need more memory?
ID: 44890
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 687
Credit: 433,607,097
RAC: 60,388
Message 44891 - Posted: 5 May 2021, 19:19:49 UTC

For 1 CPU, 10 GB is tons. I think 3 GB would be fine, but I use 10 GB so it matches BOINC's allocation and I can't go over.
ID: 44891
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44892 - Posted: 5 May 2021, 19:46:56 UTC - in response to Message 44891.  

For 1 CPU, 10 GB is tons. I think 3 GB would be fine, but I use 10 GB so it matches BOINC's allocation and I can't go over.


But here is the same question as before: what do 1, 4, or 8 CPUs do to the running of the task, and how much memory needs to be allocated for each core count? That is information I don't have.
ID: 44892
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1818
Credit: 122,901,984
RAC: 76,283
Message 44893 - Posted: 5 May 2021, 20:05:20 UTC - in response to Message 44892.  

The ATLAS RAM formula is:
3000 MB + 900 MB * n
with "n" = the number of cores the VM allocates.

This formula hasn't changed for more than 2 years.

The scientific app running inside the VM goes through a setup phase on 1 core, and once this is finished it starts n threads to do the event processing.
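
As a quick worked example, the formula above can be written as a few lines of Python. The function name is just an illustration and is not something BOINC or ATLAS provides.

```python
def atlas_vm_ram_mb(n_cores):
    """RAM in MB for an ATLAS VM using the formula 3000 MB + 900 MB * n."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 3000 + 900 * n_cores


if __name__ == "__main__":
    # The core counts discussed in this thread.
    for n in (1, 4, 8):
        print(f"{n} core(s): {atlas_vm_ram_mb(n)} MB")
```

For 1, 4 and 8 cores this gives 3900 MB, 6600 MB and 10200 MB. The 8-core figure is the same number as the 10200 MB limit mentioned earlier in the thread.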
ID: 44893
greg_be

Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44894 - Posted: 5 May 2021, 20:21:15 UTC - in response to Message 44893.  
Last modified: 5 May 2021, 20:22:49 UTC

The ATLAS-RAM-formula is:
3000 MB + 900 MB * n
with "n" = the number of cores the VM allocates.

This formula hasn't changed for more than 2 years.

The scientific app running inside the VM will go through a setup phase on 1 core and once this is finished it starts n threads to do the event processing.


Thanks for that information. Are the extra cores assigned automatically, or based on what you set on your account?
Does adding extra cores speed up the process at all?
Based on your formula, would 4 cores need 15,600 MB of memory?
ID: 44894
Harri Liljeroos

Joined: 28 Sep 04
Posts: 547
Credit: 29,882,520
RAC: 11,721
Message 44895 - Posted: 5 May 2021, 21:15:11 UTC - in response to Message 44894.  

The ATLAS-RAM-formula is:
3000 MB + 900 MB * n
with "n" = the number of cores the VM allocates.

This formula hasn't changed for more than 2 years.

The scientific app running inside the VM will go through a setup phase on 1 core and once this is finished it starts n threads to do the event processing.


Thanks for that information. The extra cores, is that automatic or based on what you set on your account?
Does it speed up the process any by adding extra cores?
Based on your formula, for 4 cores it needs 15,600MB of memory?

The number of cores used is whatever you specify in the preferences on your account, unless you have overridden it with an app_config.xml file.
The runtime is shorter, but the CPU time should be about the same as for a single-core task.
The required memory for a 4-core task is 3000 + (4 * 900) = 6600 MB.
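
To illustrate the runtime versus CPU time point, here is a simplified Python model. It assumes the setup phase runs on 1 core and the event processing then splits perfectly across n threads, as described earlier in the thread; real tasks will not scale this cleanly, and the minute values in the example are hypothetical, not measurements.

```python
def wall_clock_minutes(setup_min, event_cpu_min, n_cores):
    """Idealized runtime: serial setup, then event processing split over n cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return setup_min + event_cpu_min / n_cores


def total_cpu_minutes(setup_min, event_cpu_min):
    """Total CPU time; in this idealized model it does not depend on core count."""
    return setup_min + event_cpu_min


if __name__ == "__main__":
    # Hypothetical numbers: 20 min of setup, 220 min of event processing.
    setup, events = 20, 220
    for n in (1, 4, 8):
        print(
            f"{n} core(s): runtime {wall_clock_minutes(setup, events, n):.1f} min, "
            f"CPU time {total_cpu_minutes(setup, events)} min"
        )
```

The serial setup phase is also why doubling the cores does not halve the runtime, even in this best-case model.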
ID: 44895
greg_be

Send message
Joined: 28 Dec 08
Posts: 289
Credit: 2,042,230
RAC: 1,275
Message 44896 - Posted: 5 May 2021, 21:42:38 UTC - in response to Message 44895.  

Well ok...trying no app_config and 4 cores to start.
running .18 VBOX
fingers crossed!

Thanks everyone for your input.
ID: 44896


©2021 CERN