Message boards : Theory Application : after RAM upgrade 8 > 16GB only 2 tasks can be run, before 3

Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45909 - Posted: 21 Dec 2021, 10:01:33 UTC
Last modified: 21 Dec 2021, 10:35:19 UTC

On one of my notebooks with a 2+2 (HT) core CPU, I upgraded the RAM from 8 to 16GB.
Before, I had run 3 Theory tasks concurrently without any problem. After the RAM upgrade, only 2 Theory tasks can be run. Whenever a third one is downloaded and started, after a few seconds the BOINC Manager shows "postponed: VM environment needed to be cleaned up" in the "status" column, and in the Oracle VB Manager the task is shown not as "running" (like the other two) but as "Powered Off".

What's going wrong?

Edit: RAM test passed okay.
ID: 45909
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,618,962
RAC: 141,585
Message 45910 - Posted: 21 Dec 2021, 10:47:22 UTC - in response to Message 45909.  

Theory should also run with 8 GB RAM and three tasks.
Do you have Hyperthreading on? With a two-core CPU it can be a management problem.
You can see this on other projects as well (for example Cosmology@Home).
Better hardware and more RAM are often no solution for getting a project to run more tasks.
It needs fine-tuning.
ID: 45910
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45912 - Posted: 21 Dec 2021, 11:05:47 UTC

Maeax, thanks for your hints.

Yes, hyperthreading is on (as it was before the RAM upgrade).
I know that 3 Theory tasks can be run fine with 8GB (as I have it on several other machines - in fact, on 3 other machines [with 6-core CPUs] even 5 Theory tasks are running fine with 8GB RAM).
As said above, before I made the RAM upgrade, 3 tasks ran without any problems. So I am surprised that after the upgrade, this is no longer the case.
BTW, the reason for the upgrade was that I plan to run tasks which require more RAM (like ATLAS).

What I see happening just now: a third task got started in the BOINC Manager; under "Status" it shows "active", but the Windows Task Manager shows that this task is not using any CPU.
And the Oracle VM Manager shows it as "Powered Off".
In the BOINC Manager, clicking the "Properties" button on the left-hand side shows that CPU use was only 4 seconds.
Now, after about 12 minutes of being "active", the task aborted itself with "Computation error".

After this, another task was downloaded and started, and after a few seconds it again went into status "Postponed:b" (whatever this exactly means).
ID: 45912
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45914 - Posted: 21 Dec 2021, 11:21:48 UTC - in response to Message 45912.  

So now, while 2 Theory tasks were running, I downloaded a CMS task to run it as a 3rd task.
Same problem: after a few seconds, the task went into "Postponed" status, and in the Oracle VB Manager it went to "Powered Off".

So, as it looks, after the RAM upgrade from 8 to 16GB, only a maximum of 2 VM tasks can be run simultaneously (whereas before, with 8GB, I ran 3 tasks).
ID: 45914
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,618,962
RAC: 141,585
Message 45915 - Posted: 21 Dec 2021, 11:50:01 UTC - in response to Message 45914.  

Please post the link to your PC here; you have more than one.
ID: 45915
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45916 - Posted: 21 Dec 2021, 12:18:17 UTC - in response to Message 45915.  

Please post the link to your PC here; you have more than one.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10748503
ID: 45916
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,113,716
RAC: 127,714
Message 45917 - Posted: 21 Dec 2021, 12:28:48 UTC - in response to Message 45909.  

Sometimes a BIOS needs a kick...

Reboot and enter the BIOS.
Even if the new RAM size is displayed correctly, you may change an arbitrary (less important) BIOS value and save the new configuration.
Reboot, enter the BIOS again and restore the previous value.
Save the BIOS settings and reboot.


Before you start BOINC, clean the slots as mentioned in your stderr.txt logfiles.
ID: 45917
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45918 - Posted: 21 Dec 2021, 15:16:18 UTC - in response to Message 45917.  

@computezrmle, thanks for the hints.

I was lucky to find a brand-new BIOS update on the HP homepage, so I installed it successfully.
However, the problem still exists :-(
I downloaded and started 3 tasks, and one jumped into "Postponed..." after a few seconds (of course, I did do this: "Before you start BOINC, clean the slots as mentioned in your stderr.txt logfiles.")

No idea what's going wrong.
ID: 45918
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,618,962
RAC: 141,585
Message 45919 - Posted: 21 Dec 2021, 15:49:04 UTC - in response to Message 45918.  

2021-12-21 12:23:51 (6792): Adding virtual disk drive to VM. (vm_image.vdi)
2021-12-21 12:24:24 (6792): Error in storage attach (fixed disk) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_4f0795bda846fed2" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "C:\ProgramData\BOINC\slots\0/vm_image.vdi"
Do you have a HDD, SSD or RAMdisk?
Hardware-Controller problem?
ID: 45919
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45920 - Posted: 21 Dec 2021, 16:01:54 UTC - in response to Message 45919.  

M.2 SSD
ID: 45920
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45921 - Posted: 21 Dec 2021, 16:57:54 UTC - in response to Message 45919.  

2021-12-21 12:23:51 (6792): Adding virtual disk drive to VM. (vm_image.vdi)
2021-12-21 12:24:24 (6792): Error in storage attach (fixed disk) for VM: -2135228409
Command:
VBoxManage -q storageattach "boinc_4f0795bda846fed2" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "C:\ProgramData\BOINC\slots\0/vm_image.vdi"
Interestingly enough, this problem always comes up in slot 0. When I open the monitor window for such a task in the VB Manager, it says "FATAL: Could not read from the boot medium! System halted".
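
As a side note, the decimal code in the log line above can be converted to the hexadecimal form that VirtualBox prints elsewhere. A minimal Python sketch, assuming the value is a signed 32-bit status code (the helper name `to_hex_status` is made up for illustration):

```python
def to_hex_status(signed_code: int) -> str:
    """Return the unsigned 32-bit hexadecimal form of a signed status code."""
    # Masking with 0xFFFFFFFF reinterprets the negative number as unsigned 32-bit.
    return f"0x{signed_code & 0xFFFFFFFF:08x}"


# Code taken from the "Error in storage attach" log line above.
print(to_hex_status(-2135228409))  # prints 0x80bb0007
```

The result, 0x80bb0007, is the code that VBoxManage labels VBOX_E_INVALID_OBJECT_STATE in the stderr output quoted later in this thread, so both messages appear to describe the same failure.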
ID: 45921
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45922 - Posted: 21 Dec 2021, 17:52:15 UTC - in response to Message 45921.  

half an hour ago, I launched a 3-core ATLAS task, and so far it works fine.
ID: 45922
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45923 - Posted: 22 Dec 2021, 7:50:38 UTC - in response to Message 45922.  

So, since there are no new suggestions as to what else I could do or try, I may consider reducing the system back to 8 GB RAM in order to be able to run more than just 2 VB tasks concurrently.
Too bad that the additional 8 GB RAM turned out to be a bad investment :-(
ID: 45923
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,618,962
RAC: 141,585
Message 45924 - Posted: 22 Dec 2021, 8:08:29 UTC - in response to Message 45923.  

CPU-Z is a standard tool on all of my PCs.
It shows your hardware info, including RAM details such as the RAM frequency in MHz.
Do you have the same RAM in both slots?
ID: 45924
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45927 - Posted: 22 Dec 2021, 11:34:00 UTC - in response to Message 45924.  

CPU-Z is a standard tool on all of my PCs.
It shows your hardware info, including RAM details such as the RAM frequency in MHz.
Do you have the same RAM in both slots?
Thanks for the hint concerning CPU-Z, which I have also been using once in a while over the years.
I now downloaded the latest version and installed it.
The SPD section shows the following info for the two RAMs:

Slot#1: DDR4-2134(1067MHz), SK Hynix, Ranks: Dual
Slot#2: DDR4-2132(1066MHz), Samsung, Ranks: Single

So could the difference in the ranks (single vs. dual) be the cause for my problem? Almost unbelievable, as everything else works fine. As mentioned, I ran a 3-core ATLAS task last night which used up more than half of the 16GB RAM, and it finished successfully.
Also, the Memtest which I ran yesterday did not show any problem.
ID: 45927
maeax
Joined: 2 May 07
Posts: 2096
Credit: 159,618,962
RAC: 141,585
Message 45930 - Posted: 22 Dec 2021, 11:52:19 UTC - in response to Message 45927.  

RAM is special. When you have two modules with different MHz, the lower speed is used after booting.
Dual and single rank, I don't know, but I think it can cause problems.
You can swap the two modules on the board.
But I have no idea; good luck.
ID: 45930
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45931 - Posted: 22 Dec 2021, 14:23:00 UTC - in response to Message 45930.  

You can swap the two modules on the board.
But I have no idea; good luck.
I swapped the RAM modules, as you suggested, but no luck. The problem still persists.
So I contacted the dealer who made the upgrade and told him that we need a second dual-rank RAM module instead of the single-rank one. He will provide one. I hope that this will indeed be the solution to my problem.
ID: 45931
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45935 - Posted: 23 Dec 2021, 6:33:55 UTC

This morning, I removed the single-rank RAM module, thus restoring the hardware status from before, when I could run more than 2 Theory tasks concurrently.
I downloaded 3 Theory tasks, and 47 seconds after they started, one again went into "postponed" status.

Command:
VBoxManage -q storageattach "boinc_33f45aec8b71295a" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --setuuid "" --medium "C:\ProgramData\BOINC\slots\0/vm_image.vdi"
Output:
VBoxManage.exe: error: Medium 'C:\ProgramData\BOINC\slots\0\vm_image.vdi' is not accessible. UUID {c7cbbeeb-c984-467e-9b6e-0d2e670bed58} of the medium 'C:\ProgramData\BOINC\slots\0\vm_image.vdi' does not match the value {df719e14-8166-42a8-9c04-ae60f12052a3} stored in the media registry ('C:\Users\Erich\.VirtualBox\VirtualBox.xml')
VBoxManage.exe: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component MediumWrap, interface IMedium, callee IUnknown
VBoxManage.exe: error: Context: "SetIds(fSetNewUuid, bstrNewUuid.raw(), fSetNewParentUuid, bstrNewParentUuid.raw())" at line 694 of file VBoxManageStorageController.cpp
VBoxManage.exe: error: Failed to set the medium/parent medium UUID

Notes:
Another VirtualBox management application has locked the session for
this VM. BOINC cannot properly monitor this VM
and so this job will be aborted.


complete stderr can be seen here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=337661971
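
The key line in the output above is the UUID mismatch between the disk image and the VirtualBox media registry. A small Python sketch to pull the two UUIDs and the path out of such a message, assuming the wording is exactly as in this stderr (other VirtualBox versions may phrase it differently; the function name `parse_uuid_mismatch` is hypothetical):

```python
import re

# Matches a UUID in curly braces, e.g. {c7cbbeeb-c984-467e-9b6e-0d2e670bed58}.
UUID = r"\{([0-9a-fA-F-]{36})\}"

MISMATCH = re.compile(
    r"UUID " + UUID + r" of the medium '([^']+)' does not match the value " + UUID
)


def parse_uuid_mismatch(message: str):
    """Return the file UUID, medium path and registry UUID, or None if not found."""
    match = MISMATCH.search(message)
    if match is None:
        return None
    return {
        "file_uuid": match.group(1),      # UUID stored inside the .vdi file
        "path": match.group(2),           # location of the disk image
        "registry_uuid": match.group(3),  # UUID recorded in VirtualBox.xml
    }


# Error line copied from the stderr output above.
MESSAGE = (
    r"VBoxManage.exe: error: Medium 'C:\ProgramData\BOINC\slots\0\vm_image.vdi' "
    r"is not accessible. UUID {c7cbbeeb-c984-467e-9b6e-0d2e670bed58} of the medium "
    r"'C:\ProgramData\BOINC\slots\0\vm_image.vdi' does not match the value "
    r"{df719e14-8166-42a8-9c04-ae60f12052a3} stored in the media registry"
)

print(parse_uuid_mismatch(MESSAGE))
```

Here the registry still holds df719e14-... for the slot 0 path, while the freshly downloaded image carries c7cbbeeb-..., which would fit a leftover registry entry from an earlier task.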

So, it seems that this problem now has no connection to the number of RAM modules installed.
On the other hand, the problem occurred only after the additional 8GB RAM was installed. But it now persists even after this additional RAM was removed.

Can any of the experts here decipher from the stderr what the problem really is? (What catches my eye is that the problem always comes up in slot 0, whatever this may or may not mean.)
ID: 45935
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,113,716
RAC: 127,714
Message 45936 - Posted: 23 Dec 2021, 7:56:07 UTC - in response to Message 45935.  

It looks like VirtualBox has crashed some time in the past and there's still some trash in its configuration files, especially regarding the registered disk images.

You may
- stop all VMs (BOINC and Non-BOINC)
- stop BOINC
- as requested in stderr.txt -> clean the vbox environment (here: remove everything below \slots\0)
- open the VirtualBox Manager and call the Virtual Media Manager
- disconnect and remove all entries not required any more
Be careful not to remove a disk connected to a running BOINC task or connected to a VM you created yourself!


A reboot shouldn't be a must but it wouldn't hurt.
Then restart BOINC and resume the tasks (best would be: not all of them concurrently).
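
To find the stale entries without clicking through the Virtual Media Manager, the registered disks can also be listed with `VBoxManage list hdds`. The following is only a sketch in Python: it assumes the usual "Key: value" output with blank lines between disks and a `State:` field that reads `inaccessible` for missing images. The sample text and the second UUID are illustrative, not taken from a real machine.

```python
import re
import subprocess


def parse_hdds(text: str):
    """Split 'VBoxManage list hdds' style output into one dict per disk."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so 'C:\...' paths stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def inaccessible(entries):
    """Return the disks whose State field is 'inaccessible'."""
    return [e for e in entries if e.get("State") == "inaccessible"]


def read_registered_disks() -> str:
    """Run VBoxManage (must be on PATH) and return its raw output."""
    result = subprocess.run(
        ["VBoxManage", "list", "hdds"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample in the assumed output format (second UUID is made up).
SAMPLE = r"""UUID:           df719e14-8166-42a8-9c04-ae60f12052a3
Parent UUID:    base
State:          inaccessible
Type:           normal (base)
Location:       C:\ProgramData\BOINC\slots\0\vm_image.vdi
Storage format: VDI

UUID:           00000000-0000-0000-0000-000000000001
Parent UUID:    base
State:          created
Type:           normal (base)
Location:       C:\ProgramData\BOINC\slots\1\vm_image.vdi
Storage format: VDI
"""

for disk in inaccessible(parse_hdds(SAMPLE)):
    print(disk["UUID"], disk["Location"])
```

On a real system, `parse_hdds(read_registered_disks())` would replace the sample. Entries reported this way could then be removed by hand in the Virtual Media Manager or with `VBoxManage closemedium disk <uuid>`, with BOINC and all VMs stopped first, as described above.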
ID: 45936
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,567,635
RAC: 119,765
Message 45938 - Posted: 23 Dec 2021, 8:55:32 UTC - in response to Message 45936.  

It looks like VirtualBox has crashed some time in the past and there's still some trash in its configuration files, especially regarding the registered disk images.
I'll be off for a few hours and will proceed as suggested when I come back.
A quick check of the current situation though has shown:
- no slot 0 there presently (I may have deleted it yesterday when I found out that remnants of a failed task were still contained in the slot 0 folder)
- I opened the Media Manager in the VB Manager and saw 3 lines:

> vm_image.vdi with a preceding exclamation mark. Hovering the cursor over it shows that this refers to the image vdi in slot 0, which does not exist at the moment (no file size indicated, which is logical).

> vdi_image.vdi 781MB for slot 1
> vdi_image.vdi 781MB for slot 2
these are obviously the entries for the two Theory tasks which are currently running in slot 1 and slot 2.

One more thing I forgot to mention in my previous posting:
Yesterday, I removed VM 6.1.26 and installed VM 6.1.28. I was hoping that this might solve the problem. It did NOT.
But I tried to run a 3-core ATLAS task; one such task had run a day earlier without problems. Yesterday, though, it also failed with the "postponed" message. Then I remembered that somewhere in the ATLAS section of the Forum it was mentioned that the wrapper has a problem with the newest VB version (or vice versa); so I removed 6.1.28 and re-installed 6.1.26.
In the end, my hope was that since I was reinstalling the VB from scratch, any problems related to the VB should no longer exist. But my assumption was wrong.

At any rate, when I will be back in a few hours, I will follow the steps as suggested above.
ID: 45938