Message boards : Number crunching : My idle thoughts on my idle cores...is this correct?

lazlo_vii
Joined: 20 Nov 19
Posts: 21
Credit: 1,074,330
RAC: 0
Message 40788 - Posted: 5 Dec 2019, 6:57:55 UTC

"I know something about most things, a few things about some things, and everything about nothing." I don't remember where that saying came from, but it has stuck with me for many years. If you have any thoughts or knowledge that could enlighten me, please post them.

My two main systems have now been upgraded to Ryzen 3700X CPUs and are both running LHC@Home on 12 of the 16 possible threads. I set it this way so that I don't have to stop BOINC when I want to play a game on my desktop, run a VM on my server, or use the idle parts of my CPUs for whatever other reason. I know that there are two somewhat conflicting management schemes at play. The first is at the hardware level, where the BIOS and CPU are juggling threads between CPUs for thermal management reasons (you can see this in action just by watching htop for a minute). The second is the Linux kernel of the host system trying to keep processes that use the same data close together to avoid L2 cache misses. This matters because the kernels in the virtual environments of the different work units only see the virtual CPUs, not the fact that they are being bounced around the physical CPU for thermal management reasons.

Because of the first management scheme, I can never have truly idle cores without using a program to set the CPU affinity of BOINC. Because of the second, if I did use a CPU affinity utility, the host kernel would have a much easier time avoiding cache misses, and my own activities on the system would be less likely to interfere with BOINC. This would come at the cost of decreased thermal efficiency, but since I have set the BIOS on these two systems to use AMD's Eco Mode, I think that risk has been mitigated. So I am thinking of installing and using a utility to manage the CPU affinity of BOINC.

I know that the kernel tracks threads strictly by their PID, and the PID changes every time a new work unit is started or a new thread is spawned in a VM. If I don't want to manage the constantly changing PIDs, I think the best solution would be to find a utility that lets me set CPU affinity by the name of the user that owns the process on the host system (boinc). This could be done by taking advantage of cgroups.
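A rough sketch of the classification step I have in mind (the cgroup path here is hypothetical; on a real system it would be something like /sys/fs/cgroup/cpu/boinc and the writes would need root, so CG defaults to a scratch directory just to illustrate the idea):

```shell
# Classify every process owned by one user into a cgroup by appending its
# PID to the group's task list. Scratch path stands in for the real cgroupfs.
CG="${CG:-/tmp/cg-demo/cpu/boinc}"
mkdir -p "$CG"
: > "$CG/tasks"

# pgrep -u selects processes by owning user; the loop runs zero times if no
# such user exists on this machine.
for pid in $(pgrep -u boinc); do
    echo "$pid" >> "$CG/tasks"
done

echo "classified $(wc -l < "$CG/tasks") processes"
```

This way the constantly changing PIDs never have to be tracked by hand: re-running the loop (or hooking it into a periodic job) re-captures whatever the boinc user currently owns.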

If all of the above is correct, then what I need to do is find the right utility and learn how AMD numbers the physical and logical cores in the CPU. After that I can deduce (by trial and error) which affinity settings give the most consistent performance and thermal results. If all goes well, I will learn something new and have slightly more productive systems. If it goes badly, I could generate a lot of failed work units.

So, do you think it would be worth the effort?
ID: 40788
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,457,129
RAC: 123,712
Message 40793 - Posted: 5 Dec 2019, 9:19:52 UTC - in response to Message 40788.  

... there are two ... management schemes at play. The first is on the hardware level where the BIOS and CPU are juggling threads between CPUs for thermal management reasons

Wrong.
The BIOS/CPU never work at the thread/process control level.
Modern CPUs like your Ryzen do temperature monitoring as well as overall load monitoring, which controls the core frequencies (boost/non-boost, XFR, ...) and stops the CPU if it gets too hot, to avoid damage.

(You can see this in action just by watching htop for a minute)

Partly wrong.
htop shows that arbitrary cores are sometimes idle and sometimes under load, but it displays an average value and is not related to the BIOS/CPU temperature monitoring.

Core assignment (affinity as well as CPU time) is done by the Linux scheduler (similar on Windows, by the way), and it takes lots of parameters into account.
Under normal circumstances (BOINC) it is not useful to set a specific CPU affinity.
Just let the Linux scheduler do what it is made for.


I know that the kernel sees threads strictly by their PID number. Also the PID number changes every time a new work unit is started or when a new thread is spawned in a VM. If I don't want to manage the constantly changing PIDs then I think the best solution would be to find a utility that allows me to set CPU affinity by the name of the user that owns the process on the host system (boinc). This could be done by taking advantage of cgroups.

This points in the right direction, except for the CPU affinity part.

Most modern Linux distributions organize their cgroups in a hierarchy that includes child groups like "system" or "user".
system: holds processes like device drivers
user: holds processes started by normal users

Each cgroup (or subtree) can be assigned several values that control its relative CPU weight (cpu.shares), CPU quota (cpu.cfs_quota_us), I/O priority, scheduling period (cpu.cfs_period_us), and so on.

How to use this (just a short draft!)?
Create a cgroup subtree "boinc" beside "system" and "user" and give it a low relative CPU weight. (Note: the relative-weight values below, with their default of 1024, are cpu.shares values in cgroup v1; cpu.cfs_quota_us is an absolute quota, not a weight.)
system: 1024 (this is the default)
user: 1024 (this is the default)
boinc: 128

The values define a relative weight that engages when the computer (CPU) reaches full load.
The sum of all values represents 100% of the CPU resources, so system processes and user processes will each get about 47% of the CPU time, while boinc processes are guaranteed at least about 6% (128 / 2176 ≈ 5.9%).
Since system/user processes usually don't need all CPU cycles, unused cycles can be given to other cgroups, e.g. boinc.

This ensures the system is not sluggish even under heavy load, since device drivers (display) and interactive user programs always get enough CPU cycles.
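As a quick sanity check on those percentages, the weight-to-share math can be reproduced with plain shell arithmetic (integer division rounds 5.88% down to 5%):

```shell
# Relative weights from the draft above (cgroup v1 cpu.shares-style values).
system=1024
user=1024
boinc=128
total=$((system + user + boinc))           # 2176

# Guaranteed share under full load, as integer percentages:
echo "system: $((100 * system / total))%"  # 47%
echo "user:   $((100 * user   / total))%"  # 47%
echo "boinc:  $((100 * boinc  / total))%"  # 5% (5.88% before rounding down)
```

The key property is that the weights only bite under contention; an idle desktop still hands nearly all cycles to boinc.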


Now set cpu.cfs_period_us for the boinc cgroup to 1000000 (default: 100000), which lengthens the scheduling period from 100 ms to 1 s.
This keeps the background BOINC processes in the CPU cache up to 10 times longer than with the default settings.
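Put together, the two settings could be applied like this (a sketch for cgroup v1; on a real system CG would be /sys/fs/cgroup/cpu/boinc and the writes need root, so here it defaults to a scratch directory):

```shell
# Apply the draft settings to the boinc cgroup. Scratch path stands in for
# the real cgroupfs so the commands can be tried safely.
CG="${CG:-/tmp/cg-demo/cpu/boinc}"
mkdir -p "$CG"

# Low relative weight for BOINC (the 1024/1024/128 scheme described above):
echo 128 > "$CG/cpu.shares"

# Lengthen the CFS period from the 100 ms default to 1 s:
echo 1000000 > "$CG/cpu.cfs_period_us"

cat "$CG/cpu.shares" "$CG/cpu.cfs_period_us"
```

On a systemd-based distribution the same effect is usually reached through the service's slice settings rather than by writing to cgroupfs directly, but the underlying knobs are these two files.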


So, do you think it would be worth the effort?

No, as long as you are thinking about CPU affinity.
Instead, you may focus on the cgroup CPU weight and quota (cpu.shares / cpu.cfs_quota_us) and on cpu.cfs_period_us.
ID: 40793
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,822,750
RAC: 126,564
Message 40796 - Posted: 5 Dec 2019, 10:54:46 UTC

A BIOS update for the motherboard is also useful, if one is available and stable.
Power management is at a very good level today.
AMD's Zen architecture (Ryzen or Threadripper) has a lot of features that work very well with the on-board sensors.
CPU performance keeps going up while the energy used goes down.
ID: 40796
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,231,282
RAC: 5,477
Message 40846 - Posted: 8 Dec 2019, 15:24:11 UTC - in response to Message 40788.  

You mention upgrading your processors. How old are your motherboards, and what RAM speed do they support? You may see a slight performance improvement depending on your memory speeds. Have you looked at that?
ID: 40846
lazlo_vii
Joined: 20 Nov 19
Posts: 21
Credit: 1,074,330
RAC: 0
Message 40896 - Posted: 11 Dec 2019, 7:14:10 UTC - in response to Message 40846.  

You mention upgrading your processors. How old are your motherboards, and what RAM speed do they support? You may see a slight performance improvement depending on your memory speeds. Have you looked at that?


I didn't run LHC@Home on my old hardware.
ID: 40896



©2024 CERN