Message boards : Number crunching : Hyperthreading versus no hyperthreading for VM tasks
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34630 - Posted: 14 Mar 2018, 10:26:49 UTC

Once in a while, the topic of hyperthreading versus no hyperthreading comes up in various threads here in the forum.
From what I gather, the claim is that hyperthreading does not bring any advantage over no hyperthreading. One of the statements was roughly "1 horse power is 1 horse power - so if you make 2 cores out of 1, you don't get 2 horse powers". Which would seem logical.

So far, I have used hyperthreading with my i7-4930k (@ 3.9 GHz), making 12 logical cores out of 6 physical ones.
On two cores I was running 1 GPUGRID task each; on 8 cores I was running various LHC VM tasks (ATLAS, LHCb, CMS). With this, the Windows Task Manager showed a total CPU usage of around 86%.

Yesterday, just out of curiosity, I switched hyperthreading off in the BIOS, and now run 2 GPUGRID tasks plus 3 LHC VM tasks. Again, the CPU usage shown is around 86%.

However, from what I can see so far, there is no improvement at all. How do I measure: on the results page, where all the finished tasks are listed, I divide the runtime (in seconds) of a given task by the credit points awarded. When I compare the most recent LHCb tasks crunched in the days before the switch with the ones crunched since yesterday, the quotient is about the same (roughly 140).
This seems to show that the per-task efficiency is the same whether hyperthreading is on or off.
Except that now, with hyperthreading switched off, I am crunching only 3 LHC VM tasks concurrently, whereas with hyperthreading switched on, it was 8 tasks.

So, the bottom line: the statement that one does not get 2 horse powers out of 1 by switching on hyperthreading seems to be wrong - at least when crunching LHC VM tasks.
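To put numbers on that logic: the figures below are purely hypothetical (only the ~140 s/credit quotient comes from the post; the runtime is invented), but they show why equal per-task efficiency at more concurrent tasks means more total throughput:

```shell
# All concrete numbers here are hypothetical, just to illustrate the comparison.
secs_per_credit=140        # observed quotient: runtime / credit (same in both setups)
task_runtime=70000         # assumed seconds per LHCb task
credit_per_task=$((task_runtime / secs_per_credit))   # = 500

for tasks in 3 8; do       # 3 concurrent tasks with HT off, 8 with HT on
    # daily credit = concurrent tasks * tasks finished per day * credit each
    day_credit=$((tasks * 86400 * credit_per_task / task_runtime))
    echo "$tasks concurrent tasks -> ~$day_credit credits/day"
done
```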

I would be pleased to receive any comments - thanks in advance.
ID: 34630
maeax

Joined: 2 May 07
Posts: 2096
Credit: 159,557,533
RAC: 140,476
Message 34631 - Posted: 14 Mar 2018, 10:59:30 UTC
Last modified: 14 Mar 2018, 11:13:52 UTC

One horse has only ONE horsepower ;-))

Hyperthreading is a marketing argument: it makes more CPUs visible than are physically available.
Yes, you can use HT and trim the workload as much as possible (RAM, HDD or SSD, networking).
But......

Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php
ID: 34631
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34633 - Posted: 14 Mar 2018, 13:14:27 UTC - in response to Message 34631.  

Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php
Interesting information - in all 35 cases of the Intel i7-4930k listed, hyperthreading is obviously switched on, using all 12 threads (6 cores + 6 HT).
ID: 34633
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34635 - Posted: 14 Mar 2018, 15:05:29 UTC
Last modified: 14 Mar 2018, 15:06:36 UTC

Hyperthreading allows the pipeline to be more fully utilized, since instructions from two different streams can be issued, depending on execution-unit availability. In most cases it will result in about a 25% (up to around 40%) improvement in throughput. It might work a bit differently with VBox, but I would not disable it.
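For what it's worth, on Linux you can check whether HT/SMT is actually active with a standard util-linux command (a quick sketch; the exact output formatting varies a little between versions):

```shell
# "Thread(s) per core: 2" means HT/SMT is on; "1" means it is off
lscpu | grep -i 'thread(s) per core'
```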
ID: 34635
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34636 - Posted: 14 Mar 2018, 16:30:12 UTC - in response to Message 34635.  

By the way, there is one hidden cost of hyperthreading: since twice as many work units can run, you will need twice as much memory. So the 25% gain in throughput may not be worth it if you are short of memory. Normally, though, I just buy enough memory.
ID: 34636
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34637 - Posted: 14 Mar 2018, 16:33:38 UTC - in response to Message 34636.  

So the 25% gain in throughput may not be worth it if you are short of memory
Memory is 32GB :-)
ID: 34637
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34638 - Posted: 14 Mar 2018, 17:33:00 UTC - in response to Message 34637.  
Last modified: 14 Mar 2018, 18:03:16 UTC

Memory is 32GB :-)

That is quite enough for an 8-core machine. I have found that on my Ryzen 1700 (16 virtual cores) I need 64 GB, though. That is no great surprise - I knew it when I built the machine with 32 GB, but I had forgotten until I recently got bitten by some tasks that were suspended for lack of memory. So I bought another 32 GB and now have more than enough.

PS - I originally built the Ryzen for WCG, where 32 GB is more than enough. But I tried it here, was surprised at how well it worked with VBox projects, and switched it over.
ID: 34638
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1127
Credit: 49,745,199
RAC: 10,798
Message 34645 - Posted: 14 Mar 2018, 18:58:41 UTC

I haven't tried running all 8 cores with ATLAS for a while, but I can run 8 of the CMS tasks to Valid with 24 GB RAM with no problem.

And I can easily run 8 Theory tasks with 16 GB RAM.

(I have run hundreds, maybe thousands, of 8-core multi-core ATLAS tasks with 16 and 24 GB RAM.)
Volunteer Mad Scientist For Life
ID: 34645
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34649 - Posted: 14 Mar 2018, 20:11:14 UTC - in response to Message 34645.  

I haven't tried running all 8 cores with ATLAS for a while, but I can run 8 of the CMS tasks to Valid with 24 GB RAM with no problem.

And I can easily run 8 Theory tasks with 16 GB RAM.

(I have run hundreds, maybe thousands, of 8-core multi-core ATLAS tasks with 16 and 24 GB RAM.)

Yes, my numbers are high because I devote 12 GB to a write cache on an 8-core (32 GB) machine, and 24 GB on a 16-core (64 GB) machine. It saves wear on the SSD.
ID: 34649
mmonnin

Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 189
Message 34652 - Posted: 14 Mar 2018, 22:14:15 UTC

Most apps do gain with HT on and can do more work in the same amount of time. If one thread hits a branch misprediction, or has to wait for data from memory, the other thread can still use CPU cycles that would otherwise be wasted. There was one BOINC app where turning HT off on my 2670v1 actually produced more work, but that is by far the minority of apps.

WUProp data does include whether CPUs have HT on or off, and might indicate whether an app is better with it on or off.
ID: 34652
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 34666 - Posted: 15 Mar 2018, 10:24:28 UTC - in response to Message 34649.  

Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots?
ID: 34666
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2410
Credit: 226,058,552
RAC: 126,630
Message 34667 - Posted: 15 Mar 2018, 11:19:04 UTC - in response to Message 34666.  

Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots?

This page gives a high-level overview of disk caching on a Linux system:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
Changing the default parameters will not always result in better performance.

On a system with lots of RAM that runs reliably 24/7, you may alternatively consider mounting the BOINC "slots/" folder as tmpfs.
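A minimal sketch of that tmpfs idea, assuming a Debian-style BOINC data directory (/var/lib/boinc-client) and a boinc user - adjust path, size, and ownership to your install:

```shell
# Keep VM scratch I/O in RAM; contents are lost on reboot/unmount,
# so only do this on a machine that runs reliably 24/7.
sudo mount -t tmpfs -o size=16G,uid=boinc,gid=boinc,mode=0755 \
    tmpfs /var/lib/boinc-client/slots

# Or make it permanent via /etc/fstab:
# tmpfs  /var/lib/boinc-client/slots  tmpfs  size=16G,uid=boinc,gid=boinc,mode=0755  0  0
```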
ID: 34667
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34670 - Posted: 15 Mar 2018, 12:26:06 UTC - in response to Message 34667.  

I use these parameters for a 12 GB cache (32 GB memory), with a four-hour write delay:
sudo sysctl vm.dirty_background_bytes=12000000000
sudo sysctl vm.dirty_bytes=13000000000
sudo sysctl vm.dirty_writeback_centisecs=500    # check the cache every 5 seconds
sudo sysctl vm.dirty_expire_centisecs=1440000   # flush pages older than 4 hours

For a 24 GB cache I change the first two values, but keep the others the same:
sudo sysctl vm.dirty_background_bytes=24000000000
sudo sysctl vm.dirty_bytes=25000000000


There is no difference in performance; it is about protecting the SSD from excessive writes. Originally this was for the CEP2 project on WCG, where the writes were quite high - well over 1 TB/day for 8 cores. It may not really be a problem here, but for CPDN the writes can still get a little high.
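Note that sysctl settings made this way are lost on reboot. A sketch of making them persistent, assuming a distro that reads /etc/sysctl.d/ at boot (the file name is an arbitrary choice):

```shell
# Anything ending in .conf in /etc/sysctl.d/ is applied at boot
sudo tee /etc/sysctl.d/90-write-cache.conf <<'EOF'
vm.dirty_background_bytes = 12000000000
vm.dirty_bytes = 13000000000
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 1440000
EOF
sudo sysctl --system    # apply all config files now, without rebooting
```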
ID: 34670
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2410
Credit: 226,058,552
RAC: 126,630
Message 34673 - Posted: 15 Mar 2018, 13:52:58 UTC - in response to Message 34670.  

@ Jim1348

Have you ever monitored the behaviour of your write cache?
There may be syncs initiated by other processes within the configured 4-hour period.

An easy monitor could be:
watch -n1 "egrep -i dirty /proc/meminfo"
Started right after a "sync", the output should grow over the period configured in "vm.dirty_expire_centisecs".
Once the numbers drop to 0 (or close to it), the cache has been written back to disk.
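For a one-shot readout instead of a continuous watch, the same counter can be read directly (/proc/meminfo is a standard Linux interface):

```shell
# Amount of page cache currently dirty, i.e. not yet written back to disk
grep '^Dirty' /proc/meminfo                          # e.g. "Dirty:  123456 kB"
dirty_kb=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
echo "$dirty_kb kB waiting to be flushed"
```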
ID: 34673
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34675 - Posted: 15 Mar 2018, 16:37:45 UTC - in response to Message 34673.  
Last modified: 15 Mar 2018, 16:38:21 UTC

Good - I need a way to monitor the writes in Linux. I usually do it on my Windows machine, where the monitoring tools are readily available and easy to use. SsdReady will monitor the writes to any selected drive, and PrimoCache will tell you both the writes the OS issues and those actually written to the drive, in total at any given time, though it does not give the rate.

At the moment I have set up a ramdisk on my Windows machine (for CPDN), and by using SsdReady I can monitor the writes just to that. The system writes are usually pretty negligible anyway (though "negligible" in Windows is a relative term - it can still be 20 GB/day just for logging, etc.).

Thanks.
ID: 34675
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34681 - Posted: 16 Mar 2018, 7:45:22 UTC

Since switching off hyperthreading did not at all yield what I was expecting, I switched it back on. So now I am back to using 7-8 CPU cores for crunching LHC tasks (besides 2 cores for 2 GPUGRID tasks, in combination with my two GTX 980 Tis).
ID: 34681



©2024 CERN