Hyperthreading versus no hyperthreading for VM tasks
Joined: 18 Dec 15 Posts: 1688 Credit: 103,517,274 RAC: 118,874
Once in a while, in various threads here in the forum, the topic of hyperthreading versus no hyperthreading comes up. From what I gather, the claim is that hyperthreading brings no advantage over not using it. One statement was roughly "1 horse power is 1 horse power - so if you make 2 cores out of 1, you don't get 2 horse powers", which would seem logical.

So far, I have used hyperthreading on my i7-4930K (@ 3.9 GHz), making 12 logical cores out of 6 physical ones. On 2 cores I was running 1 GPUGRID task each; on 8 cores I was running various LHC VM tasks (ATLAS, LHCb, CMS). With this, the Windows Task Manager showed a total CPU usage of around 86%.

Yesterday, out of curiosity, I switched hyperthreading off in the BIOS, and now run 2 GPUGRID tasks plus 3 LHC VM tasks. Again, CPU usage is shown to be around 86%. However, from what I can see so far, there is no improvement at all.

How I measure: on the results page, where all the finished tasks are listed, I divide the runtime (in seconds) of a given task by the credit granted. When I compare the most recent LHCb tasks crunched up to yesterday (with hyperthreading) against the ones crunched since (without), the quotient is about the same (roughly 140 seconds per credit). So it seems to make no difference to per-task efficiency whether hyperthreading is on or off - except that now, with hyperthreading switched off, I am crunching only 3 LHC VM tasks concurrently, whereas with it switched on, it was 8 tasks.

The bottom line: the statement that one does not get 2 horse powers out of 1 by switching on hyperthreading seems to be wrong - at least when crunching LHC VM tasks. I would be pleased to receive any comments - thanks in advance.
Joined: 2 May 07 Posts: 2096 Credit: 159,557,533 RAC: 140,476
One horse has only ONE horsepower ;-)) Hyperthreading is a marketing argument: it shows more CPUs than are physically available. Yes, you can use HT and trim the workload as much as possible (RAM, HDD or SSD, networking). But......

Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php
Joined: 18 Dec 15 Posts: 1688 Credit: 103,517,274 RAC: 118,874
Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php

Interesting information - in all 35 cases of the Intel i7-4930K CPU, hyperthreading is obviously switched on, using all 12 cores (6 + 6 HT).
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
Hyperthreading allows the pipeline to be more fully utilized, since instructions from two independent instruction streams can be issued, depending on which execution units are free. In most cases, it results in about a 25% (up to around 40%) improvement in throughput. It might work a bit differently with VBox, but I would not disable it.
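A back-of-the-envelope sketch of that claim. The ~62.5% per-thread speed is an assumed figure chosen so the total comes out 25% higher, not a measurement:

```shell
# Throughput in arbitrary units, scaled by 1000 to keep the shell
# arithmetic in integers. All figures are illustrative assumptions.
echo $(( 6 * 1000 ))   # 6 physical cores at full per-thread speed -> 6000
echo $(( 12 * 625 ))   # 12 HT threads at ~62.5% per-thread speed -> 7500
```

The total is higher even though each individual thread runs slower, which is why per-task timing comparisons can look flat while aggregate output rises.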
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
By the way, there is one hidden cost of hyperthreading: since twice as many work units can run, you need twice as much memory. So the 25% gain in throughput may not be worth it if you are short of memory. Normally, though, I just buy enough memory.
Joined: 18 Dec 15 Posts: 1688 Credit: 103,517,274 RAC: 118,874
So the 25% gain in throughput may not be worth it if you are short of memory

Memory is 32GB :-)
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
Memory is 32GB :-)

That is quite enough for an 8-core machine. I have found that on my Ryzen 1700 (16 virtual cores) I need 64 GB though. That is no great surprise - I knew it when I built the machine with 32 GB, but I had forgotten until I got bitten recently by some tasks that were suspended for lack of memory. So I bought another 32 GB and now have more than enough.

PS - I originally built the Ryzen for WCG, where 32 GB is more than enough. But I tried it here, was surprised at how well it worked with VBox projects, and so switched it over.
Joined: 24 Oct 04 Posts: 1127 Credit: 49,745,199 RAC: 10,798
I haven't tried running all 8 cores with ATLAS for a while, but I can run 8 of the CMS tasks to Valid with 24 GB RAM with no problem, and easily run 8 Theory tasks with 16 GB RAM (I have run hundreds, maybe thousands, of 8-core multi-core ATLAS tasks with 16 and 24 GB RAM).

Volunteer Mad Scientist For Life
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
I haven't tried running all 8 cores with Atlas for a while but I can run 8 of the CMS tasks Valid with 24GB ram with no problem.

Yes, my numbers are high since I devote 12 GB to a write cache on an 8-core (32 GB) machine, and 24 GB on a 16-core (64 GB) machine. It saves the SSD.
Joined: 22 Mar 17 Posts: 55 Credit: 10,223,976 RAC: 189
Most apps do gain with HT on and can do more work in the same amount of time. If one thread hits a branch misprediction or is waiting on data from memory, the other can still utilize some of the CPU cycles that would otherwise be wasted. There was one BOINC app where I saw that turning HT off on my 2670v1 actually did produce more work, but that is by far the minority of apps. WUProp data does include whether CPUs have HT on or off, and might indicate if an app is better with it on or off.
Joined: 16 Sep 17 Posts: 100 Credit: 1,618,469 RAC: 0
Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots? |
Joined: 15 Jun 08 Posts: 2410 Credit: 226,058,552 RAC: 126,630
Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots?

This page may give a high-level overview of disk caching on a Linux system:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Changing the default parameters will not always result in better performance. On a system with lots of RAM that runs reliably 24/7, you may alternatively consider mounting the "/slots/" folder as tmpfs.
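A minimal sketch of the tmpfs idea, assuming a default Linux BOINC installation. The path and the 16G size are assumptions to adapt to your setup, and anything in the slots folder is lost on reboot or unmount, so stop the client first:

```shell
# Hypothetical: back the BOINC slots directory with RAM instead of disk.
# Path and size are assumptions; adjust to your installation.
# Note: slot contents do not survive a reboot.
sudo systemctl stop boinc-client
sudo mount -t tmpfs -o size=16G,mode=0771 tmpfs /var/lib/boinc-client/slots
sudo systemctl start boinc-client
```

For a persistent setup, the equivalent mount would go in /etc/fstab instead.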
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
I use these parameters for a 12 GB cache (32 GB memory), with a four-hour write delay:

sudo sysctl vm.dirty_background_bytes=12000000000
sudo sysctl vm.dirty_bytes=13000000000
sudo sysctl vm.dirty_writeback_centisecs=500 (checks the cache every 5 seconds)
sudo sysctl vm.dirty_expire_centisecs=1440000 (flushes pages older than 4 hours)

For a 24 GB cache I change the first two values but keep the others the same:

sudo sysctl vm.dirty_background_bytes=24000000000
sudo sysctl vm.dirty_bytes=25000000000

There is no difference in performance; it is about protecting the SSD from excessive writes. Originally, it was for the CEP2 project on WCG, where the writes were quite high - well over 1 TB/day for 8 cores. It may not really be a problem here, but for CPDN the writes can still get a little high.
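For reference, the two centisecond values translate to the intervals given in the parentheses (a centisecond is a hundredth of a second; plain shell arithmetic):

```shell
# vm.dirty_writeback_centisecs=500: how often the kernel scans the cache
echo $(( 500 / 100 ))              # centiseconds -> seconds; prints 5
# vm.dirty_expire_centisecs=1440000: age at which dirty pages are flushed
echo $(( 1440000 / 100 / 3600 ))   # centiseconds -> hours; prints 4
```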
Joined: 15 Jun 08 Posts: 2410 Credit: 226,058,552 RAC: 126,630
@ Jim1348

Have you ever monitored the behaviour of your write cache? There may be syncs initiated by other processes within the configured 4 h period.

An easy monitor could be:
watch -n1 "egrep -i dirty /proc/meminfo"

Started after a "sync", the output should grow over the period configured in "vm.dirty_expire_centisecs". Once the numbers drop to 0 (or close to it), the cache has been written to disk.
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0
Good. I need a way to monitor the writes in Linux. I usually do it on my Windows machine, where the monitoring tools are readily available and easy to use. SsdReady will monitor the writes to any selected drive, and PrimoCache will report both the writes the OS issues and those actually committed to the drive (as running totals at any given time, though not the rate).

At the moment, I have set up a ramdisk on my Windows machine (for CPDN), and by using SsdReady I can monitor the writes just to that. The system writes are usually pretty negligible in any case (though "negligible" in Windows is a relative term - it can still be 20 GB/day just for logging, etc.). Thanks.
Joined: 18 Dec 15 Posts: 1688 Credit: 103,517,274 RAC: 118,874
Since switching off hyperthreading did not at all yield what I was expecting, I switched it on again. So now I am back to using 7-8 CPU cores for crunching LHC tasks (besides using 2 cores for 2 GPUGRID tasks, in combination with my 2 GTX 980 Ti's).
©2024 CERN