Message boards : Number crunching : Hyperthreading versus no hyperthreading for VM tasks
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34630 - Posted: 14 Mar 2018, 10:26:49 UTC

Once in a while, the topic of hyperthreading versus no hyperthreading comes up in various threads here in the forum.
From what I gather, the claim is that hyperthreading does not bring any advantage over no hyperthreading. One of the statements was roughly "1 horse power is 1 horse power - so if you make 2 cores out of 1, you don't get 2 horse powers". Which would seem logical.

So far, I have used hyperthreading with my i7-4930k (@ 3.9 GHz), making 12 logical cores out of 6 physical ones.
On two cores I was running 1 GPUGRID task each; on 8 cores I was running various LHC VM tasks (ATLAS, LHCb, CMS). With this, the Windows Task Manager showed a total CPU usage of around 86%.

Yesterday, just out of curiosity, I switched hyperthreading off in the BIOS, and now run 2 GPUGRID tasks plus 3 LHC VM tasks. Again, the CPU usage shown is around 86%.

However, from what I can see so far, there is no improvement at all. How do I measure: on the results page, where all the finished tasks are listed, I divide the runtime (in seconds) of a given task by the credit points awarded. When I compare the most recent LHCb tasks crunched in the days before the switch with the ones crunched since yesterday, the quotient is about the same (roughly 140).
This seems to show that the per-task efficiency is the same whether hyperthreading is on or off.
Except that now, with hyperthreading switched off, I am crunching only 3 LHC VM tasks concurrently, whereas with hyperthreading switched on, it was 8 tasks.

So, the bottom line: the statement that one does not get 2 horse powers out of 1 by switching on hyperthreading seems to be wrong - at least when crunching LHC VM tasks.
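To put numbers on that logic: the figures below are purely hypothetical (only the ~140 s/credit quotient comes from the post; the runtime is invented), but they show why equal per-task efficiency at more concurrent tasks means more total throughput:

```shell
# All concrete numbers here are hypothetical, just to illustrate the comparison.
secs_per_credit=140        # observed quotient: runtime / credit (same in both setups)
task_runtime=70000         # assumed seconds per LHCb task
credit_per_task=$((task_runtime / secs_per_credit))   # = 500

for tasks in 3 8; do       # 3 concurrent tasks with HT off, 8 with HT on
    # daily credit = concurrent tasks * tasks finished per day * credit each
    day_credit=$((tasks * 86400 * credit_per_task / task_runtime))
    echo "$tasks concurrent tasks -> ~$day_credit credits/day"
done
```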

I would be pleased to receive any comments - thanks in advance.
ID: 34630
maeax

Joined: 2 May 07
Posts: 2096
Credit: 159,557,533
RAC: 140,476
Message 34631 - Posted: 14 Mar 2018, 10:59:30 UTC
Last modified: 14 Mar 2018, 11:13:52 UTC

One horse has only ONE horsepower ;-))

Hyperthreading is a marketing argument: it makes more CPUs visible than are physically available.
Yes, you can use HT and trim the workload as much as possible (RAM, HDD or SSD, networking).
But......

Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php
ID: 34631
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34633 - Posted: 14 Mar 2018, 13:14:27 UTC - in response to Message 34631.  

Edit: https://lhcathome.cern.ch/lhcathome/cpu_list.php
Interesting information - in all 35 cases of the Intel i7-4930k listed, hyperthreading is obviously switched on, using all 12 threads (6 cores + 6 HT).
ID: 34633
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34635 - Posted: 14 Mar 2018, 15:05:29 UTC
Last modified: 14 Mar 2018, 15:06:36 UTC

Hyperthreading allows the pipeline to be more fully utilized, since instructions from two different streams can be issued, depending on execution-unit availability. In most cases it will result in about a 25% (up to around 40%) improvement in throughput. It might work a bit differently with VBox, but I would not disable it.
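For what it's worth, on Linux you can check whether HT/SMT is actually active with a standard util-linux command (a quick sketch; the exact output formatting varies a little between versions):

```shell
# "Thread(s) per core: 2" means HT/SMT is on; "1" means it is off
lscpu | grep -i 'thread(s) per core'
```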
ID: 34635
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34636 - Posted: 14 Mar 2018, 16:30:12 UTC - in response to Message 34635.  

By the way, there is one hidden cost of hyperthreading: since twice as many work units can run, you will need twice as much memory. So the 25% gain in throughput may not be worth it if you are short of memory. Normally, though, I just buy enough memory.
ID: 34636
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34637 - Posted: 14 Mar 2018, 16:33:38 UTC - in response to Message 34636.  

So the 25% gain in throughput may not be worth it if you are short of memory
Memory is 32GB :-)
ID: 34637
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34638 - Posted: 14 Mar 2018, 17:33:00 UTC - in response to Message 34637.  
Last modified: 14 Mar 2018, 18:03:16 UTC

Memory is 32GB :-)

That is quite enough for an 8-core machine. I have found that on my Ryzen 1700 (16 virtual cores) I need 64 GB, though. That is no great surprise - I knew it when I built the machine with 32 GB, but I had forgotten until I recently got bitten by some tasks that were suspended for lack of memory. So I bought another 32 GB and now have more than enough.

PS - I originally built the Ryzen for WCG, where 32 GB is more than enough. But I tried it here, was surprised at how well it worked with VBox projects, and switched it over.
ID: 34638
Magic Quantum Mechanic

Joined: 24 Oct 04
Posts: 1127
Credit: 49,745,199
RAC: 10,798
Message 34645 - Posted: 14 Mar 2018, 18:58:41 UTC

I haven't tried running all 8 cores with ATLAS for a while, but I can run 8 of the CMS tasks to Valid with 24 GB RAM with no problem.

And I can easily run 8 Theory tasks with 16 GB RAM.

(I have run hundreds, maybe thousands, of 8-core multi-core ATLAS tasks with 16 and 24 GB RAM.)
Volunteer Mad Scientist For Life
ID: 34645
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34649 - Posted: 14 Mar 2018, 20:11:14 UTC - in response to Message 34645.  

I haven't tried running all 8 cores with ATLAS for a while, but I can run 8 of the CMS tasks to Valid with 24 GB RAM with no problem.

And I can easily run 8 Theory tasks with 16 GB RAM.

(I have run hundreds, maybe thousands, of 8-core multi-core ATLAS tasks with 16 and 24 GB RAM.)

Yes, my numbers are high because I devote 12 GB to a write cache on an 8-core (32 GB) machine, and 24 GB on a 16-core (64 GB) machine. It saves wear on the SSD.
ID: 34649
mmonnin

Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 189
Message 34652 - Posted: 14 Mar 2018, 22:14:15 UTC

Most apps do gain with HT on and can do more work in the same amount of time. If one thread hits a branch misprediction, or has to wait for data from memory, the other thread can still use CPU cycles that would otherwise be wasted. There was one BOINC app where turning HT off on my 2670v1 actually produced more work, but that is by far the minority of apps.

WUProp data does include whether CPUs have HT on or off, and might indicate whether an app is better with it on or off.
ID: 34652
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 34666 - Posted: 15 Mar 2018, 10:24:28 UTC - in response to Message 34649.  

Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots?
ID: 34666
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2410
Credit: 226,058,552
RAC: 126,630
Message 34667 - Posted: 15 Mar 2018, 11:19:04 UTC - in response to Message 34666.  

Any advice on setting up a cache? Did you use proprietary software or which solution did you go with? How do you deal with reboots?

This page gives a high-level overview of disk caching on a Linux system:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
Changing the default parameters will not always result in better performance.

On a system with lots of RAM that runs reliably 24/7, you may alternatively consider mounting the BOINC "slots/" folder as tmpfs.
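A minimal sketch of that tmpfs idea, assuming a Debian-style BOINC data directory (/var/lib/boinc-client) and a boinc user - adjust path, size, and ownership to your install:

```shell
# Keep VM scratch I/O in RAM; contents are lost on reboot/unmount,
# so only do this on a machine that runs reliably 24/7.
sudo mount -t tmpfs -o size=16G,uid=boinc,gid=boinc,mode=0755 \
    tmpfs /var/lib/boinc-client/slots

# Or make it permanent via /etc/fstab:
# tmpfs  /var/lib/boinc-client/slots  tmpfs  size=16G,uid=boinc,gid=boinc,mode=0755  0  0
```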
ID: 34667
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34670 - Posted: 15 Mar 2018, 12:26:06 UTC - in response to Message 34667.  

I use these parameters for a 12 GB cache (32 GB memory), with a four-hour write delay:
sudo sysctl vm.dirty_background_bytes=12000000000
sudo sysctl vm.dirty_bytes=13000000000
sudo sysctl vm.dirty_writeback_centisecs=500    # check the cache every 5 seconds
sudo sysctl vm.dirty_expire_centisecs=1440000   # flush pages older than 4 hours

For a 24 GB cache I change the first two values, but keep the others the same:
sudo sysctl vm.dirty_background_bytes=24000000000
sudo sysctl vm.dirty_bytes=25000000000


There is no difference in performance; it is about protecting the SSD from excessive writes. Originally this was for the CEP2 project on WCG, where the writes were quite high - well over 1 TB/day for 8 cores. It may not really be a problem here, but for CPDN the writes can still get a little high.
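Note that sysctl settings made this way are lost on reboot. A sketch of making them persistent, assuming a distro that reads /etc/sysctl.d/ at boot (the file name is an arbitrary choice):

```shell
# Anything ending in .conf in /etc/sysctl.d/ is applied at boot
sudo tee /etc/sysctl.d/90-write-cache.conf <<'EOF'
vm.dirty_background_bytes = 12000000000
vm.dirty_bytes = 13000000000
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 1440000
EOF
sudo sysctl --system    # apply all config files now, without rebooting
```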
ID: 34670
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2410
Credit: 226,058,552
RAC: 126,630
Message 34673 - Posted: 15 Mar 2018, 13:52:58 UTC - in response to Message 34670.  

@ Jim1348

Have you ever monitored the behaviour of your write cache?
There may be syncs initiated by other processes within the configured 4-hour period.

An easy monitor could be:
watch -n1 "egrep -i dirty /proc/meminfo"
Started right after a "sync", the output should grow over the period configured in "vm.dirty_expire_centisecs".
Once the numbers drop to 0 (or close to it), the cache has been written back to disk.
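For a one-shot readout instead of a continuous watch, the same counter can be read directly (/proc/meminfo is a standard Linux interface):

```shell
# Amount of page cache currently dirty, i.e. not yet written back to disk
grep '^Dirty' /proc/meminfo                          # e.g. "Dirty:  123456 kB"
dirty_kb=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
echo "$dirty_kb kB waiting to be flushed"
```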
ID: 34673
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 34675 - Posted: 15 Mar 2018, 16:37:45 UTC - in response to Message 34673.  
Last modified: 15 Mar 2018, 16:38:21 UTC

Good - I need a way to monitor the writes in Linux. I usually do it on my Windows machine, where the monitoring tools are readily available and easy to use. SsdReady will monitor the writes to any selected drive, and PrimoCache will tell you both the writes the OS issues and those actually written to the drive, in total at any given time, though it does not give the rate.

At the moment I have set up a ramdisk on my Windows machine (for CPDN), and by using SsdReady I can monitor the writes just to that. The system writes are usually pretty negligible anyway (though "negligible" in Windows is a relative term - it can still be 20 GB/day just for logging, etc.).

Thanks.
ID: 34675
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,517,274
RAC: 118,874
Message 34681 - Posted: 16 Mar 2018, 7:45:22 UTC

Since switching off hyperthreading did not at all yield what I was expecting, I switched it back on. So now I am back to using 7-8 CPU cores for crunching LHC tasks (besides 2 cores for 2 GPUGRID tasks, in combination with my two GTX 980 Tis).
ID: 34681



©2024 CERN