No RESULTS accepted from Linux Kernel 4.8.*
Author | Message |
---|---|
Send message Joined: 1 Jul 17 Posts: 5 Credit: 158,699 RAC: 0 |
I'm one of the users with a banned host. Can I help you solve the problem? I have one of the Intel processors and Linux kernel 4.8. Perhaps some other component (a library, say) is the problem. How can we help solve it? |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Dear Uwe, first, thank you for your support and offer of help. Host ID 10487841 is the 4th worst host on my list! You have provided 2450 valid results BUT 18948 invalid ones. This is NOT your fault. We have reason to believe there is a hyper-threading problem with Linux kernel 4.8.0 and higher. You have an 8-processor Intel Family 6 and are running Linux 4.8.0-58-generic. It would be great if you could apply the fix. Then I will unblock you and we shall see. I am not a great Linux guru but I shall try to find some hopefully clear and simple instructions for doing so. I am also waiting for feedback from user "Englab", who is trying this as well. I'll send an e-mail to you soonest. Thanks again. Eric. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I am afraid my e-mail to jens@jup-coburg.de bounced. Could you send your e-mail address to eric.mcintosh@cern.ch? Thanks. I am assuming Hyperthreading is active on your machine (the default, I think). If not, I have to think again. In the meantime, the problem is described at http://www.guru3d.com/news-story/debian-project-warns-turn-off-hyperthreading-with-skylake-and-kaby-lake.html
If you could run the shell command described there, that might be useful:
grep -E 'model|stepping' /proc/cpuinfo | sort -u
I also found the following on hyper-threading:
Hyper-threading (HT) refers to the hardware-backed capacity of Intel CPUs to efficiently switch between two execution threads. This is faster than running a single thread at a time when the two threads use different CPU resources at the same moment. For the system (and users), a single CPU core appears as two "virtual" CPUs. But for a parallel computational code limited by CPU throughput, HT can slow down your programs: there is only one floating-point unit (FPU) per CPU core, and trying to use it from two threads will thrash the CPU cache (and flush the pipelines?).
Disabling HT is easy. Reboot your computer, enter the BIOS menu and disable "Intel Hyper-Threading Technology". Restart the system, and you now see only half of the previous CPUs, but this time they are real, hardware CPUs. When the machine is far away and you can't access the BIOS (some rental services, for example), you can achieve something similar from the command line: just instruct the system to ignore half of the CPUs.
For example, to stop using (virtual) CPU N (as root, replacing N with the CPU number):
echo 0 > /sys/devices/system/cpu/cpuN/online
You can do it automagically on system start with these entries in /etc/crontab, using the special @reboot field:
# disable hyperthreading, cores 4-7
@reboot root echo 0 > /sys/devices/system/cpu/cpu4/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu5/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu6/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu7/online
Re-enable a CPU with echo 1. To decide which CPUs to disable, get the list of "virtual" CPUs backed by the same CPU core from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.
Hopefully some experts will provide some simpler and better instructions. Eric. |
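The numbering of hyper-thread siblings differs between machines, so cpu4 to cpu7 in the crontab example may not be the right CPUs everywhere. A minimal sketch of reading thread_siblings_list to find them, assuming the file holds either a comma list such as "0,4" or a range such as "0-1"; the function name list_secondary_siblings and its optional directory argument are hypothetical helpers for illustration, not part of any tool mentioned in this thread:

```shell
#!/bin/sh
# Sketch: print every logical CPU that is NOT the lowest-numbered thread of
# its core, i.e. the candidates for "echo 0 > .../cpuN/online".
# The optional argument (an alternative CPU directory) is an assumption added
# so the function can be tried on a copy; it defaults to the real sysfs path.
list_secondary_siblings() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_dir"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        cpu_path="${f%/topology/thread_siblings_list}"
        cpu="${cpu_path##*/cpu}"
        # The first number in "0,4" or "0-1" is the lowest sibling of the core.
        first="$(sed 's/[,-].*//' "$f")"
        if [ "$cpu" != "$first" ]; then
            echo "$cpu"
        fi
    done | sort -n
}

# Example (as root; run it before taking any CPU offline):
#   for n in $(list_secondary_siblings); do
#       echo 0 > "/sys/devices/system/cpu/cpu$n/online"
#   done
```

The function only reads files; the write to the online file is left in the commented example so that nothing changes until you have checked the list it prints.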
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Do you have hyperthreading enabled? Have you applied any patches or BIOS updates? Thanks. Eric. I'm not sure how you arrived at that conclusion? I represent a sample size of 1, but my SixTrack tasks seem to be running OK and are being (slowly) validated. |
Send message Joined: 1 Jul 17 Posts: 5 Credit: 158,699 RAC: 0 |
I applied the fix (2017-05-11) already, but found out today that Intel released a new microcode patch on 2017-07-07. I'll apply this fix by tomorrow at the latest and let you know in this forum. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I am not on the banned list, but I am getting the message "Not accepting requests from this host" when I try an update to get new work. I don't see any obvious reason for it. Could you please look into it? https://lhcathome.cern.ch/lhcathome/results.php?hostid=10477864 |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Hi, I see 24 tasks in progress for this host. Too many! I am afraid I might have "banned" it accidentally, but it is indeed OK now. Thanks for all your support. Eric. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
I am afraid I did not REALLY unban 770 hosts until 13:15 today. Apologies, MEA CULPA and all should be OK now. Eric. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Thanks, it is working now. |
Send message Joined: 1 Jul 17 Posts: 5 Credit: 158,699 RAC: 0 |
I applied the fix just now, but it contains no new microcode for my microprocessor. If you wish, we can test whether the problem is fixed. |
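One way to check whether a microcode package changed anything is to compare the revision the kernel reports before and after applying it. A minimal sketch, assuming an x86 Linux system whose /proc/cpuinfo carries a "microcode" field; the function name show_microcode and its optional file argument are hypothetical and only for illustration:

```shell
#!/bin/sh
# Sketch: print the distinct CPU model and microcode revision lines.
# The optional argument (a cpuinfo-style file) is an assumption added so the
# function can be tried on a saved copy; it defaults to /proc/cpuinfo.
show_microcode() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -E '^(model name|microcode)' "$cpuinfo" | sort -u
}

# Example: run before and after the update, then compare the two outputs.
#   show_microcode
```

If the microcode line is identical in both runs, the update most likely did not load a newer revision for that processor.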
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Great, many thanks. Which host or hosts? I'll retry. Eric. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
OK I found it I think. Host ID 10487841. Eric. |
Send message Joined: 1 Jul 17 Posts: 5 Credit: 158,699 RAC: 0 |
That's correct. Let's see what happens. |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
More than 1K errors in SixTrack: GenuineIntel Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz [Family 6 Model 79 Stepping 1] Linux 4.1.12-94.3.8.el7uek.x86_64 https://lhcathome.cern.ch/lhcathome/results.php?hostid=10455567 |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Another vital clue: not kernel 4.8.*, but over 1000 "compute" errors. It looks like a failure to download the input file! But why? I can't really see this as the fault of the host, BUT we shall find out. stderr appended. Eric. Stderr output: 22:18:35 (111405): Can't open init data file - running in standalone mode unzip: cannot find Sixin.zip, Sixin.zip.zip or Sixin.zip.ZIP. 22:18:35 (111405): called boinc_finish upload failure: |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Well, at least one has failed, but with a result difference. I have re-banned 10487841. We need to get in touch soonest. Eric. |
Send message Joined: 1 Jul 17 Posts: 5 Credit: 158,699 RAC: 0 |
I have sent you a private message. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Uwe has updated his BIOS but SixTrack is still failing. He will try again this weekend and turn off Hyperthreading as well. |
Send message Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 12 |
My apologies for the late reply. I only now saw your message. I do have hyper-threading enabled. However, my CPUs are older Intel models that are not affected by the hyper-threading bug in Intel's Skylake and Kaby Lake CPUs. P.S. I would expect that Intel microcode bug to affect other Linux kernels and other operating systems too on the relevant CPUs. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
OK, maybe my list is incomplete. This host is already banned though. Eric. The mentioned host is still getting tasks, or getting tasks again, and only producing inconclusives, invalids and errors. State: All (2145) · In progress (0) · Validation pending (518) · Validation inconclusive (1039) · Valid (1) · Invalid (285) · Error (302) |