Message boards : News : No RESULTS accepted from Linux Kernel 4.8.*
Uwe Plonus (neu)

Joined: 1 Jul 17
Posts: 5
Credit: 158,699
RAC: 0
Message 31449 - Posted: 17 Jul 2017, 7:53:08 UTC

I'm one of the users with a banned host. Can I help you solve the problem?

I have one of the Intel processors and Linux kernel 4.8. Perhaps some other component (a library, for example) is the problem.

How can we help solve the problem?
ID: 31449
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31451 - Posted: 17 Jul 2017, 10:24:10 UTC - in response to Message 31449.  

Dear Uwe, first, thank you for your support and offer of help.
Host ID 10487841 is the 4th worst host on my list! You have provided 2450
valid results BUT 18948 invalid ones. This is NOT your fault. We have reason
to believe there is a hyper-threading problem with Linux kernel 4.8.0
and higher. You have an 8-processor Intel Family 6 machine and are running
Linux 4.8.0-58-generic.

It would be great if you could apply the fix. Then I will unblock you and we
shall see. I am not a great Linux guru, but I shall try to find some hopefully
clear and simple instructions for doing so. I am also waiting for feedback
from user "Englab", who is trying this as well. I'll send an e-mail to you
soonest.

Thanks again. Eric.


ID: 31451
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31452 - Posted: 17 Jul 2017, 10:40:45 UTC - in response to Message 31449.  

I am afraid my e-mail to jens@jup-coburg.de
bounced. Could you send your e-mail address
to eric.mcintosh@cern.ch? Thanks.
I am assuming Hyperthreading is active on your machine (the default,
I think). If not, I will have to think again.
In the meantime, the problem is described at
http://www.guru3d.com/news-story/debian-project-warns-turn-off-hyperthreading-with-skylake-and-kaby-lake.html

If you could run the shell command described there:
grep -E 'model|stepping' /proc/cpuinfo | sort -u
that might be useful.
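
For anyone who also wants to check whether Hyperthreading appears to be active, here is a minimal sketch. It assumes an x86 Linux system whose /proc/cpuinfo has the usual "siblings" and "cpu cores" fields. The function name ht_status and its optional file argument are made up for this example; the argument is only there so the check can be run against a saved copy of cpuinfo.

```shell
#!/bin/sh
# Report whether Hyper-Threading looks active, judging by /proc/cpuinfo.
# ht_status is a hypothetical helper name, not part of any existing tool.
ht_status() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # "siblings"  = logical CPUs per physical package
    # "cpu cores" = physical cores per physical package
    siblings=$(awk -F: '/^siblings/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo" 2>/dev/null)
    cores=$(awk -F: '/^cpu cores/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$cpuinfo" 2>/dev/null)
    if [ -z "$siblings" ] || [ -z "$cores" ]; then
        echo "unknown"      # fields missing (for example on non-x86 hardware)
    elif [ "$siblings" -gt "$cores" ]; then
        echo "enabled"      # more logical CPUs than cores
    else
        echo "disabled"
    fi
}

ht_status
```

This only reports what the kernel currently exposes; the BIOS setting itself is still the authoritative place to look.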

I also found:

Hyper-threading
Hyper-threading (HT) refers to the hardware-backed capacity of Intel CPUs to efficiently switch between two execution threads. This is faster than running a single thread at a time when the two threads use different CPU resources at the same moment. For the system (and users), a single CPU core appears as two "virtual" CPUs. But for a parallel computational code limited by the CPU throughput, HT can slow down your programs: there is only one floating-point unit (FPU) per CPU core, and trying to use it from two threads will thrash the CPU cache (and flush the pipelines?).

Disabling HT is easy. Reboot your computer, enter the BIOS menu and disable "Intel Hyper-Threading Technology". Restart the system, and now you only see half of the previous CPUs, but this time these are real, hardware CPUs.

When the machine is far away and you can't access the BIOS (some rental services for example), you can achieve something similar from the command-line: just instruct the system to ignore half of the CPUs. For example, to stop using (virtual) CPU N (as root, replace N with the CPU number):

echo 0 > /sys/devices/system/cpu/cpuN/online

You can do it automagically on system start with these entries in /etc/crontab, using the special @reboot field:

# disable hyperthreading: take logical CPUs 4-7 offline
@reboot root echo 0 > /sys/devices/system/cpu/cpu4/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu5/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu6/online
@reboot root echo 0 > /sys/devices/system/cpu/cpu7/online

Re-enable a CPU with echo 1. CPU numbering differs between machines, so to decide which CPUs to disable, get the list of "virtual" CPUs backed by the same CPU core from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.
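
As a rough sketch of that last step, the script below reads each thread_siblings_list and prints every logical CPU that is not the first entry in its own list, i.e. the candidates for going offline. The function name list_ht_siblings and its optional directory argument are assumptions made for this example. It deliberately only prints the echo commands rather than running them, so nothing changes until you run them yourself as root.

```shell
#!/bin/sh
# Print the logical CPUs that are secondary hyper-threads: every CPU that is
# not the first (lowest-numbered) entry in its own thread_siblings_list.
# list_ht_siblings is a hypothetical helper name; the optional argument lets
# the logic be tried on a copy of the sysfs tree.
list_ht_siblings() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*; do
        list="$dir/topology/thread_siblings_list"
        [ -r "$list" ] || continue          # skip CPUs without topology info
        n="${dir##*/cpu}"                   # ".../cpu4" -> "4"
        # The list looks like "0,4" or "0-1"; keep only the first number.
        first=$(sed 's/[,-].*//' "$list")
        if [ "$n" != "$first" ]; then
            echo "$n"
        fi
    done | sort -n
}

# Dry run: show the commands that would take the secondary threads offline.
for n in $(list_ht_siblings); do
    echo "echo 0 > /sys/devices/system/cpu/cpu$n/online"
done
```

If the printed list matches what you expect from thread_siblings_list, the same lines can be used in the @reboot entries in place of the hard-coded CPU numbers.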

Hopefully some experts will provide some simpler and better instructions. Eric.
ID: 31452
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31453 - Posted: 17 Jul 2017, 11:00:49 UTC - in response to Message 31392.  

Do you have hyperthreading enabled?
Have you applied any patches or BIOS updates?
Thanks. Eric.

Quoted from Message 31392:
I'm not sure how you arrived at that conclusion. I represent a sample size of 1, but my SixTrack tasks seem to be running OK and are being (slowly) validated.

I run Xubuntu Linux 64-bit with kernel 4.8.0-58.

P.S. Maybe my sample size is 2. I have 2 machines running SixTrack without problems under Xubuntu Linux with the 64-bit kernel 4.8. You can check the details of my CPUs.

ID: 31453
Uwe Plonus (neu)

Joined: 1 Jul 17
Posts: 5
Credit: 158,699
RAC: 0
Message 31454 - Posted: 17 Jul 2017, 11:07:18 UTC - in response to Message 31451.  

I already applied the fix (2017-05-11), but found out today that Intel released a new microcode patch on 2017-07-07. I'll apply this fix tomorrow at the latest and report back in this forum.
ID: 31454
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 31455 - Posted: 17 Jul 2017, 11:07:28 UTC

I am not on the banned list, but I am getting the message "Not accepting requests from this host" when I try an update to get new work. I don't see any obvious reason for it. Could you please look into it?
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10477864
ID: 31455
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31456 - Posted: 17 Jul 2017, 11:30:03 UTC - in response to Message 31455.  

Hi, I see 24 tasks in progress for this host. Too many! I am afraid I
might have "banned" it accidentally, but it is indeed OK now.
Thanks for all your support. Eric.



ID: 31456
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31457 - Posted: 17 Jul 2017, 11:31:24 UTC

I am afraid I did not REALLY unban 770 hosts until 13:15 today.
Apologies, MEA CULPA and all should be OK now. Eric.
ID: 31457
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 31458 - Posted: 17 Jul 2017, 12:21:28 UTC - in response to Message 31457.  

Thanks, it is working now.
ID: 31458
Uwe Plonus (neu)

Joined: 1 Jul 17
Posts: 5
Credit: 158,699
RAC: 0
Message 31465 - Posted: 17 Jul 2017, 15:51:42 UTC - in response to Message 31454.  

I applied the fix just now, but it contains no new microcode for my processor. If you wish, we can test whether the problem is fixed.
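
One way to confirm whether an update actually changed anything is to compare the microcode revision the kernel reports before and after applying it. This is only a sketch, assuming an x86 Linux system whose /proc/cpuinfo carries a "microcode" field. The function name microcode_revisions and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Print the distinct microcode revisions reported in /proc/cpuinfo.
# microcode_revisions is a hypothetical helper name; the optional argument
# allows running it against a saved copy of cpuinfo.
microcode_revisions() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Lines look like "microcode<TAB>: 0xba"; strip whitespace and deduplicate.
    awk -F: '/^microcode/ {gsub(/[ \t]/, "", $2); print $2}' "$cpuinfo" | sort -u
}

microcode_revisions
```

If the value is the same before and after the update (and a reboot), the package most likely had nothing newer for that processor.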
ID: 31465
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31469 - Posted: 17 Jul 2017, 18:04:55 UTC - in response to Message 31465.  

Great, many thanks. Which host or hosts? I'll retry. Eric.

ID: 31469
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31470 - Posted: 17 Jul 2017, 18:15:58 UTC - in response to Message 31469.  

OK, I think I found it: Host ID 10487841. Eric.
ID: 31470
Uwe Plonus (neu)

Joined: 1 Jul 17
Posts: 5
Credit: 158,699
RAC: 0
Message 31473 - Posted: 17 Jul 2017, 20:13:45 UTC - in response to Message 31470.  

That's correct. Let's see what happens.
ID: 31473
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,109,667
RAC: 104,249
Message 31474 - Posted: 18 Jul 2017, 5:55:15 UTC

More than 1K errors in SixTrack:

GenuineIntel
Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz [Family 6 Model 79 Stepping 1]

Linux 4.1.12-94.3.8.el7uek.x86_64

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10455567
ID: 31474
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31476 - Posted: 18 Jul 2017, 7:18:13 UTC - in response to Message 31474.  

Another vital clue: not kernel 4.8.*, but over 1000 "compute" errors.
It looks like a failure to download the input file! But why?
I can't really see this as the fault of the host, BUT we shall find out.
The stderr output is appended. Eric.

Stderr output
7.6.22

22:18:35 (111405): Can't open init data file - running in standalone mode
unzip: cannot find Sixin.zip, Sixin.zip.zip or Sixin.zip.ZIP.
22:18:35 (111405): called boinc_finish



upload failure:
LHC_2015_LHC_2015_260_HO_BOINC__19__s__62.31_60.32__5_6__5__66_1_sixvf_boinc8717_0_0
-161 (not found)
ID: 31476
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31477 - Posted: 18 Jul 2017, 7:20:37 UTC - in response to Message 31473.  

Well, at least one has failed, but with a result difference.
I have re-banned 10487841. We need to get in touch soonest.
Eric.



ID: 31477
Uwe Plonus (neu)

Joined: 1 Jul 17
Posts: 5
Credit: 158,699
RAC: 0
Message 31480 - Posted: 18 Jul 2017, 10:30:19 UTC - in response to Message 31477.  

Sent you a private message.
ID: 31480
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31505 - Posted: 19 Jul 2017, 19:05:42 UTC - in response to Message 31480.  

Uwe has updated his BIOS but SixTrack is still failing.
He will try again this weekend and turn off Hyperthreading as well.

ID: 31505
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 31545 - Posted: 21 Jul 2017, 23:11:27 UTC - in response to Message 31453.  
Last modified: 21 Jul 2017, 23:14:29 UTC

My apologies for the late reply; I only now saw your message. I do have hyper-threading enabled. However, my CPUs are older Intel models that are not affected by the hyper-threading bug affecting Intel's Skylake and Kaby Lake CPUs.

P.S. I would expect that Intel microcode bug to affect other Linux kernels and other operating systems too, on the relevant CPUs.
ID: 31545
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 31561 - Posted: 23 Jul 2017, 8:05:01 UTC - in response to Message 31448.  

OK, maybe my list is incomplete. This host is already banned though. Eric.

This one is not (yet) banned, but does not seem trustworthy: Host 9841071

The mentioned host is still getting tasks, or is getting tasks again, and is producing only inconclusives, invalids and errors.

State: All (2145) · In progress (0) · Validation pending (518) · Validation inconclusive (1039) · Valid (1) · Invalid (285) · Error (302)
ID: 31561