Message boards : News : No RESULTS accepted from Linux Kernel 4.8.*
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31389 - Posted: 14 Jul 2017, 13:50:37 UTC

As an emergency measure over the weekend, I have set
max_results_day to -1 for all hosts running Linux (Ubuntu?)
Kernel 4.8.*. SixTrack is consistently crashing there with an IFORT
run-time formatted I/O error. This will avoid wasting your valuable
contributions. Eric.
ID: 31389 · Report as offensive     Reply Quote
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 31392 - Posted: 14 Jul 2017, 21:12:16 UTC - in response to Message 31389.  
Last modified: 14 Jul 2017, 21:29:41 UTC

I'm not sure how you arrived at that conclusion? I represent a sample size of 1, but my SixTrack tasks seem to be running OK and are being (slowly) validated.

I run Xubuntu Linux 64-bit with kernel 4.8.0-58.

P.S. Maybe my sample size is 2. I have 2 machines running SixTrack without problems under Xubuntu Linux with the 64-bit kernel 4.8. You can check the details of my CPUs.
ID: 31392 · Report as offensive     Reply Quote
Trotador

Joined: 14 May 15
Posts: 17
Credit: 11,627,311
RAC: 0
Message 31393 - Posted: 14 Jul 2017, 21:51:58 UTC

I also have a host with 4.8 without a single errored wu.

So I think you have to look harder
ID: 31393 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31394 - Posted: 15 Jul 2017, 1:12:45 UTC - in response to Message 31392.  

Indeed my apologies; I see a very good record for this
host id 10398562

State: All (35) · In progress (3) · Validation pending (7) · Validation inconclusive (4) · Valid (19) · Invalid (0) · Error (2)
(the 2 in error are "no heartbeat")

I am afraid I am "throwing out the baby with the bathwater". Nonetheless we are having frequent errors from 4.8.0 on Intel Family 6. I shall publish some numbers soonest.
This combination of hardware and software is a source of errors, but not always.

Eric.


ID: 31394 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31396 - Posted: 15 Jul 2017, 1:17:19 UTC - in response to Message 31393.  

Indeed, your results look impeccable, and a lot of them. Many thanks for
your contribution. I have noted your hostids and cleared the flag. Eric.



ID: 31396 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31397 - Posted: 15 Jul 2017, 1:34:20 UTC - in response to Message 31394.  

..... and I have cleared the flag. Eric


ID: 31397 · Report as offensive     Reply Quote
Paul Synopsis

Joined: 4 Jul 17
Posts: 1
Credit: 37,145
RAC: 0
Message 31399 - Posted: 15 Jul 2017, 7:36:39 UTC

Hello, Eric.
I have checked my tasks. I only started BOINC yesterday, on FreeBSD 10.3 on a Celeron 1037U.
Three tasks are being validated and some are still pending.
There are 125 errors, and I can guess why:
when I ran BOINC on FreeBSD, the package with the Linux libraries was not installed, which gave an ELF error.
Judging by the tasks and logs, there is otherwise no problem.
Today I am going out of town and will run on my laptop, which has Ubuntu Linux; I have had no problems there.
ID: 31399 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31402 - Posted: 15 Jul 2017, 12:25:33 UTC - in response to Message 31399.  

Great, many thanks. Eric.



ID: 31402 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31403 - Posted: 15 Jul 2017, 12:34:28 UTC

IMPORTANT!

Apologies for the excessive ban. This is now corrected
but took more time than I had available yesterday.
The Project Management page at CERN is far too slow to be usable.
I had to use my own scripts to access the sixt_production database.

I have now "unbanned" 770 hosts but maintained the ban
for 73.
I have found

98707 all_linux Linux Hosts
4331 allnew48 Linux Hosts with Kernel 4.8.*
3161 allfamily6 Linux Hosts with Kernel 4.8.* and Intel Family 6 Processor(s)

Of the "banned" (max_results_day=-1) of which there are 1203,
843 are running Kernel 4.8.* on Intel Family 6.

Now I went to PCBE13978 (more disk space, and the validator logs)
and looked for all Invalids in the validator logs.
Then checked all 843 Hosts in the Invalids.
(Had to use nohup a lot as they are digging up the roads
and my Internet connection is being broken regularly,
or is it lxplus@CERN???)

Anyway, to cut a long story short (and I can't remember how to italicise or
emphasise with this interface :-( )

I have found that 73 hosts account for 204,184 Invalid Results
==============================================
out of a Total of 258,725, i.e. almost 79% of all Invalids.
=========================================

No time to make a plot, but here are the Invalid counts for each of
the 73 Hosts.

39852 21733 20055 19813 19601 18485 7587
5425 5360 4848 4651 4266 4196 3791
2325 2293 1953 1825 1802 1789 1620
1535 1103 880 731 730 729 696
598 383 369 369 367 355 338
308 308 305 240 104 96 63
54 34 33 33 31 19 18
10 9 8 7 7 5 5
4 4 3 3 3 3 2
2 2 1 1 1 1 1
1 1 1

....and the HostIds in the same order....

10452223 10480022 10486162 10487841 10485156 10484503 10480909
10454365 10484659 10484606 10486251 10483458 10477752 10481907
10484752 10487436 10484663 10487212 10453783 10485912 10485911
10485913 10405110 10485905 10485907 10485906 10456121 10487210
10485908 10482829 10453149 10452598 10453254 10453494 10452614
10476277 10453157 10453507 10454458 10488834 10481344 10481733
10485179 10487938 10487900 10487190 10480804 10482592 10475984
10480775 10475982 10453730 10475983 10455704 10488196 10478598
10487688 10476101 10452585 10451971 10421428 10408937 10486716
10479782 10417991 10489602 10489459 10484733 10451832 10449556
10416774 10415082 10396588
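
The totals quoted above can be cross-checked directly from the two lists. What follows is a minimal Python sketch, assuming the counts and HostIds pair up position by position as stated. The function names are illustrative only and are not part of any SixTrack or BOINC tooling.

```python
# Invalid-result counts per host, copied from the list above (same order).
INVALID_COUNTS = [
    39852, 21733, 20055, 19813, 19601, 18485, 7587,
    5425, 5360, 4848, 4651, 4266, 4196, 3791,
    2325, 2293, 1953, 1825, 1802, 1789, 1620,
    1535, 1103, 880, 731, 730, 729, 696,
    598, 383, 369, 369, 367, 355, 338,
    308, 308, 305, 240, 104, 96, 63,
    54, 34, 33, 33, 31, 19, 18,
    10, 9, 8, 7, 7, 5, 5,
    4, 4, 3, 3, 3, 3, 2,
    2, 2, 1, 1, 1, 1, 1,
    1, 1, 1,
]

# HostIds copied from the list above (same order as the counts).
HOST_IDS = [
    10452223, 10480022, 10486162, 10487841, 10485156, 10484503, 10480909,
    10454365, 10484659, 10484606, 10486251, 10483458, 10477752, 10481907,
    10484752, 10487436, 10484663, 10487212, 10453783, 10485912, 10485911,
    10485913, 10405110, 10485905, 10485907, 10485906, 10456121, 10487210,
    10485908, 10482829, 10453149, 10452598, 10453254, 10453494, 10452614,
    10476277, 10453157, 10453507, 10454458, 10488834, 10481344, 10481733,
    10485179, 10487938, 10487900, 10487190, 10480804, 10482592, 10475984,
    10480775, 10475982, 10453730, 10475983, 10455704, 10488196, 10478598,
    10487688, 10476101, 10452585, 10451971, 10421428, 10408937, 10486716,
    10479782, 10417991, 10489602, 10489459, 10484733, 10451832, 10449556,
    10416774, 10415082, 10396588,
]

TOTAL_INVALIDS = 258_725  # total number of Invalids quoted in the post


def summarize(counts, total):
    """Return (sum of counts, that sum as a fraction of total)."""
    subtotal = sum(counts)
    return subtotal, subtotal / total


def top_hosts(host_ids, counts, n):
    """Return the n (host_id, count) pairs with the most Invalids."""
    pairs = sorted(zip(host_ids, counts), key=lambda p: p[1], reverse=True)
    return pairs[:n]


if __name__ == "__main__":
    subtotal, share = summarize(INVALID_COUNTS, TOTAL_INVALIDS)
    print(f"{len(INVALID_COUNTS)} hosts, {subtotal} Invalids, {share:.1%} of all")
    for host_id, count in top_hosts(HOST_IDS, INVALID_COUNTS, 6):
        print(host_id, count, f"{count / TOTAL_INVALIDS:.1%}")
```

The per-host counts do add up to the 204,184 quoted, and the first six hosts alone contribute 139,539 of them.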

Needless to say I shall be looking VERY closely at at least the first
few of these 73 Hosts! Hint: the 1st "englab" system has been banned
for a considerable time already :-)

However we have an even more urgent problem with
"Transient" errors and incorrect Validation.
I MUST look at that and write a report for Monday latest.
Eric. (Have to take a break. Too hot at the pool, and too sunny
to read my screen, and my battery is flat!)
ID: 31403 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 31404 - Posted: 15 Jul 2017, 12:46:49 UTC - in response to Message 31403.  

(Have to take a break. Too hot at the pool, and too sunny to read my screen, and my battery is flat!)

Life is hard! Here in London we have cooler weather after a couple of heatwaves. 16.6 C and a spot of rain at the moment.
Remember, I'm a FORTRAN/Fortran expert if you need some help. :-)
ID: 31404 · Report as offensive     Reply Quote
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 31405 - Posted: 15 Jul 2017, 14:08:26 UTC - in response to Message 31403.  

Of the "banned" (max_results_day=-1) of which there are 1203,
843 are running Kernel 4.8.* on Intel Family 6.

Maybe due to the hyper-threading problem?
http://www.guru3d.com/news-story/debian-project-warns-turn-off-hyperthreading-with-skylake-and-kaby-lake.html
ID: 31405 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31406 - Posted: 15 Jul 2017, 14:12:47 UTC - in response to Message 31404.  

Well I never look a gift horse in the mouth, but I have
been using Fortran since 1964... Not up to speed yet on Fortran90/95
and one time I shall talk about Fortran 2003! where I eagerly await a proper implementation.
I don't have time right now, but I can provide some tests for input/output binary decimal
and decimal binary conversion (AND I must watch Roger at Wimbledon tomorrow!)


ID: 31406 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,501,728
RAC: 4,157
Message 31407 - Posted: 15 Jul 2017, 14:17:43 UTC - in response to Message 31404.  
Last modified: 15 Jul 2017, 14:35:19 UTC

(Have to take a break. Too hot at the pool, and too sunny to read my screen, and my battery is flat!)

Life is hard! Here in London we have cooler weather after a couple of heatwaves. 16.6 C and a spot of rain at the moment.
Remember, I'm a FORTRAN/Fortran expert if you need some help. :-)


Here in the Great NW PDT we are in that typical month when it is sunny every day and at times too hot for me (but once Fall gets here it will rain until next June)

I do have a pond and creek on the property but I haven't had my laptop out there since T4T started and I for some reason have to have every computer running Cern tasks 24/7 so it has been sitting here in the house plugged into the AC for just over 5 YEARS 24/7 and the only break it gets is when the power goes out in a Winter storm. (my only laptop ever so I must have picked the right one)

Have a fine weekend Eric and Ivan (I'll email you some sunshine if you need it Ivan)

(oh and IBM Fortran is 15 months older than me)

The next revision of the language (Fortran 2015) is intended to be a minor revision and is planned for release in mid-2018.
Volunteer Mad Scientist For Life
ID: 31407 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31408 - Posted: 15 Jul 2017, 14:38:49 UTC - in response to Message 31405.  

This looks very very interesting, I'll see if I can get some feedback from experts.
Thanks a lot. Eric.


ID: 31408 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 31409 - Posted: 15 Jul 2017, 17:49:48 UTC - in response to Message 31406.  

Well I never look a gift horse in the mouth, but I have
been using Fortran since 1964... Not up to speed yet on Fortran90/95
and one time I shall talk about Fortran 2003! where I eagerly await a proper implementation.
OK, you beat me by 5 or 6 years on FORTRAN, but I've done a fair bit of 90/95. Not so much 2003 I guess, and I've never actually tried co-arrays yet. I presume you are aware of it, but the Usenet newsgroup comp.lang.fortran can be a great resource.

I don't have time right now, but I can provide some tests for input/output binary decimal
and decimal binary conversion (AND I must watch Roger at Wimbledon tomorrow!)


ID: 31409 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 31411 - Posted: 15 Jul 2017, 17:52:39 UTC - in response to Message 31403.  

Needless to say I shall be looking VERY closely at least the first
few of these 73 Hosts!

The first 7 hosts:
HostID    CPU type                                                                   Cores  Threads  Linux             HT
10452223  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]    4      4        4.8.0-54-generic  x
10480022  Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz [Family 6 Model 94 Stepping 3]  4      8        4.8.0-54-generic  ON
10486162  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]    4      4        4.8.0-56-generic  x
10487841  Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3]   4      8        4.8.0-58-generic  ON
10485156  Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]    4      4        4.8.0-54-generic  x
10484503  Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]   4      8        4.8.0-54-generic  ON
10480909  Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3]   4      8        4.8.0-56-generic  ON
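
The same classification can be done mechanically from the CPU and kernel strings. Below is a minimal Python sketch, assuming the bracketed "[Family F Model M Stepping S]" signature is always present in the CPU string and that a thread count above the core count means hyper-threading is on. The mapping of Family 6 models 78/94 to Skylake and 142/158 to Kaby Lake is an assumption not stated in this thread, and all function names are illustrative.

```python
import re

# Matches the bracketed CPU signature used in the table above.
CPU_SIG = re.compile(r"\[Family (\d+) Model (\d+) Stepping (\d+)\]")

# Assumption, not stated in the thread: Family 6 models 78 and 94 are
# Skylake parts, and models 142 and 158 are Kaby Lake parts.
SKYLAKE_KABY_LAKE_MODELS = {78, 94, 142, 158}

# (host id, CPU string, cores, threads, kernel) for the seven hosts above.
HOSTS = [
    (10452223, "Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]", 4, 4, "4.8.0-54-generic"),
    (10480022, "Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz [Family 6 Model 94 Stepping 3]", 4, 8, "4.8.0-54-generic"),
    (10486162, "Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]", 4, 4, "4.8.0-56-generic"),
    (10487841, "Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3]", 4, 8, "4.8.0-58-generic"),
    (10485156, "Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [Family 6 Model 158 Stepping 9]", 4, 4, "4.8.0-54-generic"),
    (10484503, "Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]", 4, 8, "4.8.0-54-generic"),
    (10480909, "Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3]", 4, 8, "4.8.0-56-generic"),
]


def cpu_signature(cpu):
    """Return (family, model, stepping), or None if no signature is found."""
    match = CPU_SIG.search(cpu)
    return tuple(int(g) for g in match.groups()) if match else None


def is_kernel_4_8(kernel):
    """True for kernel strings such as '4.8.0-58-generic'."""
    return kernel.startswith("4.8.")


def classify(host):
    """Summarise one host tuple as a dict of boolean flags."""
    host_id, cpu, cores, threads, kernel = host
    sig = cpu_signature(cpu)
    family, model = (sig[0], sig[1]) if sig else (None, None)
    return {
        "host_id": host_id,
        "kernel_4_8": is_kernel_4_8(kernel),
        "family_6": family == 6,
        "skylake_or_kaby_lake": family == 6 and model in SKYLAKE_KABY_LAKE_MODELS,
        "hyper_threading": threads > cores,
    }


if __name__ == "__main__":
    for host in HOSTS:
        print(classify(host))
```

Under these assumptions all seven hosts are Family 6 on a 4.8 kernel, while only the four 8-thread hosts have hyper-threading on, which matches the HT column.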
ID: 31411 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 31414 - Posted: 15 Jul 2017, 19:21:45 UTC

HostID not on your list, but seems to me also not very reliable:

10489186 AMD Ryzen 7 1800X Eight-Core Processor [Family 23 Model 1 Stepping 1] Cores 8 Threads 16 OS Linux 4.8.0-58-generic HT ON
ID: 31414 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31421 - Posted: 15 Jul 2017, 21:40:42 UTC - in response to Message 31414.  

Well thanks; I shall check it out. I see at least one result and I don't believe
this is a SixTrack problem. It is late, though, so maybe only tomorrow.
It is another 4.8.* Linux. I only looked at Intel Family 6. I guess I
shall also have to look at AMDs....now I know how to do it :-)
(Latest news suggests a hyper-threading problem.) Eric.

w-c2_n10_lhc2016_40_MD-140-16-476-2.5-1.2077__32__s__64.31_59.32__7_8__6__72_1_sixvf_boinc7010_2_0
0.000000
102400.000000
d41d8cd98f00b204e9800998ecf8427e
http://lhcathomeclassic.cern.ch/sixtrack_cgi/file_upload_handler

stderr out
7.6.31

process exited with code 193 (0xc1, -63)


SIGSEGV: segmentation violation
Stack trace (5 frames):
[0x8266bed]
[0xf7727cb0]
[0x825acd7]
[0x8253b2f]
[0x81fb4f8]

Exiting...
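
The three numbers in the "process exited with code 193 (0xc1, -63)" line are the same value shown three ways: decimal, hexadecimal, and reinterpreted as a signed 8-bit integer. A minimal Python sketch of that arithmetic follows. The helper names are illustrative, and the remark that 193 corresponds to BOINC's "killed by a signal" exit status is an assumption from memory of the BOINC error codes, not something stated in this thread.

```python
def as_signed_byte(code):
    """Reinterpret the low 8 bits of code as a two's-complement signed value."""
    code &= 0xFF
    return code - 256 if code >= 128 else code


def format_exit(code):
    """Rebuild the 'process exited' line in the form seen in the stderr output."""
    return f"process exited with code {code} ({code:#x}, {as_signed_byte(code)})"


# Assumption: 193 is the value BOINC uses when the application was
# terminated by a signal, which would fit the SIGSEGV reported above.
ASSUMED_EXIT_SIGNAL = 193

if __name__ == "__main__":
    print(format_exit(ASSUMED_EXIT_SIGNAL))
```

If that assumption holds, the exit code carries no more information than the SIGSEGV line itself, and the stack trace is the part worth examining.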



ID: 31421 · Report as offensive     Reply Quote
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 31422 - Posted: 15 Jul 2017, 21:47:42 UTC - in response to Message 31414.  

OK, maybe my list is incomplete. This host is already banned though. Eric.


ID: 31422 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 31448 - Posted: 17 Jul 2017, 7:19:22 UTC - in response to Message 31422.  

OK, maybe my list is incomplete. This host is already banned though. Eric.


This one is not (yet) banned, but does not seem trustworthy: Host 9841071

An AMD Phenom II with OS Linux 4.4.0-83-generic?! Not 4.8

Results:
State: All (1118) · In progress (0) · Validation pending (452) · Validation inconclusive (373) · Valid (1) · Invalid (125) · Error (167)
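
A state summary line like the one above can be turned into numbers for comparing hosts. Below is a minimal Python sketch, assuming the line always has the "Name (count)" fields separated by "·" as shown on the task pages. The function names and the choice to measure Invalid and Error against completed results only are illustrative, not the project's own criteria.

```python
import re

FIELD = re.compile(r"^\s*(.+?)\s*\((\d+)\)\s*$")


def parse_states(line):
    """Parse 'State: All (35) · In progress (3) · ...' into a dict of counts."""
    if line.startswith("State:"):
        line = line[len("State:"):]
    states = {}
    for part in line.split("·"):
        match = FIELD.match(part)
        if match:
            states[match.group(1)] = int(match.group(2))
    return states


def is_consistent(states):
    """Check that the individual states add up to the 'All' total."""
    return sum(n for name, n in states.items() if name != "All") == states["All"]


def bad_fraction(states):
    """Fraction of completed results (Valid, Invalid, Error) that are not Valid."""
    done = states["Valid"] + states["Invalid"] + states["Error"]
    return (states["Invalid"] + states["Error"]) / done if done else 0.0


if __name__ == "__main__":
    line = ("State: All (1118) · In progress (0) · Validation pending (452) · "
            "Validation inconclusive (373) · Valid (1) · Invalid (125) · Error (167)")
    states = parse_states(line)
    print(states, is_consistent(states), f"{bad_fraction(states):.1%}")
```

For Host 9841071 the counts are self-consistent, and only 1 of the 293 completed results is Valid.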
ID: 31448 · Report as offensive     Reply Quote
©2024 CERN