Message boards : Number crunching : fubar host of the day

computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,658,480
RAC: 105,312
Message 38976 - Posted: 26 May 2019, 7:19:13 UTC

Can anybody explain how an AMD Ryzen 7 1700 can get these benchmark values?


https://lhcathome.cern.ch/lhcathome/show_user.php?userid=442124
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10579909
Measured floating point speed: 9.27 billion ops/sec
Measured integer speed: 96.27 billion ops/sec



https://lhcathome.cern.ch/lhcathome/show_user.php?userid=94014
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10389189
Measured floating point speed: 5.14 billion ops/sec
Measured integer speed: 59.23 billion ops/sec

Jim1348
Joined: 15 Nov 14
Posts: 325
Credit: 10,724,049
RAC: 21,913
Message 38977 - Posted: 26 May 2019, 8:07:27 UTC - in response to Message 38976.  

Can anybody explain how an AMD Ryzen 7 1700 can get these benchmark values?

You don't have to look far to find discrepancies. Here are the values for my Haswell machines:

i7-4770:
Measured floating point speed 4.57 billion ops/sec
Measured integer speed 117.55 billion ops/sec

i7-4790:
Measured floating point speed 4.49 billion ops/sec
Measured integer speed 59.64 billion ops/sec

The usual explanation (probably on the BOINC forum, or maybe Einstein) is that BOINC gives notoriously unreliable measurements. I see no reason to doubt it.

Richie_unstable
Joined: 26 Oct 18
Posts: 33
Credit: 778,722
RAC: 25
Message 38978 - Posted: 26 May 2019, 9:19:33 UTC - in response to Message 38977.  
Last modified: 26 May 2019, 9:21:56 UTC

The usual explanation (probably on the BOINC forum, or maybe Einstein) is that BOINC gives notoriously unreliable measurements. I see no reason to doubt it.


I've seen on a couple of my dual-boot hosts (Win / Linux) that the Linux flavor and/or kernel version can change those numbers quite a bit. The number of CPUs allowed while running the BOINC benchmarks also changes those values. Some time ago I ran a few speed tests while I was updating Linux versions on my hosts. These hosts have almost identical hardware, settings and clock speeds (the biggest difference is perhaps in their RAM configuration). The Windows benchmark results ended up very close to each other, but the Linux results were skewed.

All benchmarks were run with 4 of 12 CPUs allowed.

Host 1:

Windows 10
CPU-Z: single 356.7 / multi 1398.6
Boinc: 4878 / 12217

Linux Mint 19.1 Tessa (kernel 5.1-rc4)
Boinc: 4873 / 149970

Host 2:

Windows 10
CPU-Z: single 358.1 / multi 1444.8
Boinc: 4897 / 12197

Linux (Ubuntu 18.10 , kernel 4.18.0-17)
Boinc: 5352 / 105679

Linux (Ubuntu 19.04 , kernel 5.0.0-13)
Boinc: 5720 / 101207
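
To put numbers on the spread above, here is a small Python sketch that computes the Linux-to-Windows ratio for each benchmark pair. It assumes the two figures on each "Boinc:" line are floating point and integer speed in millions of ops/sec, in that order; the post does not say so explicitly, and the dictionary layout and names are made up for illustration.

```python
# Benchmark pairs copied from the post above.
# Assumption: each pair is (floating point, integer) in millions of ops/sec.
RESULTS = {
    "Host 1": {
        "Windows 10": (4878, 12217),
        "Linux Mint 19.1 (kernel 5.1-rc4)": (4873, 149970),
    },
    "Host 2": {
        "Windows 10": (4897, 12197),
        "Ubuntu 18.10 (kernel 4.18.0-17)": (5352, 105679),
        "Ubuntu 19.04 (kernel 5.0.0-13)": (5720, 101207),
    },
}


def ratios_vs_windows(host_results):
    """Return {os_name: (float_ratio, int_ratio)} relative to the Windows 10 run."""
    base_fp, base_int = host_results["Windows 10"]
    return {
        os_name: (fp / base_fp, integer / base_int)
        for os_name, (fp, integer) in host_results.items()
        if os_name != "Windows 10"
    }


if __name__ == "__main__":
    for host, host_results in RESULTS.items():
        for os_name, (fp_ratio, int_ratio) in ratios_vs_windows(host_results).items():
            print(f"{host} / {os_name}: float x{fp_ratio:.2f}, integer x{int_ratio:.2f}")
```

Under that assumption the floating point figures stay within roughly 20% of the Windows values, while the integer figures come out about 8 to 12 times higher on Linux.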

Jim1348
Joined: 15 Nov 14
Posts: 325
Credit: 10,724,049
RAC: 21,913
Message 38979 - Posted: 26 May 2019, 13:28:21 UTC - in response to Message 38978.  
Last modified: 26 May 2019, 13:33:00 UTC

I've seen on a couple of my dual-boot hosts (Win / Linux) that the Linux flavor and/or kernel version can change those numbers quite a bit. The number of CPUs allowed while running the BOINC benchmarks also changes those values.

It is quite possible that Windows is more consistent than Linux.

My i7-4770 is on Ubuntu 18.04, while my i7-4790 is on Ubuntu 16.04.
But they are both on the same Linux kernel (4.15.0-48), though I wonder if it has been updated since the benchmarks were run?
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10594847
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10588304

I allow all cores to run BOINC on the 4770, but I do reserve one core to support a GPU on Folding on the 4790. They are on different projects, if that matters, but BOINC should suspend them to run the benchmarks. I think BOINC just uses a random number generator.

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 849
Credit: 37,482,257
RAC: 27,340
Message 39048 - Posted: 5 Jun 2019, 6:54:47 UTC
Last modified: 5 Jun 2019, 6:58:00 UTC

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10455317&offset=0&show_names=0&state=4&appid=13

Maybe we all should switch back to VirtualBox 5.2.16 and run single core tasks (and Required extension pack not installed, remote desktop not enabled)

Maybe it will work with ATLAS and CMS. Of course it won't, but maybe 100,000-credit Sixtrack tasks will be next.

And the exact opposite is still running here https://lhcathome.cern.ch/lhcathome/results.php?hostid=10519938&offset=0&show_names=0&state=0&appid=14

Erich56
Joined: 18 Dec 15
Posts: 1125
Credit: 21,656,646
RAC: 34,561
Message 39049 - Posted: 5 Jun 2019, 12:42:16 UTC - in response to Message 39048.  

... and Required extension pack not installed ...
why so?

MaBoinc
Joined: 23 Mar 18
Posts: 2
Credit: 29,425,191
RAC: 7,460
Message 39050 - Posted: 5 Jun 2019, 13:32:06 UTC - in response to Message 39048.  
Last modified: 5 Jun 2019, 13:32:39 UTC

Maybe we all should switch back to VirtualBox 5.2.16 and run single core tasks


If only that were the solution. I've spent some time trying to figure it out, but I honestly don't have a clue what is going on. Some observations though:

This host runs VB 6.0.8 and gets 23k for one task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=230825272

I kinda hoped it would indeed be a VB thing, so I installed 6.0.8 and tried, still getting only 500-700 credits per task with similar run times on a very similar system (1950x instead of a 2950x):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=230936322

Given the similar run times there shouldn't be that much of a difference, not consistently at least. The only real difference is the measured Device peak FLOPS, explained here: https://boinc.berkeley.edu/trac/wiki/CreditNew

This has been the case since LHC switched credit systems, explained by Laurence here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5021&postid=38790#38790

And indeed, 680.34 / 4.51 * 152.78 = 23047, so the gap in device peak FLOPS accounts for the gap in credit. The old system was far from perfect, but this one seems even worse given how inconsistent it is.
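
The scaling in that calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the CreditNew algorithm itself: it assumes 680.34 is the credit granted on the lower-rated host, and that 4.51 and 152.78 are the device peak FLOPS figures (in GFLOPS) of the two hosts, with run times taken as equal. The function and parameter names are made up for illustration.

```python
def scaled_credit(credit, peak_flops_from, peak_flops_to):
    """Scale a credit value in proportion to the ratio of two peak FLOPS ratings.

    Simplifying assumption: run times are equal, so credit is taken to be
    directly proportional to the host's reported device peak FLOPS.
    """
    if peak_flops_from <= 0:
        raise ValueError("peak_flops_from must be positive")
    return credit / peak_flops_from * peak_flops_to


if __name__ == "__main__":
    # Figures from the post: 680.34 credits at 4.51 GFLOPS, rescaled to 152.78 GFLOPS.
    predicted = scaled_credit(680.34, 4.51, 152.78)
    print(f"predicted credit: {predicted:.0f}")
    print(f"peak FLOPS ratio: {152.78 / 4.51:.1f}x")
```

Read this way, a host whose benchmark reports roughly 34 times the peak FLOPS receives roughly 34 times the credit for the same run time.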

Guiri-One[Andalucia]
Joined: 1 Feb 06
Posts: 43
Credit: 9,723
RAC: 0
Message 39051 - Posted: 5 Jun 2019, 14:02:18 UTC - in response to Message 39050.  

Don't complain, it's forbidden here :)

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 561
Credit: 349,074,619
RAC: 535,955
Message 39054 - Posted: 5 Jun 2019, 17:36:48 UTC - in response to Message 39050.  

I run the old VirtualBox with single-core tasks on my machines; I find it to be more reliable.

Jim1348
Joined: 15 Nov 14
Posts: 325
Credit: 10,724,049
RAC: 21,913
Message 39057 - Posted: 5 Jun 2019, 20:24:23 UTC
Last modified: 5 Jun 2019, 20:24:57 UTC

CMS is doing quite well on all 8 cores of my i7-3770 with VirtualBox 5.1.38 (Ubuntu 16.04).
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10590338
EDIT: It helps to have 32 GB of memory.

I do native ATLAS and Theory, and don't know about them.

computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,658,480
RAC: 105,312
Message 39157 - Posted: 20 Jun 2019, 8:15:00 UTC
Last modified: 20 Jun 2019, 8:15:28 UTC

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10546354
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10596058

Looks like on both hosts a script cancels Theory tasks after 3 h runtime.
This always causes the currently running subtask to get lost.

Not helpful for the project.

Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 233
Credit: 10,725,376
RAC: 4,088
Message 39384 - Posted: 18 Jul 2019, 16:50:58 UTC
Last modified: 18 Jul 2019, 17:30:56 UTC

Self-reporting 10457490, which didn't like the VBox 6.0.10 upgrade (the 2 other hosts had no problem with it and are running fine). It also didn't like the downgrade back to 6.0.8. I got a series of BSODs (Storage Exception, System Service Exception) and numerous restarts, checked that Hyper-V was disabled and that virtualisation was enabled in the BIOS, and finally hit Critical Process Died, resulting in boot failure loops. The original installation disks are long gone, so I had to download and create installation media on another host. The installation media wouldn't let me overwrite the (presumably) corrupted Windows files (or maybe the HDD is past its best), so I'm currently waiting for it to do the full install, which is a right royal pain. And then BOINC, VBox etc.
Oh, joy. Several/many hours of my time that I'll never get back but luckily I have only used this host for Boinc and hopefully it might even be faster as it should clear out all the rubbish my sister left on it.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,194,608
RAC: 10,573
Message 39388 - Posted: 18 Jul 2019, 18:07:56 UTC - in response to Message 39384.  

Oh, joy. Several/many hours of my time that I'll never get back but luckily I have only used this host for Boinc and hopefully it might even be faster as it should clear out all the rubbish my sister left on it.

All that work so you can run 1 single-core Theory VBox task and maybe a Sixtrack simultaneously. That's dedication to the cause.
With just a little more work you could put Linux with a small desktop GUI on that machine and run a single-core (2 athena threads) ATLAS plus a Theory or Sixtrack task, like I do on my old X2 with 4 GB. Yes, there's a bit of a learning curve, but you could learn enough in a day to get you up and crunching. Anything to get rid of VBox. It's not just the overhead, it's the ongoing PITA.

Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 233
Credit: 10,725,376
RAC: 4,088
Message 39391 - Posted: 18 Jul 2019, 21:39:58 UTC

Hi Bronco
It HAD been running 2 x single-core Theory tasks on that host quite happily (all my hosts just run singles as I have found that to be most efficient), but it now crashes each time I start a VM, even after all the reinstallation, so there's still something not right (something wrong when the VM dials out). Luckily there is plenty of Sixtrack work just now, so it can get on with doing that until I have the time (and inclination) to dig further and sort out whatever's wrong.

I tried Ubuntu (running within a VM with the added complication of nested VMs within a VM) ages ago, before the T4T/vLHC/LHC Grand Unification, when we were testing the Christmas Challenge, with some success but was never really comfortable with all the Command Line stuff. If I can't solve the issue, I'll see if I can find the most Windowsy distro then I can join in playing with the Native apps.

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 849
Credit: 37,482,257
RAC: 27,340
Message 39628 - Posted: 17 Aug 2019, 14:28:57 UTC

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10519938

This one must run itself like HAL 9000
It has been doing this for a long time.

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 739
Credit: 6,027,121
RAC: 1,035
Message 39629 - Posted: 17 Aug 2019, 15:29:37 UTC - in response to Message 39628.  
Last modified: 17 Aug 2019, 17:29:07 UTC

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10519938

This one must run itself like HAL 9000
It has been doing this for a long time.

That host is already down to a max of 1 Theory Native task per day.
To go down to 1 ATLAS task per day it has to crunch another 306 (errors).
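
The "another 306 errors" figure suggests the daily quota shrinks by one task per errored result until it reaches a floor of 1. A minimal Python sketch of that rule follows, purely as an illustration: the decrement-by-one behaviour is inferred from this post rather than taken from the server code, the starting quota of 307 is just the value that count would imply, and all names are made up.

```python
def quota_after_errors(current_quota, errors, floor=1):
    """Daily task quota after a run of errored results.

    Assumed rule (inferred from the post, not from BOINC source):
    each errored result lowers the quota by one, never below `floor`.
    """
    if errors < 0:
        raise ValueError("errors must be non-negative")
    return max(floor, current_quota - errors)


def errors_until_floor(current_quota, floor=1):
    """Number of consecutive errors needed to bring the quota down to `floor`."""
    return max(0, current_quota - floor)


if __name__ == "__main__":
    quota = 307  # hypothetical: the quota that would need 306 more errors
    print(errors_until_floor(quota))
    print(quota_after_errors(quota, 306))
    print(quota_after_errors(quota, 1000))
```

Under this rule the quota never reaches zero, which would explain why a host with a 100% error rate keeps receiving one task per day.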

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 849
Credit: 37,482,257
RAC: 27,340
Message 39632 - Posted: 18 Aug 2019, 5:03:08 UTC - in response to Message 39629.  

Yeah, I figured that had to be why it was only one task at a time, CP, but I know it has been doing this since before March, so imagine how many thousands of these have been wasted. And, as I mentioned before, each one has to wait 2 weeks before it is resent, hopefully to a PC that has a human checking once in a while and noticing thousands of errors and not ONE single Valid for over 6 months. Or is there some award for the most computer-error tasks in a row?

Seems to me that the server should finally stop sending to that host after the first 1000 computer errors and zero Valids.
And I imagine this is not the only one doing this.

Maybe they should switch to seti

maeax
Joined: 2 May 07
Posts: 732
Credit: 27,358,873
RAC: 39,307
Message 39748 - Posted: 29 Aug 2019, 15:39:17 UTC


computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,658,480
RAC: 105,312
Message 40165 - Posted: 16 Oct 2019, 5:24:00 UTC

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10617647
Attached a week ago, and within this week it has returned more than 400 Theory native tasks at a 100% error rate because CVMFS is not installed:
22:25:43 CEST +02:00 2019-10-14: cranky-0.0.29: [INFO] Checking CVMFS.
22:25:43 CEST +02:00 2019-10-14: cranky-0.0.29: [ERROR] 'which' could not locate the command 'cvmfs_config'.
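
The failing check can be reproduced by hand before attaching a host to native Theory. The Python sketch below only looks for the `cvmfs_config` command on the PATH, much like the `which` lookup in the log; it is not the cranky script, and the function name is made up for illustration.

```python
import shutil


def find_command(name):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command("cvmfs_config")
    if path is None:
        print("[ERROR] cvmfs_config not found on PATH; CVMFS is probably not installed.")
    else:
        print(f"[INFO] cvmfs_config found at {path}")
```

Finding the command only shows that the CVMFS client tools are present; it does not prove that the repositories are mounted or reachable.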

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 849
Credit: 37,482,257
RAC: 27,340
Message 40169 - Posted: 16 Oct 2019, 12:53:15 UTC - in response to Message 40165.  

That PC is one we haven't seen here for close to 10 years

Single core with less than 2 GB RAM
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10617647

And I think it has a BOINC version from a couple of years ago.

https://cernvm.cern.ch/portal/filesystem/downloads

what am I doing up at 5:50am

©2019 CERN