Message boards : Number crunching : fubar host of the day
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36221 - Posted: 3 Aug 2018, 1:53:29 UTC - in response to Message 36215.  


For the record I am not accusing you of cheating, I do not know either way if you are or not.
My concern is that the project is happily marking results as valid with a high credit score when I don't believe they should be valid.


Well I'm not. I wouldn't even know how to. As I said, I attached the project and that's it. I just recently noticed it had so many invalids, did a reset project, which didn't help, and then I started checking these forums, at which point I found this thread.

These other hosts that are doing the same thing I doubt are cheating either.....I'm thinking the code is just buggy. I know I've returned a lot of HITS files, since I was affected by that upload/download issue a couple of weeks ago and I even posted in that thread. I don't know what the difference is between the jobs that are marked valid and return HITS files, those that are marked valid and return nothing, and those that are invalid. Maybe an admin can figure out what the issue is, but I wouldn't accuse those other hosts of anything malicious until the code is verified to be OK.
ID: 36221 · Report as offensive     Reply Quote
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,201,701
RAC: 10,611
Message 36223 - Posted: 3 Aug 2018, 3:39:53 UTC - in response to Message 36221.  

.....I'm thinking the code is just buggy.
It works almost flawlessly for so many that the buggy-code argument just doesn't hold any water. Invalids and tasks that return no HITS file are more likely caused by the host being overloaded and/or not having enough RAM reserved. Cut back on the number of simultaneous ATLAS tasks and maybe add an app_config.xml, and you will likely see your invalids drop to near 0 and close to 100% of tasks returning HITS files.

I wouldn't accuse those other hosts of anything malicious until the code is verified to be OK.
Again, the fact that it runs nearly flawlessly for so many on a variety of platforms pretty much proves the code is reliable. The fact that so many who used to have problems fixed them simply by configuring their end properly again suggests the code is reliable. I'm not saying it's perfect; I'm saying that if you configure it properly it will run nearly flawlessly for you.
ID: 36223 · Report as offensive     Reply Quote
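
To illustrate the app_config.xml suggestion in the post above, here is a minimal sketch that writes out a file limiting how many tasks of one application run at the same time. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are part of BOINC's documented client configuration. The helper name `build_app_config`, the short application name "ATLAS" and the limit of 2 are assumptions for illustration only; the real application name should be checked against the project's own listing.

```python
def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps concurrent tasks for one app.

    build_app_config is a hypothetical helper, not part of BOINC.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{app_name}</name>\n"
        f"    <max_concurrent>{max_concurrent}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # "ATLAS" and the limit of 2 are assumed values, not taken from the thread.
    print(build_app_config("ATLAS", 2))
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) for the limit to apply.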
maeax

Joined: 2 May 07
Posts: 732
Credit: 27,387,434
RAC: 39,600
Message 36224 - Posted: 3 Aug 2018, 3:51:54 UTC

Gridcoin is a problem, in the Collatz project as well. One of their users cheated this project last month.
All that matters to them is getting Cobblestones, nothing more and nothing less. There is no interest in the science!
The same discussion has been going on since then.
ID: 36224 · Report as offensive     Reply Quote
computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,676,700
RAC: 100,593
Message 36225 - Posted: 3 Aug 2018, 5:08:59 UTC

@vseven

The main reason why I accused you of cheating is the fact that you ask for help but do everything to hide relevant information.
It's the logfiles that give other users and admins the necessary hints for qualified help.

Besides that, the discussion about those short-running tasks that deliver errors but are validated and rewarded has been ongoing for weeks.



But it seems that the measures that have been activated lately by David Cameron are doing a good job.
ID: 36225 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 282
Credit: 8,896,968
RAC: 7,737
Message 36227 - Posted: 3 Aug 2018, 8:38:10 UTC

As noted on this thread on the ATLAS boards, the validation logic has been tightened so that credit is not allocated for these short unsuccessful tasks.

I'm not sure why such huge credits were being given to these tasks (credit logic has always been a mystery to me); in the past I don't think they were so large. Please post any examples of bad tasks getting credits which finished after 1 August, when the validation was changed.
ID: 36227 · Report as offensive     Reply Quote
Profile PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36228 - Posted: 3 Aug 2018, 9:21:56 UTC - in response to Message 36227.  

As noted on this thread on the ATLAS boards, the validation logic has been tightened so that credit is not allocated for these short unsuccessful tasks.

I'm not sure why such huge credits were being given to these tasks (credit logic has always been a mystery to me); in the past I don't think they were so large. Please post any examples of bad tasks getting credits which finished after 1 August, when the validation was changed.

Hi David,

So this host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522402&offset=0&show_names=0&state=4&appid= has valid tasks (scroll past the first page or two) that were returned on 2nd August. Approximately 30 minutes of run time for approximately 60 credits, i.e. 120 credits per hour; CPU time was a lot less.

Are these valid tasks receiving the correct credit?
ID: 36228 · Report as offensive     Reply Quote
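
The rate quoted in the post above (about 60 credits for about 30 minutes of run time) can be checked with a few lines. This is only a sketch of the arithmetic; the function name `credits_per_hour` is made up for illustration and is not how the project computes credit.

```python
def credits_per_hour(credit: float, run_time_minutes: float) -> float:
    """Convert a credit award and a run time in minutes to credits per hour."""
    if run_time_minutes <= 0:
        raise ValueError("run time must be positive")
    return credit * 60.0 / run_time_minutes


if __name__ == "__main__":
    # Figures from the post: roughly 60 credits for roughly 30 minutes.
    print(credits_per_hour(60, 30))
```

The same helper can be fed CPU time instead of run time to compare hosts that assign a different number of cores per task.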
Profile MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 850
Credit: 37,494,242
RAC: 26,690
Message 36230 - Posted: 3 Aug 2018, 10:10:59 UTC - in response to Message 36228.  

That is quite the machine there.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402

It is running those tasks with 8 cores each
ID: 36230 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36233 - Posted: 3 Aug 2018, 12:23:52 UTC - in response to Message 36230.  
Last modified: 3 Aug 2018, 12:36:30 UTC

The main reason why I accused you of cheating is the fact that you ask for help but do everything to hide relevant information.


.....and I posted links to example tasks. Maybe that's considered "doing everything to hide relevant information" :rolleyes:

As a side note, I've also crunched SixTrack WUs, hundreds of them, since I can run 40 at a time. 100% valid results with both hosts. These hosts were doing Universe@Home tasks for two months before switching to LHC, with 100% valid results there too. They also did SRBase for a while, again 100% valid. Hence me saying something is buggy in ATLAS.

That is quite the machine there.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402

It is running those tasks with 8 cores each


Correct...it's one of mine. A dual Xeon E5-2650 v3 @ 2.3 GHz with 128 GB of RAM and SSD drives. Hyperthreading is turned on, so it shows as 40 threads and runs five 8-CPU tasks at a time. My "problem" machine is a dual Xeon X5670 @ 2.93 GHz with 64 GB of RAM and SSD drives. Hyperthreading is also turned on, so it shows as 24 threads. It runs three 8-CPU tasks at a time, but apparently not very well...
ID: 36233 · Report as offensive     Reply Quote
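
The thread and task counts in the post above follow from simple arithmetic, sketched here. The core counts per socket (10 for the E5-2650 v3, 6 for the X5670) come from Intel's published specifications rather than from the thread, and both function names are hypothetical.

```python
def logical_threads(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Logical CPUs the OS reports; threads_per_core=2 models hyperthreading."""
    return sockets * cores_per_socket * threads_per_core


def concurrent_tasks(threads: int, cpus_per_task: int) -> int:
    """How many multi-core tasks fit when each one reserves cpus_per_task threads."""
    if cpus_per_task < 1:
        raise ValueError("cpus_per_task must be at least 1")
    return threads // cpus_per_task


if __name__ == "__main__":
    for name, cores in (("dual E5-2650 v3", 10), ("dual X5670", 6)):
        threads = logical_threads(sockets=2, cores_per_socket=cores)
        print(name, threads, "threads,", concurrent_tasks(threads, 8), "tasks of 8 CPUs")
```

This only counts CPU slots. It says nothing about whether there is enough RAM per task, which is the other limit raised earlier in the thread.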
maeax

Joined: 2 May 07
Posts: 732
Credit: 27,387,434
RAC: 39,600
Message 36234 - Posted: 3 Aug 2018, 12:25:36 UTC - in response to Message 36212.  
Last modified: 3 Aug 2018, 12:27:05 UTC

So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each.

Would an Admin like to confirm that they are valid results please ?
I see the VMs have 8 CPUs assigned but the run time is barely enough to spin up the VM let alone do any valid work.


This is a sandbox installation from Docker in VirtualBox:
2018-08-01 05:54:27 (4100): Detected: Sandbox Configuration Enabled

vseven, and you don't know about this installation??
ID: 36234 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36235 - Posted: 3 Aug 2018, 12:38:53 UTC - in response to Message 36234.  
Last modified: 3 Aug 2018, 12:43:05 UTC

So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each.

Would an Admin like to confirm that they are valid results please ?
I see the VMs have 8 CPUs assigned but the run time is barely enough to spin up the VM let alone do any valid work.


This is a sandbox installation from Docker in VirtualBox:
2018-08-01 05:54:27 (4100): Detected: Sandbox Configuration Enabled

vseven, and you don't know about this installation??


I'm not sure what you mean. I have BOINC 7.10.2 (x64) and VirtualBox 5.2.16 installed. Beyond that, neither host has been modified in any way. And again, as I said, I have tried a reset project on both, watched a fresh VDI file get downloaded, and both hosts run the exact same way.

I have tried older versions of VB with no change (5.1.*, 5.2.8, etc). If there is something I can try to test, I will, but this is a vanilla install with no app_config or any other modifications.

Both hosts are Server 2012 R2 if that matters. I posted the actual specs a couple posts up.
ID: 36235 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 732
Credit: 27,387,434
RAC: 39,600
Message 36236 - Posted: 3 Aug 2018, 12:49:41 UTC - in response to Message 36203.  

As mentioned in another thread you may consider to make your logs visible for other volunteers.
This would make it easier to give a qualified answer.

To do so, navigate to https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project and check the box near "Should LHC@home show your computers on its web site?".

Please, if possible, make your computer list visible for 1 or 2 days so it can be checked!
ID: 36236 · Report as offensive     Reply Quote
maeax

Joined: 2 May 07
Posts: 732
Credit: 27,387,434
RAC: 39,600
Message 36237 - Posted: 3 Aug 2018, 13:53:00 UTC - in response to Message 36234.  
Last modified: 3 Aug 2018, 13:55:52 UTC

So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each.

Would an Admin like to confirm that they are valid results please ?
I see the VMs have 8 CPUs assigned but the run time is barely enough to spin up the VM let alone do any valid work.


This is a sandbox installation from Docker in VirtualBox:
2018-08-01 05:54:27 (4100): Detected: Sandbox Configuration Enabled

vseven, and you don't know about this installation??

Computer 10522400 is not running properly because of a wrong sandbox configuration.

You need a clean installation of VirtualBox, with a reboot after uninstalling and another reboot after installing.

If the sandbox is then still available, I don't have an answer.

Edit: Do you have a second Hypervisor installed?
ID: 36237 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36238 - Posted: 3 Aug 2018, 14:29:41 UTC - in response to Message 36237.  
Last modified: 3 Aug 2018, 14:31:36 UTC

Computer 10522400 is not running properly because of a wrong sandbox configuration.

You need a clean installation of VirtualBox, with a reboot after uninstalling and another reboot after installing.

If the sandbox is then still available, I don't have an answer.

Edit: Do you have a second Hypervisor installed?


OK, here is what I did. These are things I've done before, but I did them again anyway:

- Shut down BOINC and stopped running tasks. Waited for task manager to show all "VirtualBox Interface" processes closed and all BOINC processes gone
- Uninstalled VirtualBox 5.2.16 through add/remove programs. (Side note: the only programs installed on this host are BOINC, Microsoft Silverlight, and VirtualBox)
- Rebooted the server
- BOINC started (service install) and all tasks reported "Postponed: Detection of VM Hypervisor failed. (8 CPUs)" which is correct since VB is gone.
- Shut down BOINC again.
- Cleaned up the machine just to make sure. (Deleted all temp files, deleted remains of c:\Program Files\Oracle, deleted HKLM\Software\Oracle registry key, etc)
- Installed VirtualBox 5.2.16. All default settings (full install).
- Installed VB Extensions Pack 5.2.16
- Rebooted server

BOINC came back up and is chugging along at the same pace as before. 8-CPU WUs are taking 12 - 14 minutes each. Here is one that was waiting to run, switched to running (after the reinstall), and finished in 13:10: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203538752 It is marked as a validate error.

This server was formerly a Hyper-V host but has been retired. So yes, it did have the Hyper-V role installed at one point, but I've removed that role. In fact it has 0 roles installed (other than Storage Services, which is required and cannot be removed).
ID: 36238 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36239 - Posted: 3 Aug 2018, 14:35:42 UTC
Last modified: 3 Aug 2018, 14:57:03 UTC

I just did another "Reset Project" on it. It's downloading everything fresh again and I'll see if that makes a difference: https://imgur.com/PNvB9sD

If it doesn't, then I don't know what else to say other than that the code is buggy. Unless I need to switch to older versions of VB and BOINC.

Edit: Surprise, surprise....exact same results. Example WU: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99747354 I like that this one has "Error while computing" on two other hosts as well....

I'm suspending this host and will switch back to Universe@home until someone can figure it out.
ID: 36239 · Report as offensive     Reply Quote
Profile PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36240 - Posted: 3 Aug 2018, 14:48:21 UTC - in response to Message 36239.  

Is BOINC a service install on the other one that works?
ID: 36240 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36242 - Posted: 3 Aug 2018, 15:00:19 UTC - in response to Message 36240.  
Last modified: 3 Aug 2018, 15:08:03 UTC

Is BOINC a service install on the other one that works?


Yup. I mean, I can uninstall that too and reinstall it not as a service, but this is the only project I'm having an issue with, so I wouldn't imagine that's it.

Edit: Here is another work unit that I just did that's listed as invalid: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99773436 Now look at the other computer that also crunched it. It's also invalid, and it also has about the same run time. This can't be an issue with just my host...
ID: 36242 · Report as offensive     Reply Quote
Profile PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36244 - Posted: 3 Aug 2018, 15:07:17 UTC - in response to Message 36242.  

No, if the other is working fine as a service install it won't be that.

ATLAS uses VBox; have you run any other projects that use it?
ID: 36244 · Report as offensive     Reply Quote
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36245 - Posted: 3 Aug 2018, 15:11:13 UTC - in response to Message 36244.  

No, if the other is working fine as a service install it won't be that.

ATLAS uses VBox; have you run any other projects that use it?


No. And see my edit above.

Also, I did some testing to make sure it's not my host. I downloaded some yafu project tasks, 24-CPU WUs (YAFU for small composites 134.05). It crunched using all 24 threads, finished, and validated successfully. I know it's apples to oranges because of VB, but there is nothing fundamentally wrong with the machine itself.
ID: 36245 · Report as offensive     Reply Quote
Profile PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36246 - Posted: 3 Aug 2018, 15:28:29 UTC - in response to Message 36242.  

Edit: Here is another work unit that I just did that's listed as invalid: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99773436 Now look at the other computer that also crunched it. It's also invalid, and it also has about the same run time. This can't be an issue with just my host...

That host doesn't have a normal valid ATLAS task in its history.
It does have some that were invalid due to things like:

2018-07-30 23:02:02 (6056): Guest Log: mv: cannot stat `metadata-*.xml': No such file or directory
2018-07-30 23:02:02 (6056): Guest Log: ERROR: Missing metadata.xml

I haven't run ATLAS for quite some time, so I am probably well behind on what it does now. I am going to step away and let those more familiar with it try to figure out what is wrong.

Good luck !
ID: 36246 · Report as offensive     Reply Quote
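
Checking a task's stderr output for the kind of lines quoted in the post above is easy to script. The sketch below searches log text for known failure markers. The two markers are taken from the log excerpt in the post; the names `FAILURE_MARKERS` and `find_failures` are hypothetical, and the list is not a complete catalogue of ATLAS failure messages.

```python
# Substrings copied from the log excerpt in the post; not an exhaustive list.
FAILURE_MARKERS = (
    "ERROR: Missing metadata.xml",
    "cannot stat",
)


def find_failures(log_text: str, markers=FAILURE_MARKERS) -> list:
    """Return the log lines that contain any of the given failure markers."""
    hits = []
    for line in log_text.splitlines():
        if any(marker in line for marker in markers):
            hits.append(line.strip())
    return hits


SAMPLE_LOG = """\
2018-07-30 23:02:02 (6056): Guest Log: mv: cannot stat `metadata-*.xml': No such file or directory
2018-07-30 23:02:02 (6056): Guest Log: ERROR: Missing metadata.xml
"""

if __name__ == "__main__":
    for hit in find_failures(SAMPLE_LOG):
        print(hit)
```

A task whose log produces no matches is not necessarily good; it only means none of the listed markers appeared.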
vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36253 - Posted: 3 Aug 2018, 17:23:18 UTC - in response to Message 36244.  

ATLAS uses VBox; have you run any other projects that use it?


Since you said this, I started thinking back, and I did run Cosmology@Home, which uses VBox. So I attached to that project again; it downloaded its VDI files and wrapper, and it grabbed a bunch of 24-CPU tasks. Gonna let those run over the weekend and see what happens.
ID: 36253 · Report as offensive     Reply Quote


©2019 CERN