most unpolite host of the day
Author | Message |
---|---|
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
Well I'm not. I wouldn't even know how to. As I said, I attached the project and that's it. I just recently noticed it had so many invalids, did a reset project, it didn't help, and then I started checking these forums, at which point I found this thread. I doubt these other hosts that are doing the same thing are cheating either.....I'm thinking the code is just buggy. I know I've returned a lot of HITS files, since I was affected by that upload/download issue a couple of weeks ago and I even posted in that thread. I don't know what the difference is between the jobs that are marked valid and return HITS files, those that are marked valid and return nothing, and those that are invalid. Maybe an admin can figure out what the issue is, but I wouldn't accuse those other hosts of anything malicious until the code is verified to be ok. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
".....I'm thinking the code is just buggy." It works almost flawlessly for so many that the buggy code argument just doesn't hold any water. Invalids and tasks that return no HITS file are more likely caused by the host being overloaded and/or not having enough RAM reserved. Cut back on the number of simultaneous ATLAS tasks running and maybe add an app_config.xml, and you will likely see your invalids drop to near 0 and close to 100% of tasks returning HITS files. "I wouldn't accuse those other hosts of anything malicious until the code is verified to be ok." Again, the fact that it runs nearly flawlessly for so many on a variety of platforms pretty much proves the code is reliable. The fact that so many who used to have problems fixed them simply by configuring their end properly again suggests the code is reliable. Not saying it's perfect, saying if you configure properly it will run nearly flawlessly for you. |
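For readers unfamiliar with the app_config.xml suggestion above, here is a minimal sketch that generates one limiting how many tasks of an app run at once. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are part of BOINC's documented app_config.xml format; the app name `ATLAS`, the limit of 2, and the helper name `build_app_config` are assumptions for illustration only, so check the project's own application name and pick a limit that suits your RAM before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps concurrent tasks for one app.

    app_name is assumed here; it must match the name the project uses.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: limit an app called "ATLAS" to 2 simultaneous tasks.
print(build_app_config("ATLAS", 2))
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) to pick it up.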
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
Gridcoin is a problem, also in the Collatz project. One of its users cheated this project last month. The only thing that matters to them is getting Cobblestones, nothing more and nothing less. There is no interest in the science! The same discussion has been going on since then. |
Send message Joined: 15 Jun 08 Posts: 2520 Credit: 251,915,653 RAC: 128,265 |
@vseven The main reason why I accused you of cheating is the fact that you ask for help but do everything to hide relevant information. It's the logfiles that give other users and admins the necessary hints for qualified help. Besides that, the discussion about those short-running tasks that deliver errors but are validated and rewarded has been ongoing for weeks. But it seems that the measures recently activated by David Cameron are doing a good job. |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
As noted on this thread on the ATLAS boards, the validation logic has been tightened so that credit is not allocated for these short unsuccessful tasks. I'm not sure why such huge credits were being given to these tasks (credit logic has always been a mystery to me); in the past I don't think they were so large. Please post any examples of bad tasks that finished after 1 August, when the validation was changed, and still got credit. |
Send message Joined: 7 Aug 14 Posts: 23 Credit: 9,962,408 RAC: 15,239 |
"As noted on this thread on the ATLAS boards, the validation logic has been tightened so that credit is not allocated for these short unsuccessful tasks." Hi David, this host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522402&offset=0&show_names=0&state=4&appid= has valid tasks (scroll past the first page or two) that were returned on 2nd August. Approximately 30 minutes of run time for approximately 60 credits, i.e. 120 credits per hour; CPU time was a lot less. Are these valid tasks receiving the correct credit? |
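The credit-rate arithmetic in the post above can be checked with a short sketch. The function name `credits_per_hour` is made up for illustration; the figures are the approximate ones quoted in this thread (about 60 credits for about 30 minutes of run time, and about 300 credits for about 3 minutes of CPU time on the other host).

```python
def credits_per_hour(credit: float, minutes: float) -> float:
    """Convert a credit award and a duration in minutes to credits per hour."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return credit * 60 / minutes


# Approximate figures quoted in the thread.
print(credits_per_hour(60, 30))   # run time on host 10522402
print(credits_per_hour(300, 3))   # CPU time on host 10522400
```

Comparing the two rates makes it easier to see why the second host's results drew attention, although what counts as a "correct" rate depends on the project's credit logic, which is not described here.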
Send message Joined: 24 Oct 04 Posts: 1169 Credit: 54,079,358 RAC: 51,688 |
That is quite the machine there. https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402 It is running those tasks with 8 cores each |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"The main reason why I accused you of cheating is the fact that you ask for help but do everything to hide relevant information." .....and I posted links to example tasks. Maybe that's considered "doing everything to hide relevant information" :rolleyes: As a side note, I've also crunched SixTrack WUs, hundreds of them since I can run 40 at a time. 100% valid results with both hosts. These hosts were doing Universe@Home tasks for two months before switching to LHC. 100% valid results from there also. They also did SRBase for a while. 100% valid from that project. Hence me saying something is buggy in Atlas. "That is quite the machine there." Correct...it's one of mine. A dual Xeon E5-2650 v3 @ 2.3 GHz with 128 GB of RAM and SSD drives. Hyperthreading is turned on so it shows as 40 threads. So it runs five 8 CPU tasks at a time. My "problem" machine is a dual Xeon X5670 @ 2.93 GHz with 64 GB of RAM and SSD drives. Hyperthreading is also turned on so it shows as 24 threads. It runs three 8 CPU tasks at a time but apparently not very well... |
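The task counts in the post above follow from dividing the logical thread count by the CPUs per task. A rough sketch, using only the numbers given in the thread; the function name is hypothetical, and it ignores RAM limits and BOINC preferences, which also affect how many tasks the client actually starts.

```python
def concurrent_tasks(logical_threads: int, cpus_per_task: int) -> int:
    """How many multi-core tasks fit on the available threads (CPU only)."""
    if cpus_per_task <= 0:
        raise ValueError("cpus_per_task must be positive")
    return logical_threads // cpus_per_task


print(concurrent_tasks(40, 8))  # dual E5-2650 v3 host, 40 threads
print(concurrent_tasks(24, 8))  # dual X5670 host, 24 threads
```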
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
"So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each." This is a sandbox installation from Docker in VirtualBox: 2018-08-01 05:54:27 (4100): Detected: Sandbox Configuration Enabled. vseven, and you don't know about this installation?? |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each." I'm not sure what you mean. I have BOINC 7.10.2 (x64) and VirtualBox 5.2.16 installed. Beyond that, neither host has been modified in any way. And again, as I said, I have tried a reset project on both, watched a fresh VDI file get downloaded, and both hosts run the exact same way. I have tried older versions of VB with no change (5.1.*, 5.2.8, etc). If there is something I can try to test I will, but this is a vanilla install with no app_config or any other modifications. Both hosts are Server 2012 R2 if that matters. I posted the actual specs a couple of posts up. |
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
"As mentioned in another thread, you may consider making your logs visible to other volunteers." Please, if it is possible, open your computer list for checking for 1 or 2 days! |
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
"So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each." This computer 10522400 does not run because of a wrong configuration with sandbox. You need a clean installation of VirtualBox, with a reboot after uninstalling and a reboot after installing. If the sandbox is still there after that, I don't have an answer. Edit: Do you have a second hypervisor installed? |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"This computer 10522400 does not run because of a wrong configuration with sandbox." Ok, here is what I did. These are all things I've done before, but I did them again anyway: - Shut down BOINC and stopped running tasks. Waited for Task Manager to show all "VirtualBox Interface" processes closed and all BOINC processes gone. - Uninstalled VirtualBox 5.2.16 through Add/Remove Programs. (Side note: the only programs installed on this host are BOINC, Microsoft Silverlight, and VirtualBox.) - Rebooted the server. - BOINC started (service install) and all tasks reported "Postponed: Detection of VM Hypervisor failed. (8 CPUs)", which is correct since VB is gone. - Shut down BOINC again. - Cleaned up the machine just to make sure. (Deleted all temp files, deleted the remains of c:\Program Files\Oracle, deleted the HKLM\Software\Oracle registry key, etc.) - Installed VirtualBox 5.2.16. All default settings (full install). - Installed VB Extension Pack 5.2.16. - Rebooted the server. BOINC came back up and is chugging along at the same pace as before. 8 CPU WUs are taking 12 - 14 minutes each. Here is one that was waiting to run, switched to running (after the reinstall), and finished in 13:10: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203538752 It is marked as a validate error. This server was a former Hyper-V host but has been retired. So yes, it did have the Hyper-V role installed at one point. But I've removed that role. In fact it has 0 roles installed (other than Storage Services, which is required and cannot be removed). |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
I just did another "Reset Project" on it. It's downloading everything fresh again and I'll see if that makes a difference: https://imgur.com/PNvB9sD If it doesn't, then I don't know what else to say other than the code is buggy. Unless I need to switch to older versions of VB and BOINC. Edit: Surprise surprise....exact same results. Example WU: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99747354 I like that this one has "Error while computing" on two other hosts also.... I'm suspending this host and will switch back to Universe@Home until someone can figure it out. |
Send message Joined: 7 Aug 14 Posts: 23 Credit: 9,962,408 RAC: 15,239 |
Is BOINC a service install on the other one that works? |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"Is BOINC a service install on the other one that works?" Yup. I mean, I can uninstall that too and reinstall not as a service, but this is the only project I'm having an issue with so I wouldn't imagine that's it. Edit: Here is another work unit that I just did that's listed as invalid: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99773436 Now look at the other computer that also crunched it. It's also invalid, and it also has about the same time on it. This can't be an issue with just my host... |
Send message Joined: 7 Aug 14 Posts: 23 Credit: 9,962,408 RAC: 15,239 |
No, if the other is working fine as a service install it won't be that. Atlas uses VBox, have you run any other projects that use it? |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"No, if the other is working fine as a service install it won't be that." No. And see my edit above. Also, I did some testing to make sure it's not my host. I downloaded some yafu project tasks, 24 CPU WUs (YAFU for small composites 134.05). It crunched using all 24 threads, finished, and validated successfully. I know it's apples to oranges because of VB, but there is nothing fundamentally wrong with my machine itself. |
Send message Joined: 7 Aug 14 Posts: 23 Credit: 9,962,408 RAC: 15,239 |
"Edit: Here is another work unit that I just did that's listed as invalid: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=99773436 Now look at the other computer that also crunched it. It's also invalid, and it also has about the same time on it. This can't be an issue with just my host..." That host doesn't have a normal valid Atlas task in its history. It does have some that were invalid due to things like: 2018-07-30 23:02:02 (6056): Guest Log: mv: cannot stat `metadata-*.xml': No such file or directory 2018-07-30 23:02:02 (6056): Guest Log: ERROR: Missing metadata.xml I haven't run Atlas for quite some time, so I'm probably well behind on what it does now. I'm going to step away and let those more familiar with it try to figure out what is wrong. Good luck! |
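Since the log lines quoted above are what helpers in this thread keep asking for, here is a small sketch of how such "Guest Log" lines could be pulled out of a task's stderr output. The line layout (timestamp, process id in parentheses, "Guest Log:" prefix) is taken from the two lines quoted in the post; the function name and the choice of which messages to flag are assumptions, not anything the project provides.

```python
import re

# Layout assumed from the quoted lines:
# "2018-07-30 23:02:02 (6056): Guest Log: <message>"
GUEST_LOG = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<pid>\d+)\): Guest Log: (?P<msg>.*)$"
)


def find_guest_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, message) pairs for guest log lines that look like errors."""
    hits = []
    for line in log_text.splitlines():
        match = GUEST_LOG.match(line.strip())
        if match and ("ERROR" in match["msg"] or "cannot stat" in match["msg"]):
            hits.append((match["ts"], match["msg"]))
    return hits


sample = (
    "2018-07-30 23:02:02 (6056): Guest Log: mv: cannot stat "
    "`metadata-*.xml': No such file or directory\n"
    "2018-07-30 23:02:02 (6056): Guest Log: ERROR: Missing metadata.xml"
)
for timestamp, message in find_guest_errors(sample):
    print(timestamp, message)
```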
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
"Atlas uses VBox, have you run any other projects that use it?" Since you said this, I started thinking back, and I did run Cosmology@Home, which uses VBox. So I attached to that project again, it downloaded its VDI files and wrapper, and it grabbed a bunch of 24 CPU tasks. Gonna let those run over the weekend and see what happens. |
©2024 CERN