Message boards :
Number crunching :
most impolite host of the day
Author | Message |
---|---|
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Check this one. Close to 600 bad tasks in 7 days. Each one of those blown tasks is a ~350MB download. Why does LHC allow this to go on when they can cut off such users? Maybe LHC has too much bandwidth and needs to burn some off? Meanwhile don't drain your ATLAS task cache because it will take 2 hours of constant hammering on the server to get more tasks. |
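For scale, the figures in the post above work out to a substantial amount of traffic from a single host. A back-of-envelope sketch (the task count and per-task download size are the numbers quoted above; nothing else is assumed):

```python
# Back-of-envelope estimate of bandwidth burned by one host that fails
# ~600 ATLAS tasks in 7 days at ~350 MB of input data per task
# (figures quoted in the post above).
failed_tasks = 600
mb_per_task = 350

wasted_gb_per_week = failed_tasks * mb_per_task / 1024  # ~205 GB/week
wasted_gb_per_day = wasted_gb_per_week / 7              # ~29 GB/day

print(f"~{wasted_gb_per_week:.0f} GB/week, ~{wasted_gb_per_day:.0f} GB/day")
```

That is roughly 205 GB per week, or about 29 GB per day, downloaded and thrown away by one host.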
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,270,405 RAC: 5,338 |
Check this one. Close to 600 bad tasks in 7 days. Each one of those blown tasks is a ~350MB download. Something seems to be broken, because the server should limit the number of WUs for hosts like this to 1 per day. Supporting BOINC, a great concept! |
Send message Joined: 24 Oct 04 Posts: 1177 Credit: 54,887,670 RAC: 3,877 |
Yeah, in the past if you got that many invalids in a row you got that *24 hour* delay before getting new tasks. Looking at those invalids and the computer info, I would say he tried running too many ATLAS tasks at the same time with not enough RAM (15.9 GB). It also could be OC'd, since I noticed that *K* (i7-5820K). As we know, when people have hosts with 12 threads they like to run them all at the same time at 100% CPU and memory (I always check the Task Manager when running all the threads to make sure memory isn't maxed out). But it seems to be taking a break now: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10458080 |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
And that seems to be just the tip of the iceberg. Poking around the other day I ran across a user with a name like "gridcoin" who has ~400 hosts. Looking through his results for just 10 minutes I saw ~2,000 failed ATLAS tasks on ~5 hosts. I bet there are at minimum 1,000 hosts doing that. Anybody else see now why download speed is down to a trickle? We've heard from one admin who claims their throughput is the same... well, yeah... but a significant portion of that seems to be tasks going to hosts that blow the task in 10 minutes and then get another task, rinse and repeat 100 times a day. |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
I hate to admit it but I have a bad host myself that I cannot figure out. I haven't said anything because it does report valid tasks, but it's like 60% valid to 40% invalid. It's a 24-thread machine with 64 GB of RAM running three 8-core WUs at a time. The weird thing is that in the stderr output for the invalids it says everything was successful. Going to try reducing the number of cores allowed to 22 so there are some spares for the machine itself and see if that makes a difference. My other two hosts have 1 or 2 invalids to hundreds of valid tasks, which is what doesn't make sense to me. Same version of BOINC, VirtualBox, OS, etc. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 23,290 |
As mentioned in another thread, you may consider making your logs visible to other volunteers. This would make it easier to give a qualified answer. To do so, navigate to https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project and check the box near "Should LHC@home show your computers on its web site?". |
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,824,076 RAC: 56,247 |
I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox. OIC... gullible users auto-added to a difficult project by some misguided, uncaring group admin. What could possibly go wrong with that? He needs a PM from bronco. |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
As mentioned in another thread, you may consider making your logs visible to other volunteers. I know how to, but I'd rather not expose all of my machines. But I can post examples. Marked valid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203431393 Marked invalid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203456578 I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox. OIC... gullible users auto-added to a difficult project by some misguided, uncaring group admin. What could possibly go wrong with that? He needs a PM from bronco. As someone that is on team Gridcoin, I can tell you nothing is automatically added. Now if they use "Charity Engine" then yes, projects are added without the users' knowledge. It's a nice scam the Charity Engine people have going. |
Send message Joined: 7 Aug 14 Posts: 27 Credit: 10,000,233 RAC: 195 |
I know how to, but I'd rather not expose all of my machines. But I can post examples. How long has ATLAS been running valid tasks with less than 3 minutes of CPU time for 293.50 credits? I thought they took many hours for about twice that much credit! |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 23,290 |
OK. Now I see what happened - I would call it cheating. Sorry, in this case there's no support from my side. |
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,824,076 RAC: 56,247 |
Sorry, I read something over on the main BOINC forums about issues with Gridcoin and the use of account managers in a non-standard manner. Looks like I made a bad assumption. |
Send message Joined: 7 Aug 14 Posts: 27 Credit: 10,000,233 RAC: 195 |
So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each. Would an admin like to confirm that they are valid results, please? I see the VMs have 8 CPUs assigned, but the run time is barely enough to spin up the VM, let alone do any valid work. |
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
I have a second host with 40 threads that runs five 8-CPU WUs at a time, also with plenty of RAM and disk space. It has 380 valid and 3 invalid. WUs take anywhere from 400 seconds up to one I see at 32,000 seconds. I even ran a "reset project" on both hosts to make sure everything was correct, and both re-downloaded the VDI file fresh. Same results on both. OK. So attaching to a project, letting it download its own files, and letting it run with zero modifications to anything is cheating? How so? |
Send message Joined: 7 Aug 14 Posts: 27 Credit: 10,000,233 RAC: 195 |
So this one isn't yours then? https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402 It only has 136 valid, 3 invalid and 77 errors. That is also returning valid results with low run times, although it has done at least one I would expect to be a real valid result... https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402 Its run time was 10,495.46 seconds and CPU time was 72,408.16 seconds for a credit of 295.61. At least its credit return is much smaller for the short runs. For the record, I am not accusing you of cheating; I do not know either way whether you are or not. My concern is that the project is happily marking results as valid with a high credit score when I don't believe they should be valid. I will wait for an admin to respond; they may not appear until working hours though. It is a project oversight problem. I don't know if: 1) They don't care 2) They don't know how to check 3) They don't have time to check No idea what science your 'valid' results might be contributing. If an admin comes back and says they are okay I won't argue any further, but I don't see why everyone else is taking so many hours for so little credit. |
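One quick way to see why the short "valid" results look wrong is to compare CPU time against run time multiplied by the number of assigned cores. A minimal sketch: the first set of figures is the long result quoted above; the short-task numbers (10 minutes of run time, 3 minutes of CPU) are an assumed illustration, not values from this thread.

```python
# Rough plausibility check for an ATLAS result: an N-core VM doing real
# work should report CPU time approaching run_time * N.
def core_utilization(run_time_s, cpu_time_s, ncpus):
    """Fraction of the assigned cores' available time actually used."""
    return cpu_time_s / (run_time_s * ncpus)

# Long task quoted above: ~86% of 8 cores busy -> looks like real work.
print(core_utilization(10495.46, 72408.16, 8))

# A task reporting ~3 minutes of CPU over a ~10-minute run on 8 cores
# barely touches the CPUs -> probably just VM spin-up, no payload.
print(core_utilization(600.0, 180.0, 8))
```

A ratio near 1.0 means the cores were genuinely busy; a ratio near zero over a short run is consistent with a VM that booted, did nothing, and shut down.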
Send message Joined: 22 Jan 18 Posts: 32 Credit: 2,756,359 RAC: 0 |
I don't know either, which is why I posted when I saw this thread. Lol... if I was somehow cheating, why would I draw attention to myself? I was more hoping to figure out why I had so many invalid results so I could fix it... if it can be fixed. Like I said, I've done nothing to modify anything, and I've tried resetting the project, which didn't change anything. The only difference is that the one returning a lot of invalids is an older server (5 years old), which to me would mean it should be overall slower. The one with almost all valid results is newer (2 years old). |
Send message Joined: 7 Aug 14 Posts: 27 Credit: 10,000,233 RAC: 195 |
I understand what you are saying, but if you look at other users who don't hide their computers and see their ATLAS results, you will quickly realise that your valid results do not look normal. Don't bother looking at my computers; they are hidden, but I haven't run any tasks for a while. I just came back to see what state the project was in and whether it was worth running tasks again. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I know how to, but I'd rather not expose all of my machines. But I can post examples. It's marked valid, but if you use your web browser's search function (Ctrl-F) to search the stderr output for "HITS" you will notice it finds nothing. That means 2 things: 1) it did not record an EVTtoHITS error (error 165), which is a sure indication the task failed to return any useful work 2) it did not record the usual confirmation of a successful EVTtoHITS conversion So the stderr output is inconclusive regarding HITS. Fortunately there is another way to learn whether or not a HITS file was generated/returned. Search for "pandaid" in the stderr, and for the task above you'll find this: Starting ATLAS job. (PandaID=4008322627 taskID=14661314) You can use the PandaID in that line, in this case 4008322627, to search the PanDA database for a definitive answer to whether or not your task succeeded. Combine the ID with the base URL to get the full URL, like this: https://bigpanda.cern.ch/job?pandaid=4008322627 The first time you do it you'll be asked to register a username and password. It's free and easy. Then the above URL will take you to the PanDA record for your result, and you will notice it actually failed despite LHC@home saying it validated. Marked invalid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203456578 That one has no mention of HITS, nor does it have the "Starting ATLAS job. (PandaID=" line, so naturally it failed and was appropriately marked invalid. As someone that is on team Gridcoin, I can tell you nothing is automatically added. Now if they use "Charity Engine" then yes, projects are added without the users' knowledge. It's a nice scam the Charity Engine people have going. Yes, I saw warnings about it on the BOINC forums. Seems people take great pride in being deplorable these days. Whatever. Now that I know your userID I'll plug it into my script and see if you have more fubar hosts. Not accusing you of cheating.
If you are, then I think maybe I know how you're doing it, and I really couldn't care less. The credits are meaningless. I care only about the science being done and how much bandwidth fubar hosts might be wasting. |
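The PanDA lookup described above lends itself to a small script. A minimal sketch, assuming only the stderr line format quoted in the post ("Starting ATLAS job. (PandaID=... taskID=...)"); the helper name is hypothetical:

```python
import re

# Extract the PandaID from a task's stderr output and build the
# bigpanda lookup URL described above. Returns None when no PandaID
# was logged, which by itself suggests the VM never started a
# payload job.
def bigpanda_url(stderr_text):
    m = re.search(r"PandaID=(\d+)", stderr_text)
    if m is None:
        return None
    return f"https://bigpanda.cern.ch/job?pandaid={m.group(1)}"

stderr = "Starting ATLAS job. (PandaID=4008322627 taskID=14661314)"
print(bigpanda_url(stderr))
# -> https://bigpanda.cern.ch/job?pandaid=4008322627
```

Paste the resulting URL into a browser (after the one-time bigpanda registration mentioned above) to see the authoritative job status.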
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Lol... if I was somehow cheating, why would I draw attention to myself? Not accusing you of cheating, just answering the question you asked. Some cheaters need to do more than just cheat. Their sick little egos drive them to make sure everybody else knows they are cheating. It makes them feel powerful and in control when they know that everybody knows they're cheating and cannot be stopped. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I care only about the science being done and how much bandwidth fubar hosts might be wasting. Actually, I also care very much that users come here, volunteer their computers, and pay for the electricity thinking that the money they're spending is accomplishing something worthwhile, trusting that LHC@home will do what other projects do and tell them when they are wasting their time and money. Instead they smile and lie and say "Valid" when nothing useful has been done at all. That's a breach of trust, plain and simple. Like I said in another thread, no wonder nobody trusts scientists anymore. |
©2024 CERN