Message boards : Number crunching : fubar host of the day

bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36085 - Posted: 27 Jul 2018, 14:00:59 UTC

Check this one. Close to 600 bad tasks in 7 days. Each one of those blown tasks is a ~350MB download.

Why does LHC allow this to go on when they can cut off such users? Maybe LHC has too much bandwidth and needs to burn some off? Meanwhile don't drain your ATLAS task cache because it will take 2 hours of constant hammering on the server to get more tasks.
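For a rough sense of scale, the figures quoted above can be multiplied out. This is only a sketch of that arithmetic: the task count and download size are the approximate values from the post, and the choice of 1 GB = 1000 MB is an assumption.

```python
# Approximate figures quoted in the post above.
FAILED_TASKS = 600          # "close to 600 bad tasks"
DOWNLOAD_MB_PER_TASK = 350  # "~350MB download" per blown task
DAYS = 7


def wasted_download_gb(tasks: int, mb_per_task: float) -> float:
    """Total download volume in GB (assumes 1 GB = 1000 MB)."""
    return tasks * mb_per_task / 1000


total_gb = wasted_download_gb(FAILED_TASKS, DOWNLOAD_MB_PER_TASK)
per_day_gb = total_gb / DAYS
print(f"{total_gb:.0f} GB in {DAYS} days, about {per_day_gb:.0f} GB per day")
```

On those numbers a single host like this accounts for roughly 210 GB of downloads in a week that produce no usable results.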

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 86,848,055
RAC: 94,975
Message 36174 - Posted: 1 Aug 2018, 14:47:21 UTC - in response to Message 36085.  

Check this one. Close to 600 bad tasks in 7 days. Each one of those blown tasks is a ~350MB download.

Why does LHC allow this to go on when they can cut off such users? Maybe LHC has too much bandwidth and needs to burn some off? Meanwhile don't drain your ATLAS task cache because it will take 2 hours of constant hammering on the server to get more tasks.

Something seems to be broken, because the server should limit the number of WUs for hosts like this to 1 per day.


Supporting BOINC, a great concept !

MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 849
Credit: 37,486,675
RAC: 27,144
Message 36192 - Posted: 2 Aug 2018, 1:01:40 UTC

Yeah in the past if you got that many Invalids in a row you got that *24 hour* delay before getting new tasks.

Looking at those Invalids and the computer info, I would say he tried running too many ATLAS tasks at the same time with not enough RAM (15.9 GB).

It could also be overclocked, since I noticed that *K* (i7-5820K).

As we know, when people have hosts with 12 threads they like to run them all at the same time at 100% CPU and memory.

(I always check the Task Manager when running all the threads to make sure memory isn't maxed out.)

But it seems to be taking a break now

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10458080

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36194 - Posted: 2 Aug 2018, 1:40:35 UTC

And that seems to be just the tip of the iceberg. Poking around the other day I ran across a user with a name like "gridcoin" who has ~400 hosts. Looking through his results for just 10 minutes I saw ~2,000 failed ATLAS tasks on ~5 hosts. I bet there are at least 1,000 hosts doing that.

Anybody else see now why download speed is down to a trickle? We've heard from one admin who claims their throughput is the same... well, yeah... but a significant portion of that seems to be tasks going to hosts that blow the task in 10 minutes and then get another task, rinse and repeat 100 times a day.

vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36201 - Posted: 2 Aug 2018, 13:08:22 UTC

I hate to admit it but I have a bad host myself that I cannot figure out. I haven't said anything because it does report valid tasks, but it's like 60% valid to 40% invalid. It's a 24-thread machine with 64 GB of RAM running three 8-core WUs at a time. The weird thing is that in the stderr output for the invalids it says everything was successful. Going to try reducing the number of cores allowed to 22 so there are some spares for the machine itself, and see if that makes a difference.

My other two hosts have 1 or 2 invalids to hundreds of valid tasks, which is what doesn't make sense to me. Same version of BOINC, VirtualBox, OS, etc.

computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,674,412
RAC: 104,837
Message 36203 - Posted: 2 Aug 2018, 13:35:54 UTC - in response to Message 36201.  

As mentioned in another thread, you may want to make your logs visible to other volunteers.
This would make it easier to give a qualified answer.

To do so, navigate to https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project and check the box near "Should LHC@home show your computers on its web site?".

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 561
Credit: 349,082,285
RAC: 527,258
Message 36205 - Posted: 2 Aug 2018, 15:57:19 UTC - in response to Message 36194.  

I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36207 - Posted: 2 Aug 2018, 17:17:48 UTC - in response to Message 36205.  

I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox.

OIC... gullible users auto added to difficult project by some misguided, uncaring group admin. What could possibly go wrong with that? He needs a PM from bronco.

vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36208 - Posted: 2 Aug 2018, 17:45:52 UTC - in response to Message 36203.  
Last modified: 2 Aug 2018, 17:46:26 UTC

As mentioned in another thread, you may want to make your logs visible to other volunteers.
This would make it easier to give a qualified answer.

To do so, navigate to https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project and check the box near "Should LHC@home show your computers on its web site?".


I know how to, but I'd rather not expose all of my machines. But I can post examples.

Marked valid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203431393
Marked invalid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203456578

I think for Gridcoin it's because the users are automatically added to this project without really knowing, so they don't realize they need VirtualBox.


OIC... gullible users auto added to difficult project by some misguided, uncaring group admin. What could possibly go wrong with that? He needs a PM from bronco.


As someone who is on team Gridcoin, I can tell you nothing is automatically added. Now if they use "Charity Engine" then yes, projects are added without the user's knowledge. It's a nice scam the Charity Engine people have going.

PDW
Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36209 - Posted: 2 Aug 2018, 18:31:22 UTC - in response to Message 36208.  

I know how to, but I'd rather not expose all of my machines. But I can post examples.

Marked valid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203431393
Marked invalid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203456578

How long has Atlas been running valid tasks with less than 3 minutes CPU time for 293.50 credits ?

I thought they took many hours for about twice that much credit !

computezrmle
Joined: 15 Jun 08
Posts: 1133
Credit: 55,674,412
RAC: 104,837
Message 36210 - Posted: 2 Aug 2018, 18:39:58 UTC - in response to Message 36208.  

OK.
Now I see what happened - I would call it cheating.
Sorry, in this case there's no support from my side.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 561
Credit: 349,082,285
RAC: 527,258
Message 36211 - Posted: 2 Aug 2018, 18:46:50 UTC - in response to Message 36208.  

Sorry, I read something over on the main BOINC forums about issues with Gridcoin and the use of account managers in a non-standard manner.

Looks like I made a bad assumption.

PDW
Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36212 - Posted: 2 Aug 2018, 18:58:59 UTC - in response to Message 36211.  

So that host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522400&offset=0&show_names=0&state=4&appid= at the moment has 785 VALID tasks that used about 3 minutes of CPU time for a credit of about 300 each.

Would an Admin like to confirm that they are valid results please ?
I see the VMs have 8 CPUs assigned but the run time is barely enough to spin up the VM let alone do any valid work.

vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36214 - Posted: 2 Aug 2018, 19:52:58 UTC - in response to Message 36212.  
Last modified: 2 Aug 2018, 19:55:33 UTC

I have a second host with 40 threads that runs five 8-CPU WUs at a time, also with plenty of RAM and disk space. It has 380 valid and 3 invalid. WUs take anywhere from 400 seconds up to one I see at 32000 seconds. I even ran a "reset project" on both hosts to make sure everything was correct, and both re-downloaded the VDI file fresh. Same results on both.

OK.
Now I see what happened - I would call it cheating.
Sorry, in this case there's no support from my side.


So attaching to a project, letting it download its own files, and letting it run with zero modifications to anything is cheating? How so?

PDW
Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36215 - Posted: 2 Aug 2018, 20:15:15 UTC - in response to Message 36214.  

So this one isn't yours then ?
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402

It only has 136 valid and 3 invalid and 77 error.
That is also returning valid results with low run times although it has done at least one I would expect to be a real valid result...
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10522402
Its run time was 10,495.46 seconds and CPU time of 72,408.16 seconds for a credit of 295.61.
At least its credit return is much smaller for the short runs.
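
To make the gap concrete, here is a small sketch that converts two results quoted in this thread into credit per CPU-hour. The 180-second figure is an assumption (an upper bound for "less than 3 minutes" of CPU time), and the function name is only illustrative.

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600)


# Short "valid" result quoted earlier: 293.50 credits for under 3 minutes of CPU.
# Assumption: 180 s is used as an upper bound on its CPU time.
short_rate = credit_per_cpu_hour(293.50, 180.0)

# Long result quoted above: 295.61 credits for 72,408.16 s of CPU time.
long_rate = credit_per_cpu_hour(295.61, 72408.16)

print(f"short run: {short_rate:.0f} credits per CPU-hour")
print(f"long run:  {long_rate:.1f} credits per CPU-hour")
print(f"ratio:     about {short_rate / long_rate:.0f}x")
```

Because 180 s is an upper bound, the real rate for the short run would be at least this high.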

For the record I am not accusing you of cheating, I do not know either way if you are or not.
My concern is that the project is happily marking results as valid with a high credit score when I don't believe they should be valid.

I will wait for an Admin to respond, they may not appear until working hours though.

It is a project oversight problem, I don't know if:
1) They don't care
2) They don't know how to check
3) They don't have time to check

No idea what science your 'valid' results might be contributing, if an Admin comes back and says they are okay I won't argue any further but I don't see why everyone else is taking so many hours for so little credit.

vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36216 - Posted: 2 Aug 2018, 20:18:36 UTC - in response to Message 36215.  

I don't know either which is why I posted when I saw this thread. Lol....if I was somehow cheating why would I draw attention to myself? I was more hoping to figure out why I had so many invalid results so I could fix it....if it can be fixed. Like I said I've done nothing to modify anything and I've tried resetting the project which didn't change anything.

The only difference is that the one returning a lot of invalids is an older server (5 years old), which to me would mean it should be slower overall. The one with almost all valid results is newer (2 years old).

PDW
Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 496
Message 36217 - Posted: 2 Aug 2018, 20:27:11 UTC - in response to Message 36216.  

I understand what you are saying but if you look at other users who don't hide their computers and see their Atlas results you will quickly realise that your valid results do not look normal.

Don't bother looking at my computers, they are hidden but I haven't run any tasks for a while, I just came back to see what state the project was in and if it was worth running tasks again.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36218 - Posted: 2 Aug 2018, 23:05:47 UTC - in response to Message 36208.  

I know how to, but I'd rather not expose all of my machines. But I can post examples.

Marked valid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203431393

It's marked valid, but if you use your web browser's search function (Ctrl-F) to search the stderr output for "HITS" you will notice it finds nothing. That means 2 things:
1) it did not record an EVTtoHITS error (error 165) which is a sure indication the task failed to return any useful work
2) it did not record the usual confirmation of a successful EVTtoHITS conversion
So the stderr output is inconclusive regarding HITS. Fortunately there is another way to learn whether or not a HITS file was generated/returned. Search for "pandaid" in the stderr and for the task above you'll find this:
 Starting ATLAS job. (PandaID=4008322627 taskID=14661314)

You can use the PandaID code in that line, in this case 4008322627, to search the panda database for a definitive answer to whether or not your task succeeded. Combine the code with the base URL to get the full URL like this:
https://bigpanda.cern.ch/job?pandaid=4008322627
The first time you do it you'll be asked to register a username and password. It's free and easy. Then the above URL will take you to the panda record for your result and you will notice it actually failed despite LHC@home saying it validated.
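
The manual check above can be sketched in Python. This is a minimal sketch, not project tooling: it assumes you have copied the stderr text from the result page yourself, the function and field names are made up for illustration, and it only builds the bigpanda URL (it does not log in or query the panda database).

```python
import re

# Base URL as given in the post above.
PANDA_JOB_URL = "https://bigpanda.cern.ch/job?pandaid={panda_id}"

# Matches the "PandaID=<digits>" part of the "Starting ATLAS job." line.
PANDA_ID_RE = re.compile(r"PandaID=(\d+)", re.IGNORECASE)


def inspect_stderr(stderr_text: str) -> dict:
    """Report whether the stderr mentions HITS and which PandaID it names."""
    mentions_hits = "hits" in stderr_text.lower()  # same as a Ctrl-F search
    match = PANDA_ID_RE.search(stderr_text)
    panda_id = match.group(1) if match else None
    panda_url = PANDA_JOB_URL.format(panda_id=panda_id) if panda_id else None
    return {
        "mentions_hits": mentions_hits,
        "panda_id": panda_id,
        "panda_url": panda_url,
    }


# The line quoted above from the task that was marked valid.
sample = " Starting ATLAS job. (PandaID=4008322627 taskID=14661314)"
report = inspect_stderr(sample)
print(report["mentions_hits"], report["panda_url"])
```

A result with no "HITS" mention and no PandaID line never started an ATLAS job at all; one with a PandaID but no "HITS" mention still needs the lookup at the printed URL to settle whether it succeeded.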

Marked invalid: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203456578
That one has no mention of HITS nor does it have the "Starting ATLAS job.(PandaID=" line so naturally it failed and was appropriately marked invalid.

As someone who is on team Gridcoin, I can tell you nothing is automatically added. Now if they use "Charity Engine" then yes, projects are added without the user's knowledge. It's a nice scam the Charity Engine people have going.
Yes, I saw warnings about it on the BOINC forums. Seems people take great pride in being deplorable these days. Whatever. Now that I know your userID I'll plug it into my script and see if you have more fubar hosts. Not accusing you of cheating. If you are then I think maybe I know how you're doing it and I really couldn't care less. The credits are meaningless. I care only about the science being done and how much bandwidth fubar hosts might be wasting.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36219 - Posted: 2 Aug 2018, 23:16:28 UTC - in response to Message 36216.  

Lol....if I was somehow cheating why would I draw attention to myself?
Not accusing you of cheating, just answering the question you asked. Some cheaters need to do more than just cheat. Their sick little egos drive them to make sure everybody else knows they are cheating. It makes them feel powerful and in control when they know that everybody knows they're cheating and cannot be stopped.

bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,196,738
RAC: 10,577
Message 36220 - Posted: 3 Aug 2018, 1:27:08 UTC - in response to Message 36218.  

I care only about the science being done and how much bandwidth fubar hosts might be wasting.

Actually I also care very much that users come here, volunteer their computers and pay for the electricity thinking that the money they're spending is accomplishing something worthwhile and trusting that LHC@home will do what other projects do and tell them when they are wasting their time and money. Instead they smile and lie and say "Valid" when nothing useful has been done at all. That's a breach of trust, plain and simple. Like I said in another thread, no wonder nobody trusts scientists anymore.


©2019 CERN