Message boards : Number crunching : fubar host of the day

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,422,576
RAC: 95,975
Message 36426 - Posted: 14 Aug 2018, 20:30:19 UTC - in response to Message 36421.  

Also these lines look rather odd:
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:29:40 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:30:53 (912): Guest Log: copcopied the webapp to /var/www
2018-08-13 13:30:53 (912): Guest Log: ied the webapp to /var/www
2018-08-13 13:31:04 (912): Guest Log: TThhisi svm v md odeose sn otn onte ende etdo  steot uspe thutptp  hptrtopx yp
2018-08-13 13:31:04 (912): Guest Log: xy
2018-08-13 13:31:04 (912): Guest Log: AATHTEHNEAN_AP_RPORCO_CN_UNMUBMEBRE=R8
2018-08-13 13:31:04 (912): Guest Log: 8
2018-08-13 13:31:04 (912): Guest Log: SStarttianrgt iAnTgL AAS TjLoAbS.  j(oPba.nd a(IPDa=n4d0a2I4D64=64804294 t6a4s6k8I4D9=1 4t8a6s7k2I7D3=)1


I have some theories as to why those lines are garbled but you won't like them :)

This is still a problem that we have sometimes seen in Alpha or Beta tests, but we could never identify or solve it. But it is surely not a cheating script.


Supporting BOINC, a great concept !
ID: 36426
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,252,104
RAC: 9,833
Message 36427 - Posted: 14 Aug 2018, 21:58:38 UTC - in response to Message 36426.  

OK, I am convinced, no cheating going on. Most likely it's as vseven said, the "suspend if CPU usage is over xx %" was too low.
ID: 36427
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 753
Credit: 6,033,177
RAC: 1,095
Message 36433 - Posted: 15 Aug 2018, 10:22:49 UTC - in response to Message 36426.  

Also these lines look rather odd:
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:29:40 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:30:53 (912): Guest Log: copcopied the webapp to /var/www
2018-08-13 13:30:53 (912): Guest Log: ied the webapp to /var/www
2018-08-13 13:31:04 (912): Guest Log: TThhisi svm v md odeose sn otn onte ende etdo  steot uspe thutptp  hptrtopx yp
2018-08-13 13:31:04 (912): Guest Log: xy
2018-08-13 13:31:04 (912): Guest Log: AATHTEHNEAN_AP_RPORCO_CN_UNMUBMEBRE=R8
2018-08-13 13:31:04 (912): Guest Log: 8
2018-08-13 13:31:04 (912): Guest Log: SStarttianrgt iAnTgL AAS TjLoAbS.  j(oPba.nd a(IPDa=n4d0a2I4D64=64804294 t6a4s6k8I4D9=1 4t8a6s7k2I7D3=)1


I have some theories as to why those lines are garbled but you won't like them :)

This is still a problem that we have sometimes seen in Alpha or Beta tests, but we could never identify or solve it. But it is surely not a cheating script.

Nothing to worry about. It's a minor problem in vboxwrapper.
The wrapper frequently tries to read guest log lines from vbox.log in the Logs folder of the slot directory.
If new log lines are found, they are added to stderr.txt, which is the result file.
The lines are only rarely garbled, and I never tried to find out under which circumstances this happens.
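
To make the polling described above concrete, here is a minimal sketch of reading only the new part of a log file and appending the guest lines to a result file. It is not vboxwrapper's actual code: the function names, the `Guest Log: ` pattern and the choice to hold back a half-written last line are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: guest lines contain the marker "Guest Log: "; the real vbox.log layout may differ.
GUEST_LINE = re.compile(r"Guest Log: (.*)")


def read_new_guest_lines(log_path, offset):
    """Return (guest lines found after byte `offset`, new offset)."""
    data = Path(log_path).read_bytes()[offset:]
    # Consume only up to the last newline, so a half-written line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = []
    for raw in data[:end].decode("utf-8", errors="replace").splitlines():
        match = GUEST_LINE.search(raw)
        if match:
            lines.append(match.group(1))
    return lines, offset + end


def append_to_stderr(stderr_path, lines):
    """Append the collected guest lines to the result file (hypothetical helper)."""
    with open(stderr_path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(f"Guest Log: {line}\n")
```

A reader that did not hold back the incomplete tail could emit a line in two pieces, much like the "cop" / "ied the webapp" split in the quoted log, although whether that is what actually happens in the wrapper is not established in this thread.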
ID: 36433
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,252,104
RAC: 9,833
Message 37055 - Posted: 18 Oct 2018, 11:38:03 UTC

10567661, 404 of 404 tasks failed, owner Anonymous, 100% failure rate

10567450, 1237 of 1237 tasks failed in past 4 days, owner George Bradshaw, team Gridcoin, 100% failure rate

10567464, 1360 of 1360 tasks failed in past 4 days, owner Anonymous, 100% failure rate

10563653, 212 of 212 tasks failed in past 4 days, owner Anonymous, 100% failure rate
ID: 37055
maeax
Joined: 2 May 07
Posts: 744
Credit: 27,613,099
RAC: 40,181
Message 37056 - Posted: 18 Oct 2018, 12:04:33 UTC - in response to Message 37055.  

These are typical short-runners with:
VBoxManage.exe: error: VT-x is disabled in the BIOS for all CPU modes (VERR_VMX_MSR_ALL_VMX_DISABLED)
VT-x or AMD-V is not enabled, but...
other computers finish them successfully!!
ID: 37056
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,252,104
RAC: 9,833
Message 37062 - Posted: 19 Oct 2018, 12:54:40 UTC - in response to Message 37056.  

but... other computers finish them successfully!!

You missed the point. It's not about bad tasks (tasks that cannot be computed successfully). It's about the project wasting resources sending tasks to misconfigured hosts that never return a successful result. Perhaps it's the reason for all these repeated "infrastructure failures", "bottlenecks" and stuck zombie tasks we keep hearing about.
ID: 37062
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 852
Credit: 37,628,229
RAC: 25,753
Message 37063 - Posted: 19 Oct 2018, 14:10:51 UTC - in response to Message 36426.  
Last modified: 19 Oct 2018, 14:25:49 UTC

Also these lines look rather odd:
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: CCooppyyiinngg  iinnppuut tf iflielse si nitnot oR uRnuAntAltalsa.s.
2018-08-13 13:29:29 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:29:40 (912): Guest Log: Copied input files into RunAtlas.
2018-08-13 13:30:53 (912): Guest Log: copcopied the webapp to /var/www
2018-08-13 13:30:53 (912): Guest Log: ied the webapp to /var/www
2018-08-13 13:31:04 (912): Guest Log: TThhisi svm v md odeose sn otn onte ende etdo  steot uspe thutptp  hptrtopx yp
2018-08-13 13:31:04 (912): Guest Log: xy
2018-08-13 13:31:04 (912): Guest Log: AATHTEHNEAN_AP_RPORCO_CN_UNMUBMEBRE=R8
2018-08-13 13:31:04 (912): Guest Log: 8
2018-08-13 13:31:04 (912): Guest Log: SStarttianrgt iAnTgL AAS TjLoAbS.  j(oPba.nd a(IPDa=n4d0a2I4D64=64804294 t6a4s6k8I4D9=1 4t8a6s7k2I7D3=)1


I have some theories as to why those lines are garbled but you won't like them :)

This is still a problem that we have sometimes seen in Alpha or Beta tests, but we could never identify or solve it. But it is surely not a cheating script.


You are correct. It was caused by a too-slow internet connection to the server, as I proved a few hundred times, and yes, that was in those Alpha ATLAS tests you and I did (I mentioned it here some time ago in one of the many threads). It doesn't happen as much here because of a couple of code changes, which you can see if you watch the VM console when a new task starts running.

(here is a copy of one from the early days before they went on to Beta)

Guest Log: CCooppyyiinngg iinnppuutt ffiilleess iinnttoo RRuunnAAttllaass..
2017-05-29 19:13:17 (6772): Guest Log: CCooppyyiinngg iinnppuutt ffiilleess iinnttoo RRuunnAAttllaass..
2017-05-29 19:13:17 (6772): Guest Log: Copied input files into RunAtlas.
2017-05-29 19:13:17 (6772): Guest Log: Copied input files into RunAtlas.
2017-05-29 19:14:27 (6772): Guest Log: ccooppiieedd tthhee wweebbaapppp ttoo //vvaarr//wwwwww
2017-05-29 19:14:27 (6772): Guest Log: ccooppiieedd tthhee wweebbaapppp ttoo //vvaarr//wwwwww
2017-05-29 19:14:27 (6772): Guest Log: TThhiiss vvmm ddooeess nnoott nneeeedd ttoo sseettuupp hhttttpp pprrooxxyy
2017-05-29 19:14:27 (6772): Guest Log: TThhiiss vvmm ddooeess nnoott nneeeedd ttoo sseettuupp hhttttpp pprrooxxyy
2017-05-29 19:14:27 (6772): Guest Log: AATTHHEENNAA__PPRROOCC__NNUUMMBBEERR==66
2017-05-29 19:14:27 (6772): Guest Log: AATTHHEENNAA__PPRROOCC__NNUUMMBBEERR==66
2017-05-29 19:14:27 (6772): Guest Log: SSttaarrttiinngg AATTLLAASS jjoobb.. ((PPaannddaaIIDD==33339988553388000055 ttaasskkIIDD==1111339977773366))
2017-05-29 19:14:27 (6772): Guest Log: SSttaarrttiinngg AATTLLAASS jjoobb.. ((PPaannddaaIIDD==33339988553388000055 ttaasskkIIDD==1111339977773366))
2017-05-29 19:19:58 (6772): Capturing screenshot.
(we save all the old test stderr's for those Alpha tests)
ID: 37063
computezrmle
Joined: 15 Jun 08
Posts: 1141
Credit: 56,177,922
RAC: 96,332
Message 37297 - Posted: 10 Nov 2018, 17:28:20 UTC

Just stumbled over this misconfigured host that has been wasting resources for more than a week.
The user is anonymous so he/she can't be contacted by normal volunteers.

Solving the problem would probably be easy as ATM it's just VT-x that has to be switched on.

If the owner doesn't take care, can the host be banned?


https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10558009
VBoxManage.exe: error: VT-x is disabled in the BIOS for all CPU modes (VERR_VMX_MSR_ALL_VMX_DISABLED)
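
A host owner (or a script run over result logs) could spot this condition by searching stderr for the error code quoted above. The sketch below assumes nothing beyond that code string appearing in the text; the function name is made up for illustration.

```python
# Error code exactly as it appears in the VBoxManage message quoted above.
VTX_DISABLED_CODE = "VERR_VMX_MSR_ALL_VMX_DISABLED"


def vtx_disabled_lines(stderr_text):
    """Return every line of a result's stderr that reports VT-x as disabled."""
    return [line for line in stderr_text.splitlines() if VTX_DISABLED_CODE in line]
```

An empty list only means this particular code was not logged; it does not prove that virtualization is working.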
ID: 37297
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,252,104
RAC: 9,833
Message 37300 - Posted: 10 Nov 2018, 19:09:04 UTC - in response to Message 37297.  

The host seems to be in the process of banning itself from VBox-based tasks. Its application details page shows it's restricted to 1 ATLAS per day. It is still entitled to 192 LHCb and 35 Theory per day, but those numbers will also decrease to 1 if it continues to fail LHCb and Theory tasks. And fail it most certainly will until VT-x is enabled.

It hasn't failed any Sixtrack lately so it's still entitled to 522 per day.

I was under the impression this "self banning" mechanism was either turned off or broken. Either I was wrong or they just fixed it or maybe turned it on recently.
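
As a toy model of the "self banning" behaviour described above: the per-day allowance for an application shrinks on failures, never drops below 1, and recovers on successes. The exact rule the server uses is not stated in this thread, so the "minus one on failure, double on success" policy below is purely an assumption, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class AppQuota:
    limit: int    # project-wide ceiling, e.g. 35 Theory tasks per day
    allowed: int  # what this host may currently fetch per day

    def record(self, success: bool) -> None:
        """Update the allowance after one returned task (assumed policy, not the server's code)."""
        if success:
            self.allowed = min(self.limit, self.allowed * 2)
        else:
            self.allowed = max(1, self.allowed - 1)
```

Under this assumed policy a host that fails every task ends up at 1 per day and stays there, which matches the "restricted to 1 ATLAS per day" observation.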
ID: 37300
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,252,104
RAC: 9,833
Message 37315 - Posted: 11 Nov 2018, 23:43:36 UTC - in response to Message 37300.  

After numerous Sixtrack successes it's reverted to failing Theory tasks, which has decreased its "max theory tasks allowed per day" from 35 to 7. Soon it will reduce to 1, and some time after that LHCb will also reduce to 1. At that point the host is effectively a zombie that no longer devours resources; it just sleeps until it gets Sixtrack tasks, which it crunches OK.

If the host doesn't get detached it will exist happily as a Sixtrack only zombie for years in which case the task downloads wasted on getting it to that state might be considered a good investment of resources.
But if it detaches and reattaches then what?... it starts wasting ATLAS downloads again?

I've stumbled on numerous hosts that fail all VBox tasks because VT-x is disabled. It would make sense to limit new hosts to 1 VBox task per day until they return a report that indicates they meet minimum requirements such as having VT-x enabled.
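
The proposal above could be sketched roughly like this. It is a hypothetical policy function, not anything the project server is known to implement; `vtx_enabled` stands for whatever the host's report says, with `None` meaning no report has been received yet.

```python
from typing import Optional


def vbox_daily_limit(normal_limit: int, vtx_enabled: Optional[bool]) -> int:
    """Daily VBox task limit for a host under the proposed rule (hypothetical)."""
    if vtx_enabled is True:
        return normal_limit
    # Unknown or disabled: hand out a single probe task per day.
    return 1
```

A new host would then cost at most one wasted download per day until it proves it can run VBox tasks.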
ID: 37315
computezrmle
Joined: 15 Jun 08
Posts: 1141
Credit: 56,177,922
RAC: 96,332
Message 37750 - Posted: 14 Jan 2019, 8:37:38 UTC

It looks like this computer does nothing useful.
It only wastes resources.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10563767

Unfortunately the project server doesn't notice it as a rogue host.
I wonder if it could be banned manually until the user changes the configuration.
ID: 37750
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,422,576
RAC: 95,975
Message 37751 - Posted: 14 Jan 2019, 8:40:08 UTC - in response to Message 37750.  

It looks like this computer does nothing useful.
It only wastes resources.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10563767

Unfortunately the project server doesn't notice it as a rogue host.
I wonder if it could be banned manually until the user changes the configuration.

I took a quick look and the host has more than 30 successful jobs and only 2 or 3 bad ones, so something seems to be wrong with your link.


Supporting BOINC, a great concept !
ID: 37751
computezrmle
Joined: 15 Jun 08
Posts: 1141
Credit: 56,177,922
RAC: 96,332
Message 37753 - Posted: 14 Jan 2019, 12:16:59 UTC - in response to Message 37751.  

It looks like this computer does nothing useful.
It only wastes resources.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10563767

Unfortunately the project server doesn't notice it as a rogue host.
I wonder if it could be banned manually until the user changes the configuration.

I took a quick look and the host has more than 30 successful jobs and only 2 or 3 bad ones, so something seems to be wrong with your link.

You may look at the runtimes/CPU-times and inspect the logs.
This tells a different story.

The runtimes are far too short for ATLAS as well as for Theory.
None of the ATLAS WUs produced a HITS file.
All Theory jobs finished with an error:
Guest Log: [INFO] Job finished in slotx with 1.
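
The checks described above (runtime too short, non-zero job exit code in the guest log) can be sketched as follows. The regular expression is modelled on the single log line quoted here, treating 0 as the only "good" exit code is an assumption, and the 600-second threshold is an arbitrary placeholder, not a project value.

```python
import re

# Modelled on: "Guest Log: [INFO] Job finished in slotx with 1."
JOB_FINISHED = re.compile(r"Job finished in slot\w* with (\d+)")


def job_exit_code(stderr_text):
    """Return the exit code reported in the guest log, or None if no such line exists."""
    match = JOB_FINISHED.search(stderr_text)
    return int(match.group(1)) if match else None


def looks_like_empty_success(stderr_text, runtime_s, min_runtime_s=600):
    """Flag a 'successful' result that probably did no useful work (heuristic only)."""
    code = job_exit_code(stderr_text)
    return runtime_s < min_runtime_s or (code is not None and code != 0)
```

The missing HITS file for ATLAS would need a separate check; it is left out here because the thread does not show what the log looks like in that case.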
ID: 37753
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,422,576
RAC: 95,975
Message 37754 - Posted: 14 Jan 2019, 12:20:48 UTC - in response to Message 37753.  

You may look at the runtimes/CPU-times and inspect the logs.
This tells a different story.

The runtimes are far too short for ATLAS as well as for Theory.
None of the ATLAS WUs produced a HITS file.
All Theory jobs finished with an error:
Guest Log: [INFO] Job finished in slotx with 1.

Then we are back to an old theme: instead of reporting "success", the server should return the real status. The user has no chance of recognising that his results are worthless.


Supporting BOINC, a great concept !
ID: 37754
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 238
Credit: 10,752,454
RAC: 4,206
Message 37790 - Posted: 19 Jan 2019, 10:57:49 UTC

Self-reporting my 10457508 host before anyone else does.
Lots of bother doing yesterday's Windows update, with all three hosts taking several attempts to download, install, reboot and re-update, but this particular host crashed on startup each time and rolled back to its pre-update state. Some searching found a suggestion to reset the BIOS to defaults. This got the update to work, although the host still shows the Windows build as 1803 while the others now show 1809, so I'm not convinced it has actually taken the update. Running Update again shows none available, and the host is usable again.

OBVIOUSLY I forgot to re-enable virtualisation in the BIOS, so all overnight tasks failed, but I fixed that just now, so it's back crunching.
ID: 37790
maeax
Joined: 2 May 07
Posts: 744
Credit: 27,613,099
RAC: 40,181
Message 37792 - Posted: 19 Jan 2019, 15:53:52 UTC - in response to Message 37790.  

OBVIOUSLY I forgot to re-enable virtualisation in the BIOS, so all overnight tasks failed, but I fixed that just now, so it's back crunching.

Hello Ray,
since the new BOINC server upgrade, the computer details page shows whether VT-x or AMD-V is enabled:
Virtualization Virtualbox (5.2.24) installed, CPU has hardware virtualization support and it is enabled.
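
If you want to check that field for several hosts at once, a small parser along these lines would do. It only recognises the exact wording quoted above; how the page words the disabled or unsupported cases is not shown in this thread, so anything else is simply reported as "not confirmed enabled". The function name and dictionary keys are made up.

```python
import re

# Matches the version in e.g. "Virtualbox (5.2.24) installed".
VBOX_VERSION = re.compile(r"Virtualbox \(([\d.]+)\) installed", re.IGNORECASE)


def parse_virtualization(field):
    """Extract the VirtualBox version and the 'enabled' statement from the details text."""
    match = VBOX_VERSION.search(field)
    return {
        "vbox_version": match.group(1) if match else None,
        # True only for the exact phrase seen in the post above.
        "hw_virt_enabled": "support and it is enabled" in field,
    }
```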
ID: 37792
tullio
Joined: 19 Feb 08
Posts: 593
Credit: 3,623,054
RAC: 3,317
Message 37799 - Posted: 20 Jan 2019, 17:38:10 UTC - in response to Message 37792.  
Last modified: 20 Jan 2019, 17:54:48 UTC

On my main Linux box linux-e1r2 I have VirtualBox 6.0.2 installed but the LHC BOINC server says no virtualization since the latest kernel upgrade from SuSE. If I click VirtualBox it comes up with no error message.
Tullio
From past experience I know I have to reboot. I have installed VBox 6.0.2 after the kernel upgrade and its consequent reboot.
ID: 37799
maeax
Joined: 2 May 07
Posts: 744
Credit: 27,613,099
RAC: 40,181
Message 37800 - Posted: 20 Jan 2019, 19:18:00 UTC - in response to Message 37799.  

Hi Tullio,
the details page of your computer says it is enabled, of course.
Were you running 5.2.22 or 5.2.24 (the newest 5.2.xx) before?
ID: 37800
tullio
Joined: 19 Feb 08
Posts: 593
Credit: 3,623,054
RAC: 3,317
Message 37808 - Posted: 23 Jan 2019, 4:20:09 UTC - in response to Message 37800.  

Hi Tullio,
the details page of your computer says it is enabled, of course.
Were you running 5.2.22 or 5.2.24 (the newest 5.2.xx) before?

I always upgrade to the latest VBox version by choosing the last option for Linux. It is now running.
Tullio
ID: 37808
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 852
Credit: 37,628,229
RAC: 25,753
Message 38223 - Posted: 11 Mar 2019, 21:52:44 UTC

ID: 38223

©2019 CERN