Message boards : Number crunching : fubar host of the day

vseven

Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36367 - Posted: 10 Aug 2018, 13:26:10 UTC
Last modified: 10 Aug 2018, 13:27:14 UTC

Here is something fun. I couldn't get a Cosmology VBox WU all weekend and into this week (0 WU available), so I crunched some other stuff, including Universe@Home, SRBase, Yafu, YoYo, and some massive Citizen Science Grid WUs (3+ days each). Every task was successful and validated. So I finished up all tasks, uninstalled VBox, uninstalled BOINC, rebooted, and reinstalled a fresh copy of the latest BOINC (7.12.1 + VBox 5.2.8). I added LHC again and got the exact same result with an 8-CPU WU: it finished in ~600 seconds as invalid. So I switched my preferences to 4 CPUs max and it downloaded some of those. This time around it took what I consider a more plausible time, a little over 3 hours. However, the couple I ran all gave an "error while computing" and actual errors in the stderr output. Here is one of them:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=204064087

Can anyone interpret this as anything?
ID: 36367
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,296,012
RAC: 93,975
Message 36368 - Posted: 10 Aug 2018, 13:29:39 UTC - in response to Message 36367.  

... and reinstalled a fresh copy of the latest BOINC (7.12.1 + VBox 5.2.8).

You would have been better off with 5.2.16.

I'm just testing it on my Win10 1803 boxes with BOINC 7.12.1 and VirtualBox 5.2.16, and it works fine.

But my machines don't crunch 8-core WUs; instead they run 2x (3x) 4-core WUs.


Supporting BOINC, a great concept !
ID: 36368
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,296,012
RAC: 93,975
Message 36369 - Posted: 10 Aug 2018, 13:32:52 UTC

And did you ever go through this checklist?


Supporting BOINC, a great concept !
ID: 36369
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,244,158
RAC: 10,266
Message 36370 - Posted: 10 Aug 2018, 14:07:27 UTC - in response to Message 36367.  

The "Exit status 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" in the stderr output pretty much says it all... not buggy software, host not configured properly for the job, classic PICNIC. Anyway, that host leads back to a user named Anonymous, which may or may not be you, which means your example task may or may not be yours. If you refuse to expose your hosts then there is no way anybody can help. I believe you're here just to yank chains.
ID: 36370
Harri Liljeroos
Joined: 28 Sep 04
Posts: 374
Credit: 20,097,576
RAC: 20,153
Message 36371 - Posted: 10 Aug 2018, 14:46:54 UTC - in response to Message 36370.  

The "Exit status 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" comes from a situation where the disk usage of the slot directory in which the task is running exceeds the limit set in init_data.xml as <rsc_disk_bound>xxx</rsc_disk_bound>. Either the server set the value too low for the task when the task was created, or for some reason extra files were left behind in the slot directory from previous tasks, or extra files created during the calculation bloated the disk consumption beyond the <rsc_disk_bound> value. This is not related to any user setting of allowed disk usage for BOINC.
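To check this on a host, one could compare the size of a slot directory with the bound recorded in its init_data.xml. The following is only a rough sketch: the function names are made up for illustration, it assumes init_data.xml parses as well-formed XML, and BOINC's own accounting of slot usage may differ from a plain sum of file sizes.

```python
import os
import xml.etree.ElementTree as ET


def read_disk_bound(init_data_path):
    """Return the <rsc_disk_bound> value in bytes, or None if it is absent."""
    root = ET.parse(init_data_path).getroot()
    node = root.find(".//rsc_disk_bound")
    if node is None or not node.text:
        return None
    return float(node.text)


def directory_size(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def slot_usage(slot_dir):
    """Return (bytes_used, disk_bound) for one slot directory."""
    bound = read_disk_bound(os.path.join(slot_dir, "init_data.xml"))
    return directory_size(slot_dir), bound
```

Calling `slot_usage` on a running task's slot directory (for example a `slots/0` folder inside the BOINC data directory, whose location depends on the installation) returns both numbers, so they can be compared while the task is still running.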
ID: 36371
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,244,158
RAC: 10,266
Message 36372 - Posted: 11 Aug 2018, 11:53:41 UTC - in response to Message 36371.  

Thank you. My mistake.
ID: 36372
Harri Liljeroos
Joined: 28 Sep 04
Posts: 374
Credit: 20,097,576
RAC: 20,153
Message 36373 - Posted: 11 Aug 2018, 12:52:56 UTC - in response to Message 36372.  

No worries, it is an easy enough mistake to make (most people do). I just happen to know because I've been around here long enough. Sixtrack has had a few occurrences of this error, and so did the VirtualBox tasks when they were introduced. If you google the error, you find links back to these message boards discussing it.

It is too bad that in this case the error makes solving the problem at hand even more difficult than normal.
ID: 36373
vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36375 - Posted: 12 Aug 2018, 11:36:42 UTC - in response to Message 36373.  
Last modified: 12 Aug 2018, 11:43:29 UTC

And did you ever go through this checklist?


Yes...multiple times. Even tried running just 1 WU by itself. Same results.

If you refuse to expose your hosts then there is no way anybody can help. I believe you're here just to yank chains.

I did expose my hosts, for a week, and one person looked at things but didn't know what was wrong. Every reply you've given me has been negative and not helpful. I'm trying to figure out what is wrong and all you do is think I'm trying to cheat, which doesn't make any sense. Please stop replying to my posts if you don't have anything helpful to say.

It is too bad that in this case the error makes solving the problem at hand even more difficult than normal.


So is there anything I can do other than abandon this project on this host? Can I manually put a setting in the app config that tells it to use more disk space? The server has 100+ gigs of free space, so it's definitely not a resource issue. I ended up doing four 4-core WUs and all ended with the same error.
ID: 36375
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,296,012
RAC: 93,975
Message 36376 - Posted: 12 Aug 2018, 11:41:49 UTC - in response to Message 36375.  

So is there anything I can do other than abandon this project on this host?

Sure!
Go to your preferences and check / adjust the following settings:




Supporting BOINC, a great concept !
ID: 36376
vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36377 - Posted: 12 Aug 2018, 11:46:09 UTC - in response to Message 36376.  

So is there anything I can do other than abandon this project on this host?

Sure!
Go to your preferences and check / adjust the following settings:



Currently it is set to use no more than 100 gigs of disk space, leave at least 10 gigs free, and use no more than 90%. The drive is a 200 GB SSD which currently only has 50 gigs used. Surely an Atlas task can't use 100 gigs of space, can it?
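As a rough worked example of how these three preferences combine: as far as I understand, BOINC applies whichever of the three limits is most restrictive. The sketch below uses a hypothetical helper name and assumes, purely for illustration, that BOINC itself currently occupies 0 GB of the 50 GB in use.

```python
def effective_boinc_disk_limit_gb(total_gb, free_gb, boinc_used_gb,
                                  max_used_gb, min_free_gb, max_used_pct):
    """Return the most restrictive of the three disk preferences, in GB."""
    limits = [
        max_used_gb,                            # "use no more than X GB"
        total_gb * max_used_pct / 100.0,        # "use no more than P% of total"
        boinc_used_gb + free_gb - min_free_gb,  # "leave at least Y GB free"
    ]
    return max(0.0, min(limits))


# Figures from the post: 200 GB drive, 50 GB used, so 150 GB free.
# boinc_used_gb=0 is an assumption made only for this illustration.
limit = effective_boinc_disk_limit_gb(
    total_gb=200, free_gb=150, boinc_used_gb=0,
    max_used_gb=100, min_free_gb=10, max_used_pct=90,
)
print(limit)  # 100 GB: the "use no more than 100 gigs" setting is the binding one
```

Under these assumptions the client-wide allowance is far above anything a single task should need, which fits the point made earlier in the thread that the per-task bound is a separate limit.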
ID: 36377
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,296,012
RAC: 93,975
Message 36378 - Posted: 12 Aug 2018, 11:48:56 UTC - in response to Message 36377.  

Currently it is set to use no more than 100 gigs of disk space, leave at least 10 gigs free, and use no more than 90%. The drive is a 200 GB SSD which currently only has 50 gigs used. Surely an Atlas task can't use 100 gigs of space, can it?

This should be okay.

Does the client show the same figures as the website? Perhaps you set a local profile in the past?


Supporting BOINC, a great concept !
ID: 36378
maeax
Joined: 2 May 07
Posts: 741
Credit: 27,557,869
RAC: 39,573
Message 36379 - Posted: 12 Aug 2018, 12:04:50 UTC

This is one from vseven from an earlier message.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=204064087
Is this your Computer?
ID: 36379
vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36380 - Posted: 12 Aug 2018, 13:00:39 UTC - in response to Message 36379.  
Last modified: 12 Aug 2018, 13:07:47 UTC

Currently it is set to use no more than 100 gigs of disk space, leave at least 10 gigs free, and use no more than 90%. The drive is a 200 GB SSD which currently only has 50 gigs used. Surely an Atlas task can't use 100 gigs of space, can it?

This should be okay.

Does the client show the same figures as the website? Perhaps you set a local profile in the past?


Yes, same settings on the client. I unchecked all three anyway so it has no limits, removed the project completely, verified the project directory was gone, and re-added the project. It's downloading the VDI and a 4-CPU task right now... I'll know in 5 - 6 hours if it worked (maybe more if it doesn't error out).

This is one from vseven from an earlier message.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=204064087
Is this your Computer?


Yup, same computer.
ID: 36380
Retnek
Joined: 22 Jul 17
Posts: 1
Credit: 5,047,977
RAC: 0
Message 36381 - Posted: 12 Aug 2018, 17:12:07 UTC - in response to Message 36369.  

And did you ever go through this checklist?


Sorry folks, I can't follow you. The concept of BOINC is a platform for distributed computing, usable by interested people without the need to take special care and without the need to read several pages of quite specialized computer stuff. Part of the design is to scale the tasks down so that most PCs are able to deal with them flawlessly.

IMHO the problem with LHC tasks like ATLAS is that overambitious projects push out overly demanding tasks. No wonder this kind of setup is causing trouble. It makes no sense to insult people who are not able to compute those monster jobs properly. It's the result of a project not respecting the limits of BOINC. Such an elitist approach to computing is bad for the overall reputation of BOINC.

If you want to benefit from the large numbers of the "mass market", I suggest civilizing those tasks into a form the mass market is able to crunch (see how Cosmology deals with VirtualBox, for example). Otherwise, set up a separate project and ask for users willing to prepare the extras needed. And protect that separate project with a password from the dumb folk.
ID: 36381
PDW
Joined: 7 Aug 14
Posts: 14
Credit: 7,220,262
RAC: 248
Message 36382 - Posted: 12 Aug 2018, 18:37:01 UTC - in response to Message 36228.  

As noted on this thread on the ATLAS boards, the validation logic has been tightened so that credit is not allocated for these short unsuccessful tasks.

I'm not sure why such huge credits were being given to these tasks (credit logic has always been a mystery to me), in the past I don't think it was so large. Please post any examples of bad tasks getting credits which finished after 1 August when the validation was changed.

Hi David,

So this host https://lhcathome.cern.ch/lhcathome/results.php?hostid=10522402&offset=0&show_names=0&state=4&appid= has valid tasks (scroll past the first page or two) that were returned on 2nd August. Approximately 30 minutes of run time for approximately 60 credits, i.e. 120 credits per hour; CPU time was a lot less.

Are these valid tasks receiving the correct credit?

Another couple of tasks sent 4th August and returned 7th August got validated for 325+ credits...
https://lhcathome.cern.ch/lhcathome/result.php?resultid=203599212
https://lhcathome.cern.ch/lhcathome/result.php?resultid=203606471
So the new validation is catching most, but some still get through.

Shame he hasn't been able to get tasks to work properly.
ID: 36382
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,244,158
RAC: 10,266
Message 36383 - Posted: 12 Aug 2018, 18:43:52 UTC - in response to Message 36381.  

Such an elitist approach to computing is bad for the overall reputation of BOINC.

<sarcasm>
Elitist computing is a hoax invented by the Chinese, embodied by the abacus, for the purpose of bankrupting IBM and Microsoft. It has been expanded by leftist, virtue signaling globalists planted here by the UN cabal. Anybody who can crunch ATLAS tasks is obviously Illuminati, born in Kenya for sure. They all sell uranium to Putin and hookers from family owned pizza joints.
</sarcasm>
ID: 36383
vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36387 - Posted: 13 Aug 2018, 0:21:13 UTC - in response to Message 36382.  
Last modified: 13 Aug 2018, 0:21:55 UTC

Shame he hasn't been able to get tasks to work properly.


So this test task, after telling BOINC there were no hard drive space limits at all by unchecking all three options, was a success (marked valid) and appears to have created a HITS file. It took just under 7 hours using 4 CPUs.

I don't know what the difference is between telling BOINC it can use 100 GB and not exceed 90% when I have 100+ GB free, and telling it there are no limits, but this time it didn't give the disk space error. Going to run some more 4-CPU tasks, then retry some 8-CPU tasks and see what happens.
ID: 36387
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,244,158
RAC: 10,266
Message 36388 - Posted: 13 Aug 2018, 2:27:59 UTC - in response to Message 36387.  

I don't know what the difference is between telling BOINC it can use 100 GB and not exceed 90% when I have 100+ GB free, and telling it there are no limits, but this time it didn't give the disk space error.
You have misinterpreted the disk space error the same way I misinterpreted it. See Harri Liljeroos's post upthread for a good explanation of what it means.

Going to run some more 4-CPU tasks, then retry some 8-CPU tasks and see what happens.
I think the experts here will agree that, with respect to efficiency, you're better off running 2 X 4-core tasks than 1 X 8-core task, but it's your rig, so experiment as you see fit. 2-core tasks are even more efficient than 4-core tasks; the trade-off is that 4 X 2-core tasks require more memory than 1 X 8-core task.
ID: 36388
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 404
Credit: 87,296,012
RAC: 93,975
Message 36396 - Posted: 13 Aug 2018, 11:29:51 UTC - in response to Message 36387.  

I don't know what the difference is between telling BOINC it can use 100 GB and not exceed 90% when I have 100+ GB free, and telling it there are no limits, but this time it didn't give the disk space error.

Hard disk sizes have grown so much in recent years that if the developers (or the system itself) use a wrong value, this will cause unwanted side effects.

Here is an example from the early days of computing.

If you have a 16-bit signed integer variable that contains 32767, what happens if you add 1 to it?


    * 32768
    * Error
    * -32768



In the past, the correct answer was -32768: the value silently wrapped around. A good system recognizes that the variable is too small and throws an error, but can you be sure?

Something similar could explain the difference you saw here.
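The wraparound can be reproduced with a small sketch. Python integers do not overflow on their own, so the hypothetical helper `wrap_signed` below emulates a fixed-width two's-complement variable; nothing here is taken from BOINC's code, and whether anything like this actually happened in this case is only speculation.

```python
def wrap_signed(value, bits):
    """Emulate storing value in a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask                   # keep only the low `bits` bits
    if value >= 1 << (bits - 1):    # top bit set means the number is negative
        value -= 1 << bits
    return value


print(wrap_signed(32767 + 1, 16))     # -32768: the 16-bit case from the example

# A 100 GB limit expressed in bytes does not fit into a signed 32-bit variable,
# so the stored value would no longer match the limit that was intended.
limit_bytes = 100 * 10**9
print(wrap_signed(limit_bytes, 32) == limit_bytes)  # False
```

A disk limit that wraps like this would look like a much smaller number to whatever code compares against it, while "no limit" would bypass the comparison entirely.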




Supporting BOINC, a great concept !
ID: 36396
vseven
Joined: 22 Jan 18
Posts: 32
Credit: 2,756,359
RAC: 0
Message 36397 - Posted: 13 Aug 2018, 13:08:45 UTC - in response to Message 36396.  

Something similar could explain the difference you saw here.


Sure. But why would it only affect Atlas tasks and no other projects? I'm pretty sure VBox would handle this correctly since it's kept up to date... Could it be in the older VBox wrapper this project uses?

I completed another 4-CPU task without errors (around 9 hours) and with another 160 MB HITS file, so my host seems to be happy. Still going to try an 8-CPU and a 2-CPU task to make sure those both work.
ID: 36397


©2019 CERN