Message boards : CMS Application : Does anyone else never restart your machine running CMS?

ace_quaker

Joined: 21 May 06
Posts: 3
Credit: 528,525
RAC: 7,460
Message 49792 - Posted: 20 Mar 2024, 5:41:36 UTC

When I restart my Windows box, it hangs all the CMS VM jobs. 80+ core-hours or more are lost each time. It's not like I just hit the power button, either: I suspend BOINC, wait a bit, then restart the computer. The restart hangs because VirtualBox still has a few open connections (yeah, no shit, all the VMs show as aborted). I then complete the restart, ignoring the connections to VirtualBox. After the restart, I log in and unsuspend BOINC. Now all tasks fail with a red error popup and show "Computation error".

I hate VirtualBox, but all my primary CPU projects are on hiatus, so I figured I'd give it a whirl. Not impressed. Could we just have native applications?
ID: 49792
rob

Joined: 4 Mar 11
Posts: 22
Credit: 3,593,654
RAC: 770
Message 49793 - Posted: 20 Mar 2024, 7:24:29 UTC - in response to Message 49792.  

I had the same problem. There is a workaround that appears to work:
Suspend the project before shutting down BOINC; then I can safely turn the computer off. This forces the virtual machine to stop processing the task first and then save its state so it can resume at a later date or time.

(And I agree with you about wanting native tasks rather than VM-based tasks - life would be a little easier.)
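
Editor's note: the workaround above could be scripted roughly as follows. This is only a sketch under assumptions: it relies on the `boinccmd` tool that ships with BOINC being on the PATH and allowed to talk to the local client, the project URL is a guess that should be checked with `boinccmd --get_project_status`, and the 120-second pause is an arbitrary value, not a documented requirement. The function names are hypothetical.

```python
import subprocess
import time

# Assumption: the project URL as registered in the local BOINC client.
# Verify it with `boinccmd --get_project_status` before relying on it.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def shutdown_commands(project_url=PROJECT_URL):
    """Return the two boinccmd calls for the workaround, in order."""
    return [
        # Suspend the project first so its running VMs can be saved.
        ["boinccmd", "--project", project_url, "suspend"],
        # Only then ask the BOINC client to exit.
        ["boinccmd", "--quit"],
    ]


def run_workaround(project_url=PROJECT_URL, pause_seconds=120, dry_run=True):
    """Print (and, unless dry_run, execute) the suspend-then-quit sequence."""
    suspend, quit_client = shutdown_commands(project_url)
    print(" ".join(suspend))
    if not dry_run:
        subprocess.run(suspend, check=True)
        # Arbitrary pause to give the VMs time to save their state.
        time.sleep(pause_seconds)
    print(" ".join(quit_client))
    if not dry_run:
        subprocess.run(quit_client, check=True)
    return [suspend, quit_client]
```

Calling `run_workaround()` with the default `dry_run=True` only prints the commands, so the sequence can be reviewed before it touches a running client.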
ID: 49793
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,588,278
RAC: 130,236
Message 49794 - Posted: 20 Mar 2024, 7:33:55 UTC - in response to Message 49792.  

VirtualBox apps keep complex configuration problems away from volunteers who can't deal with them.
That's just one reason why CMS doesn't distribute a native app.
Another one is that they run on various platforms.

Native apps don't always solve the problems.
Your Linux computer returned 4 valid CMS tasks (vbox!) but did not return a single valid Theory native task for 3 days.
Instead it failed (so far) 283 of them.
All because you didn't install (and configure) a local CVMFS client.
And the reason is written in every log:
14:53:31 CDT -05:00 2024-03-17: cranky-0.1.4: [ERROR] Can't find 'cvmfs_config'.
14:53:31 CDT -05:00 2024-03-17: cranky-0.1.4: [ERROR] This usually means a local CVMFS client is not installed
14:53:31 CDT -05:00 2024-03-17: cranky-0.1.4: [ERROR] although it is a MUST to get data from online repositories.
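
Editor's note: a quick way to check for the condition these log lines describe, before accepting native tasks, might look like the sketch below. It assumes a Linux host; the function name and the status strings are hypothetical, and treating a non-zero exit code from `cvmfs_config probe` as failure is an assumption.

```python
import shutil
import subprocess


def cvmfs_client_status(which=shutil.which):
    """Report whether the `cvmfs_config` tool is reachable on this host."""
    path = which("cvmfs_config")
    if path is None:
        # Same situation as the "Can't find 'cvmfs_config'" log line.
        return "missing"
    # `cvmfs_config probe` tries to access the configured repositories.
    result = subprocess.run([path, "probe"], capture_output=True, text=True)
    return "ok" if result.returncode == 0 else "installed but probe failed"


if __name__ == "__main__":
    print(cvmfs_client_status())
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default is enough.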



As for your CMS errors on Windows:
Nearly all of them within the last few days were caused by a temporary system outage at CERN.
But your total "wasted" CPU time during this period sums to less than 1 h.
That is far from the 80+ hours you claim.
ID: 49794
ace_quaker

Message 49798 - Posted: 20 Mar 2024, 21:20:04 UTC - in response to Message 49794.  

computezrmle wrote:
    VirtualBox apps keep complex configuration problems away from volunteers who can't deal with them.

computezrmle wrote:
    All because you didn't install (and configure) a local CVMFS client.



Perhaps the solution is not to require oddball filesystems and configurations from the users? To my knowledge, no other project requires this junk. Or if they do, they handle it well, like Rosetta, so I don't notice it.


As for my errors on Windows, I had about 8 tasks fail on last night's reboot, each with roughly 10 hours already run. Just like I said: 80 core-hours lost.
ID: 49798
ace_quaker

Message 49799 - Posted: 20 Mar 2024, 21:26:01 UTC - in response to Message 49793.  

rob wrote:
    I had the same problem. There is a workaround that appears to work:
    Suspend the project before shutting down BOINC; then I can safely turn the computer off. This forces the virtual machine to stop processing the task first and then save its state so it can resume at a later date or time.

    (And I agree with you about wanting native tasks rather than VM-based tasks - life would be a little easier.)


Thanks, I'll try this next time!
ID: 49799
Dark Angel
Joined: 7 Aug 11
Posts: 88
Credit: 21,818,418
RAC: 22,496
Message 49853 - Posted: 28 Mar 2024, 19:43:44 UTC - in response to Message 49798.  

ace_quaker wrote:
    computezrmle wrote:
        VirtualBox apps keep complex configuration problems away from volunteers who can't deal with them.

    computezrmle wrote:
        All because you didn't install (and configure) a local CVMFS client.

    Perhaps the solution is not to require oddball filesystems and configurations from the users? To my knowledge, no other project requires this junk. Or if they do, they handle it well, like Rosetta, so I don't notice it.

    As for my errors on Windows, I had about 8 tasks fail on last night's reboot, each with roughly 10 hours already run. Just like I said: 80 core-hours lost.


None of those things are *required*. They're a choice we, the users, make in order to run the native application instead of the VirtualBox one. While I have a number of errors in my history, I also accept that the vast majority are my own fault, usually from playing around trying to make things work "better" but instead making them worse.

As for your errors on Windows: just because a work unit has clocked up some amount of wall-clock time while active doesn't mean it has been using the CPU for that same time. The two are tracked separately for a reason.
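
Editor's note: the difference between the two counters can be seen in a few lines of Python. This is a generic illustration using the standard `time` module, not anything taken from BOINC itself; the helper name `measure` is made up for the example.

```python
import time


def measure(work):
    """Run `work()` and return (wall_seconds, cpu_seconds) it took."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    # A "task" that mostly waits: the clock keeps running, the CPU stays idle.
    wall, cpu = measure(lambda: time.sleep(0.5))
    print(f"wall-clock: {wall:.2f} s, CPU: {cpu:.2f} s")
```

A task that spends most of its life waiting accumulates wall-clock time while its CPU time stays close to zero, which is why the two figures for a failed task can be far apart.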
ID: 49853

©2024 CERN