Message boards : Theory Application : Checkpointing

Aurum
Joined: 12 Jun 18
Posts: 108
Credit: 37,970,761
RAC: 0
Message 43150 - Posted: 31 Jul 2020, 13:03:48 UTC

Does Theory still not have checkpointing for Linux???

Last modified: 25 Mar 2019
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971

To suspend the application to disk so that it will survive the client exiting requires the container checkpointing feature.
However, this is not currently available for Linux containers.
Aurum
Joined: 12 Jun 18
Posts: 108
Credit: 37,970,761
RAC: 0
Message 43151 - Posted: 31 Jul 2020, 13:06:56 UTC

With the so-called "native" Theory project is it still necessary to go through this rigmarole to even be able to suspend while still running BOINC client???

Last modified: 25 Mar 2019
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971

Suspend/Resume
The Suspend/Resume does not work out of the box. It needs a cgroup to be created for each slot and this requires a cgroup with permissions for the user boinc. This can be provided by adding a PreStart script for boinc-client systemd. Download two files with wget:
sudo wget http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup -O /sbin/create-boinc-cgroup
sudo wget http://lhcathome.cern.ch/lhcathome/download/boinc-client.service -O /etc/systemd/system/boinc-client.service
Then run the following commands to pick up the changes:
sudo systemctl daemon-reload
sudo systemctl restart boinc-client
This will only suspend the application in memory.
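The in-memory suspend described above needs a cgroup that the boinc user is allowed to control. Here is a minimal sketch of what freezing and thawing a cgroup involves. It assumes the native Theory container is paused through the Linux cgroup freezer, which is the mechanism `runc pause` uses. Freezing or thawing then comes down to writing one control file: `cgroup.freeze` on cgroup v2, or `freezer.state` on the v1 freezer controller. The per-slot directory and the function names are hypothetical. The layout actually created by create-boinc-cgroup is not reproduced here and may differ.

```python
import tempfile
from pathlib import Path


def freezer_control(cgroup_dir: Path) -> tuple[Path, str, str]:
    """Return (control file, frozen value, thawed value) for a cgroup directory.

    cgroup v2 exposes cgroup.freeze ("1"/"0"); the v1 freezer controller
    exposes freezer.state ("FROZEN"/"THAWED").
    """
    v2_file = cgroup_dir / "cgroup.freeze"
    if v2_file.exists():
        return v2_file, "1", "0"
    return cgroup_dir / "freezer.state", "FROZEN", "THAWED"


def set_frozen(cgroup_dir: Path, frozen: bool) -> None:
    """Freeze (suspend in memory) or thaw (resume) every process in the cgroup."""
    control, frozen_value, thawed_value = freezer_control(cgroup_dir)
    control.write_text(frozen_value if frozen else thawed_value)


if __name__ == "__main__":
    # Demo against a fake v2 cgroup in a temp dir. On a real host the directory
    # would be something like a hypothetical /sys/fs/cgroup/boinc/slot0, and the
    # calling user (boinc) needs write permission on its control files.
    with tempfile.TemporaryDirectory() as tmp:
        slot = Path(tmp) / "boinc" / "slot0"
        slot.mkdir(parents=True)
        (slot / "cgroup.freeze").write_text("0")
        set_frozen(slot, True)
        print((slot / "cgroup.freeze").read_text())  # prints 1
```

A frozen cgroup keeps its processes in RAM. That is why this kind of suspend does not survive the client exiting or a reboot.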
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1825
Credit: 123,765,773
RAC: 86,143
Message 43152 - Posted: 31 Jul 2020, 13:39:44 UTC - in response to Message 43151.  

Running Theory (or ATLAS) native still requires a lot of expert knowledge, additional settings and more babysitting.
In return they deliver more efficient tasks and a lower total RAM requirement, especially if many tasks run concurrently on a computer with many cores.

Volunteers who don't want to spend that additional work should run the vbox apps.
Aurum
Joined: 12 Jun 18
Posts: 108
Credit: 37,970,761
RAC: 0
Message 43153 - Posted: 31 Jul 2020, 18:08:26 UTC - in response to Message 43152.  

Running Theory (or ATLAS) native still requires a lot of expert knowledge, additional settings and more babysitting.
In return they deliver more efficient tasks and a lower total RAM requirement, especially if many tasks run concurrently on a computer with many cores.

Volunteers who don't want to spend that additional work should run the vbox apps.
Not an answer to either of my questions.
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1046
Credit: 6,603,873
RAC: 275
Message 43161 - Posted: 1 Aug 2020, 15:04:52 UTC - in response to Message 43150.  

Does Theory still not have checkpointing for Linux???

With the so-called "native" Theory project is it still necessary to go through this rigmarole to even be able to suspend while still running BOINC client???

Two yeses.

I'm not a Linux expert, but the only way I have found to preserve a running native task is to create your own Linux VM on your machine (Linux or Windows), install BOINC in it and take all the steps needed to run native tasks.
With BOINC and the tasks running, you can take snapshots of that VM and restore the last snapshot when needed.
Another option, when you have to shut down the host:
Suspend the tasks, keeping them in memory. Keep BOINC running and close the VM, saving its state to disk.
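Both VM-based workarounds can be written down as command sequences. This sketch assumes the Linux VM runs in VirtualBox, though any hypervisor with live snapshots and saved states would work similarly. It also assumes that BOINC's "Leave non-GPU tasks in memory while suspended" preference is enabled inside the VM. The VM name, snapshot name and task names are hypothetical placeholders. The project URL must match the one the client is attached with. The code only builds argument lists for `boinccmd` (run inside the VM) and `VBoxManage` (run on the host); it does not execute them.

```python
import shlex
from typing import List, Tuple

# (where to run the command: "guest" = inside the Linux VM, "host" = outside it, argv)
Step = Tuple[str, List[str]]


def snapshot_step(vm: str, snapshot: str) -> Step:
    """Option 1: take a live snapshot while BOINC and its tasks keep running."""
    return ("host", ["VBoxManage", "snapshot", vm, "take", snapshot, "--live"])


def restore_steps(vm: str, snapshot: str) -> List[Step]:
    """Option 1: roll back to a snapshot (the VM must not be running), then boot it."""
    return [
        ("host", ["VBoxManage", "snapshot", vm, "restore", snapshot]),
        ("host", ["VBoxManage", "startvm", vm, "--type", "headless"]),
    ]


def shutdown_steps(vm: str, project_url: str, tasks: List[str]) -> List[Step]:
    """Option 2: suspend tasks in memory, keep BOINC running, save the VM state."""
    steps: List[Step] = [
        ("guest", ["boinccmd", "--task", project_url, task, "suspend"]) for task in tasks
    ]
    steps.append(("host", ["VBoxManage", "controlvm", vm, "savestate"]))
    return steps


def resume_steps(vm: str, project_url: str, tasks: List[str]) -> List[Step]:
    """Option 2, after the host is back up: start the saved VM, then resume the tasks."""
    steps: List[Step] = [("host", ["VBoxManage", "startvm", vm, "--type", "headless"])]
    steps += [("guest", ["boinccmd", "--task", project_url, task, "resume"]) for task in tasks]
    return steps


if __name__ == "__main__":
    # Hypothetical VM and task names; the URL is assumed to be the attached project URL.
    url = "https://lhcathome.cern.ch/lhcathome/"
    for where, argv in shutdown_steps("boinc-native", url, ["Theory_task_1"]):
        print(f"[{where}] {shlex.join(argv)}")
```

A snapshot only preserves progress up to the moment it was taken. The saved-state route instead keeps the suspended tasks exactly where they were in memory.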
Jim1348
Joined: 15 Nov 14
Posts: 576
Credit: 18,018,746
RAC: 22,820
Message 43162 - Posted: 1 Aug 2020, 17:21:40 UTC - in response to Message 43151.  
Last modified: 1 Aug 2020, 17:23:08 UTC

With the so-called "native" Theory project is it still necessary to go through this rigmarole to even be able to suspend while still running BOINC client???

I just tried it on Ubuntu 18.04.4 and BOINC 7.16.6. After allowing a native Theory to run for 24 minutes, I suspended it for one minute and then resumed it.
It resumed without any problem. Longer suspensions might still cause problems, though; I have not checked that.

©2021 CERN