Message boards : Theory Application : New native version v300.08

maeax

Joined: 2 May 07
Posts: 2125
Credit: 159,977,413
RAC: 38,115
Message 49519 - Posted: 11 Feb 2024, 12:41:26 UTC

Theory is back with Tasks.
Thank you. Sunday!
tazzduke

Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 49811 - Posted: 22 Mar 2024, 12:31:28 UTC - in response to Message 49136.  

Tried on both Linux Mint 20.3 (Ubuntu 20.04) and 21.2 (Ubuntu 22.04). I guess I can't run Theory any more until sudo 1.9.10 gets added to the Ubuntu repository.
Found Sudo-Version 1.9.9.
This sudo version is lower than 1.9.10.
It does not support regular expressions.
Hence, sudoers will not be modified.
Error running /tmp/prepare_theory_native_environment


I know I am most likely late to the party on this one, but I only just started again with Theory Native, and ran into the same problem as Aurum.

I went and visited the SUDO website at https://www.sudo.ws/, and grabbed the latest package for my distribution.

I installed the latest version and then re-ran the command from Laurence's OP.
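
The version check that failed in the quoted output can be reproduced by hand before re-running the preparation script. This is only a minimal sketch: it assumes a `sort` that supports `-V` (GNU coreutils does), and `version_at_least` and `sudo_version_from` are hypothetical helper names, not part of the LHC@home scripts, whose actual check may differ.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Assumes a sort(1) that supports -V (version sort), e.g. GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Hypothetical helper: extract the version from the first line of
# "sudo -V" output, e.g. "Sudo version 1.9.9" -> "1.9.9".
sudo_version_from() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $3 }'
}

# Sample input taken from the quoted message; no real sudo call is made here.
found=$(sudo_version_from "Sudo version 1.9.9")
if version_at_least "$found" 1.9.10; then
    echo "sudo $found is new enough"
else
    echo "sudo $found is lower than 1.9.10"
fi
```

On a real host the sample string would be replaced with the live output, for example `sudo_version_from "$(sudo -V)"`.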

Here is one of my validated work units:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=408095114

Cheers
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2435
Credit: 228,266,423
RAC: 122,503
Message 49812 - Posted: 22 Mar 2024, 13:24:19 UTC - in response to Message 49811.  

Looks good.


Since a major goal of that version is to make suspend/resume work via systemd, you may want to test this.

Select a currently running task in BOINC manager (or your preferred BOINC tool) and pause the task.
Test this with a task that has already started the container (see stderr.txt).
Then this should happen:

1. You should find a corresponding line in the task's stderr.txt.

2. Run the "systemctl status ..." command shown in stderr.txt (press 'q' to exit the pager).
The output should mention the scope as "frozen".


A while later, resume the task via the BOINC management tool.
Then check stderr.txt and the scope status again.


Hint:
Although it would be possible to manually freeze/thaw the scope via systemctl this should not be done because BOINC will not be notified.
Hence, always use BOINC for this.
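
The check in step 2 can also be scripted. What follows is a rough sketch, assuming the stderr.txt line format written by cranky and that the scope exposes systemd's `FreezerState` property; `scope_from_stderr` and `freezer_state` are hypothetical helper names. The commands only read state and never freeze or thaw anything, in line with the hint above.

```shell
#!/bin/sh
# Hypothetical helper: print the scope name from the last
# "systemctl status <unit>.scope" hint found in a stderr.txt file.
scope_from_stderr() {
    sed -n 's/.*systemctl status \([^ ]*\.scope\).*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: read-only query of the freezer state of a unit.
# Prints "unknown" if systemctl cannot be run or reports an error.
freezer_state() {
    systemctl show --property=FreezerState --value "$1" 2>/dev/null || echo unknown
}

# Only does something when called with a readable stderr.txt path.
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    unit=$(scope_from_stderr "$1")
    echo "unit: $unit"
    echo "state: $(freezer_state "$unit")"
fi
```

Call it with the path to the task's stderr.txt as the only argument, once while the task is paused in BOINC and once after resuming it.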
tazzduke

Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 49813 - Posted: 22 Mar 2024, 13:57:56 UTC - in response to Message 49812.  

Ok, will try and find some time on the weekend and give it a go.

Cheers
tazzduke

Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 49819 - Posted: 23 Mar 2024, 1:05:11 UTC - in response to Message 49813.  

Good morning,

As suggested, I went and selected a running Theory task and paused it in BOINC.

I looked in the matching stderr.txt for the task to get the command.

Ran the command - systemctl status Theory_2743-2733248-9_1.scope

This was the output - Unit Theory_2743-2733248-9_1.scope could not be found.

I resumed the task; it reset itself back to 0.00 percent and then finished with a computation error.

Here is the task - https://lhcathome.cern.ch/lhcathome/result.php?resultid=408107778

Hopefully someone else has had success. That would mean my setup is partially correct, but suspend/resume is not set up correctly.

Cheers
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2435
Credit: 228,266,423
RAC: 122,503
Message 49820 - Posted: 23 Mar 2024, 9:07:53 UTC - in response to Message 49819.  

The log entries should look a bit like these:
06:14:37 CET +01:00 2024-03-23: cranky: [INFO] Starting runc container.
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] To get some details on systemd level run
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] systemctl status Theory_2743-2785673-11_0.scope
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] mcplots runspec: boinc pp jets 7000 80,-,1060 - herwig++ 2.7.1 UE-EE-5 100000 11
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
06:21:28 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:22:27 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
06:32:49 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:32:58 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
06:33:24 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:34:03 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
07:42:57 CET +01:00 2024-03-23: cranky: [INFO] Container Theory_2743-2785673-11_0 finished with status code 0.
07:42:57 CET +01:00 2024-03-23: cranky: [INFO] Preparing output.
07:42:58 (102851): cranky exited; CPU time 5042.031816
07:42:58 (102851): called boinc_finish(0)
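
To see at a glance whether every pause was followed by a resume, the two message types can simply be counted. A small sketch that relies only on the message texts visible in the log above; `pause_resume_summary` is a hypothetical name and the sample lines are copied from that log.

```shell
#!/bin/sh
# Hypothetical helper: count cranky's pause and resume messages in a log file.
# "|| true" keeps the function usable under "set -e" when a count is zero.
pause_resume_summary() {
    paused=$(grep -c 'Pausing systemd unit' "$1" || true)
    resumed=$(grep -c 'Resuming systemd unit' "$1" || true)
    echo "paused=$paused resumed=$resumed"
}

# Demo on a scratch file with three lines from the log above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
06:21:28 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:22:27 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
06:32:49 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
EOF
pause_resume_summary "$sample"   # prints: paused=2 resumed=1
rm -f "$sample"
```

A task that is still paused shows one more pause than resume; equal counts mean the task was running again when the log was last written.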



Yours look weird:
06:16:59 AWST +08:00 2024-03-23: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 260 - pythia6 6.428 ambt1 100000 9
06:16:59 AWST +08:00 2024-03-23: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
07:39:34 (590135): wrapper (7.15.26016): starting
07:39:34 (590135): wrapper (7.15.26016): starting
.
.
.
time="2024-03-23T07:39:38+08:00" level=error msg="container with id exists: Theory_2743-2733248-9_1"

It looks like the task started from scratch (for an unknown reason).
It finally failed because runc didn't remove the container id from the 1st attempt.


Which systemctl version do you use (must be at least v246)?
Please post the output of "systemctl --version" plus the status output of a currently running Theory task.
You get the latter via a command like this:
systemctl --no-pager status Theory_2743-2733248-9_1.scope
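
The v246 minimum can be checked without reading the output by eye. A minimal sketch, assuming the first line of "systemctl --version" has the usual "systemd <number> (...)" shape; `systemd_major_from` is a hypothetical helper and the sample string is only an illustration of that shape.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from the first line of
# "systemctl --version" output, e.g. "systemd 249 (249.11-0ubuntu3.12)" -> 249.
systemd_major_from() {
    printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Sample string; on a real host use "$(systemctl --version)" instead.
major=$(systemd_major_from "systemd 249 (249.11-0ubuntu3.12)")
if [ "$major" -ge 246 ]; then
    echo "systemd $major meets the v246 minimum"
else
    echo "systemd $major is older than v246"
fi
```

If a distribution prints a non-numeric second field, the integer comparison will fail and the value has to be inspected manually.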
tazzduke

Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 49821 - Posted: 23 Mar 2024, 10:53:46 UTC - in response to Message 49820.  

Evening,

Thank you for the extra steps to look at and also an output of what it is supposed to look like.

I have to step out for the evening, but I can post this bit of info on my system, using the following version of systemd.

systemd 249 (249.11-0ubuntu3.12)

Regards
tazzduke

Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 49822 - Posted: 23 Mar 2024, 23:52:48 UTC - in response to Message 49821.  

It's all working now. I'm not too sure why, but maybe a reboot did something, lol.

Here is a work unit from LHC-Dev (I had it running a few of the Theory tasks over there):

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310417

I am getting the pause and resume now, with no errors.

Also, this is the stderr.txt output from one of my Theory tasks from here (not dev), where I tried it as well; you can see it is working there too.

05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Starting runc container.
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] To get some details on systemd level run
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] systemctl status Theory_2743-2722097-13_1.scope
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 160 - pythia8 8.308 tune-A2 100000 13
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
07:40:32 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Pausing systemd unit Theory_2743-2722097-13_1.scope
07:43:01 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Resuming systemd unit Theory_2743-2722097-13_1.scope

Regards
M0CZY
New member

Joined: 27 Apr 24
Posts: 3
Credit: 23,261
RAC: 1,103
Message 50129 - Posted: 6 May 2024, 11:09:47 UTC

I'm struggling to get native Theory working on my Ubuntu 22.04 machine.
I've manually upgraded sudo to version 1.9.14p2.
I've run the sudoer's file, and rebooted.
Here is my latest computation error. I don't know how to proceed from here.
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
11:58:44 (3508): wrapper (7.15.26016): starting
11:58:44 (3508): wrapper (7.15.26016): starting
11:58:44 (3508): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.1.4 ()
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Detected Theory App
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] This application must have permanent access to
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] online repositories via a local CVMFS service.
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] It supports suspend/resume if a couple of
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] requirements are fulfilled.
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Most important:
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - init process is systemd
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - cgroups v2 is enabled and 'freezer' is available
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - the user running this application is a member of the 'boinc' group
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - sudo is at least version 1.9.10
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - sudoer file provided by LHC@home is installed
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Checking local requirements.
11:58:44 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.14p2.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/alice.cern.ch... OK
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/cernvm-prod.cern.ch... OK
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/grid.cern.ch... OK
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/sft.cern.ch... OK
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] 2.11.3.0 http://s1ral-cvmfs.openhtc.io/cvmfs/alice.cern.ch DIRECT
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Found 'runc version spec: 1.0.2-dev' at '/cvmfs/grid.cern.ch/vc/containers/runc.new'.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Creating container filesystem.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm4
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Starting runc container.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] To get some details on systemd level run
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] systemctl status Theory_2743-2839118-133_2.scope
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] mcplots runspec: boinc pp z1j 13000 110 - pythia6 6.428 380 100000 133
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
time="2024-05-06T11:58:48+01:00" level=error msg="operation not permitted"
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Container Theory_2743-2839118-133_2 finished with status code 1.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Preparing output.
11:58:48 BST +01:00 2024-05-06: cranky-0.1.4: [ERROR] No output found.
11:58:49 (3508): cranky exited; CPU time 0.314435
11:58:49 (3508): app exit status: 0xce
11:58:49 (3508): called boinc_finish(195)

</stderr_txt>
]]>
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2435
Credit: 228,266,423
RAC: 122,503
Message 50131 - Posted: 6 May 2024, 11:40:59 UTC - in response to Message 50129.  

This error is from runc:
time="2024-05-06T11:58:48+01:00" level=error msg="operation not permitted"

The log states it is a version from grid.cern.ch:
Found 'runc version spec: 1.0.2-dev' at '/cvmfs/grid.cern.ch/vc/containers/runc.new'


Please try to get/install a more recent runc from your Linux vendor.

In addition please check if this option is set in your boinc-client.service file:
ProtectSystem=strict
If so, change "strict" to "full", preferably via an overlay file (see the systemd manual or many posts here).
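
In systemd terms such an override is usually a "drop-in" file placed in a `<unit>.d` directory. The sketch below shows the shape of that file under the assumption that the unit is called boinc-client.service, as in the post; `write_protectsystem_dropin` is a hypothetical helper, and the demo writes into a scratch directory so nothing on the host is changed.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that overrides ProtectSystem for a unit.
# $1 = systemd configuration root (normally /etc/systemd/system)
# $2 = unit name
write_protectsystem_dropin() {
    dir="$1/$2.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nProtectSystem=full\n' > "$dir/override.conf"
}

# Dry run into a scratch directory; prints the generated file.
scratch=$(mktemp -d)
write_protectsystem_dropin "$scratch" boinc-client.service
cat "$scratch/boinc-client.service.d/override.conf"
rm -rf "$scratch"
```

On a real system the same file would go under /etc/systemd/system (as root), followed by `systemctl daemon-reload` and a restart of the BOINC client. `systemctl edit boinc-client.service` creates an equivalent file interactively, and `systemctl show -p ProtectSystem boinc-client.service` shows the value that is actually in effect.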
M0CZY
New member

Joined: 27 Apr 24
Posts: 3
Credit: 23,261
RAC: 1,103
Message 50132 - Posted: 6 May 2024, 12:51:33 UTC - in response to Message 50131.  

I've installed runc version 1.1.7-0ubuntu1~22.04.2, checked that the boinc-client.service file says "ProtectSystem=full", and rebooted my computer. Below is the stderr output from the latest error.
This is the line where the problem seems to lie:
time="2024-05-06T13:38:35+01:00" level=error msg="runc run failed: fchown fd 7: operation not permitted"
It seems to be a permissions thing that I'm unaware of?
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
13:38:33 (3775): wrapper (7.15.26016): starting
13:38:33 (3775): wrapper (7.15.26016): starting
13:38:33 (3775): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.1.4 ()
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Detected Theory App
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] This application must have permanent access to
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] online repositories via a local CVMFS service.
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] It supports suspend/resume if a couple of
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] requirements are fulfilled.
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Most important:
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - init process is systemd
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - cgroups v2 is enabled and 'freezer' is available
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - the user running this application is a member of the 'boinc' group
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - sudo is at least version 1.9.10
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] - sudoer file provided by LHC@home is installed
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Checking local requirements.
13:38:33 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.14p2.
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/alice.cern.ch... OK
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/cernvm-prod.cern.ch... OK
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/grid.cern.ch... OK
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Probing /cvmfs/sft.cern.ch... OK
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] 2.11.3.0 http://s1ral-cvmfs.openhtc.io/cvmfs/alice.cern.ch DIRECT
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Found a local runc version 1.1.7-0ubuntu1~22.04.2.
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Creating container filesystem.
13:38:34 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm4
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Starting runc container.
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] To get some details on systemd level run
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] systemctl status Theory_2743-2749343-140_0.scope
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 660 - herwig7 7.2.0 default 100000 140
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
time="2024-05-06T13:38:35+01:00" level=error msg="runc run failed: fchown fd 7: operation not permitted"
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Container Theory_2743-2749343-140_0 finished with status code 1.
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Preparing output.
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [ERROR] No output found.
13:38:35 (3775): cranky exited; CPU time 0.317553
13:38:35 (3775): app exit status: 0xce
13:38:35 (3775): called boinc_finish(195)

</stderr_txt>
]]>
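
In long stderr dumps like the one above, the runc error is easy to miss among the INFO lines. A small sketch for pulling it out, assuming the `level=error msg="..."` format seen in this thread and no trailing whitespace on the line; `runc_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the msg="..." text of every level=error line
# (the format of the runc messages quoted in this thread).
runc_errors() {
    sed -n 's/.*level=error msg="\(.*\)"$/\1/p' "$1"
}

# Demo on a scratch file with two lines copied from the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
13:38:35 BST +01:00 2024-05-06: cranky-0.1.4: [INFO] Starting runc container.
time="2024-05-06T13:38:35+01:00" level=error msg="runc run failed: fchown fd 7: operation not permitted"
EOF
runc_errors "$sample"   # prints: runc run failed: fchown fd 7: operation not permitted
rm -f "$sample"
```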
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2435
Credit: 228,266,423
RAC: 122,503
Message 50135 - Posted: 6 May 2024, 14:30:43 UTC - in response to Message 50132.  

You have another computer attached to the project that runs Arch Linux [6.8.9-arch1-1|libc 2.39].
That one successfully runs Theory native even with the runc version from grid.cern.ch.

So you can either compare the setup of both machines to find out what's different, or replace Ubuntu 22.04.4 with Arch Linux.
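
One way to start such a comparison is to collect the same handful of facts on both hosts and diff them. This is only a sketch: the selection of facts is a guess based on the requirements cranky prints at startup, `first_line_or_missing` and `collect_setup` are hypothetical names, and `stat -fc %T` assumes GNU stat.

```shell
#!/bin/sh
# Hypothetical helper: run the given command and print its first output line
# (possibly empty), or "missing" if the command is not installed.
first_line_or_missing() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>/dev/null | head -n 1
    else
        echo missing
    fi
}

# Hypothetical helper: print a few setup facts as key=value lines so the
# output from two hosts can be compared with diff(1).
collect_setup() {
    echo "kernel=$(uname -r)"
    # "cgroup2fs" indicates a pure cgroups v2 hierarchy.
    echo "cgroup_fs=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)"
    echo "systemd=$(first_line_or_missing systemctl --version)"
    echo "runc=$(first_line_or_missing runc --version)"
    echo "sudo=$(first_line_or_missing sudo -V)"
}

collect_setup
```

Redirect the output to a file on each machine and run `diff` on the two files; differences in the cgroup filesystem type or the systemd version would be the first things to look at.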


©2024 CERN