New native version v300.08
Author | Message |
---|---|
Joined: 2 May 07 Posts: 2108 Credit: 159,819,192 RAC: 107,232 |
Theory is back with Tasks. Thank you. Sunday! |
Joined: 24 Jun 10 Posts: 42 Credit: 5,348,959 RAC: 18,397 |
Tried on both Linux Mint 20.3 (Ubuntu 20.04) and 21.2 (Ubuntu 22.04). I guess I can't run Theory any more until sudo 1.9.10 gets added to the Ubuntu repository. I know I am most likely late to the party on this one, but I only just started again with Theory Native and ran into the same problem as Aurum. I visited the sudo website at https://www.sudo.ws/ and grabbed the latest package for my distribution. I installed it and then reran the command in Laurence's OP. Here is one of my validated work units: https://lhcathome.cern.ch/lhcathome/result.php?resultid=408095114 Cheers |
Joined: 15 Jun 08 Posts: 2418 Credit: 226,702,146 RAC: 130,768 |
Looks good. Since a major goal of that version is to make suspend/resume work via systemd, you may want to test this. Select a currently running task in BOINC Manager (or your preferred BOINC tool) and pause it. Test this with a task that has already started the container (see stderr.txt). Then this should happen: 1. You should find a corresponding line in the task's stderr.txt. 2. Run the "systemctl status ..." command shown in stderr.txt (press 'q' to exit the pager). The output should mention the scope as "frozen". A while later, resume the task via the BOINC management tool. Check stderr.txt and the scope status again. Hint: Although it would be possible to freeze/thaw the scope manually via systemctl, this should not be done because BOINC will not be notified. Hence, always use BOINC for this. |
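For anyone who wants to script the check in step 2, here is a minimal Python sketch. It assumes the scope exposes systemd's `FreezerState` unit property through `systemctl show` (as far as I know this arrived together with freeze/thaw support in v246) and that the scope is registered with the system manager. The function names are made up for illustration and are not part of cranky or BOINC. The sketch only reads the state and never freezes or thaws anything, in line with the hint above.

```python
import subprocess
from typing import Optional


def parse_freezer_state(show_output: str) -> Optional[str]:
    """Return the FreezerState value from `systemctl show` output, or None."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "FreezerState":
            return value.strip()
    return None


def query_freezer_state(unit: str) -> Optional[str]:
    """Ask systemd for the freezer state of a unit (read-only query).

    Assumption: the unit is visible to the system manager; a scope started
    under a user manager would need `--user` added to the command.
    """
    result = subprocess.run(
        ["systemctl", "show", "--property=FreezerState", unit],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_freezer_state(result.stdout)
```

Usage: call `query_freezer_state()` with the scope name printed in stderr.txt. While the task is paused in BOINC the result should be `frozen`; after resuming it should change back.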
Joined: 24 Jun 10 Posts: 42 Credit: 5,348,959 RAC: 18,397 |
Ok, will try and find some time on the weekend and give it a go. Cheers |
Joined: 24 Jun 10 Posts: 42 Credit: 5,348,959 RAC: 18,397 |
Good morning, As suggested, I selected a running Theory task and paused it in BOINC. I looked in the matching stderr.txt for the task to get the command. Ran the command: systemctl status Theory_2743-2733248-9_1.scope This was the output: Unit Theory_2743-2733248-9_1.scope could not be found. I resumed the task; it reset itself back to 0.00 percent and then finished as a computation error. Here is the task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=408107778 Hopefully someone else has had success, which would mean my setup is partially correct but suspend/resume is not set up correctly. Cheers |
Joined: 15 Jun 08 Posts: 2418 Credit: 226,702,146 RAC: 130,768 |
The log entries should look a bit like these:
06:14:37 CET +01:00 2024-03-23: cranky: [INFO] Starting runc container.
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] To get some details on systemd level run
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] systemctl status Theory_2743-2785673-11_0.scope
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] mcplots runspec: boinc pp jets 7000 80,-,1060 - herwig++ 2.7.1 UE-EE-5 100000 11
06:14:38 CET +01:00 2024-03-23: cranky: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
06:21:28 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:22:27 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
06:32:49 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:32:58 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
06:33:24 CET +01:00 2024-03-23: cranky: [INFO] Pausing systemd unit Theory_2743-2785673-11_0.scope
06:34:03 CET +01:00 2024-03-23: cranky: [INFO] Resuming systemd unit Theory_2743-2785673-11_0.scope
07:42:57 CET +01:00 2024-03-23: cranky: [INFO] Container Theory_2743-2785673-11_0 finished with status code 0.
07:42:57 CET +01:00 2024-03-23: cranky: [INFO] Preparing output.
07:42:58 (102851): cranky exited; CPU time 5042.031816
07:42:58 (102851): called boinc_finish(0)
Yours look weird:
06:16:59 AWST +08:00 2024-03-23: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 260 - pythia6 6.428 ambt1 100000 9
06:16:59 AWST +08:00 2024-03-23: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
07:39:34 (590135): wrapper (7.15.26016): starting
07:39:34 (590135): wrapper (7.15.26016): starting
. . .
time="2024-03-23T07:39:38+08:00" level=error msg="container with id exists: Theory_2743-2733248-9_1"
It looks like the task started from scratch (for an unknown reason). It finally failed because runc didn't remove the container ID from the first attempt.
Which systemctl version do you use (it must be at least v246)? Please post the output of "systemctl --version" plus the status output of a currently running Theory task. You get the latter via a command like this: systemctl --no-pager status Theory_2743-2733248-9_1.scope |
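The version requirement above can be checked mechanically. The following is a small Python sketch, assuming the first line of the "systemctl --version" output has the usual form `systemd <number> (<package version>)`. The helper names and the constant are hypothetical and chosen only for this example; the value 246 is the minimum stated in the post.

```python
import re
from typing import Optional

# Minimum systemd version named in the post above.
MIN_SYSTEMD_VERSION = 246


def systemd_major_version(version_output: str) -> Optional[int]:
    """Extract the leading version number from `systemctl --version` output."""
    match = re.match(r"\s*systemd\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def meets_minimum(version_output: str, minimum: int = MIN_SYSTEMD_VERSION) -> bool:
    """True if the reported version is at least `minimum`; False if unparsable."""
    major = systemd_major_version(version_output)
    return major is not None and major >= minimum
```

To use it, paste the text printed by "systemctl --version" into `meets_minimum()`. Only the first number is compared, so distribution suffixes in the parentheses are ignored.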
Joined: 24 Jun 10 Posts: 42 Credit: 5,348,959 RAC: 18,397 |
Evening, Thank you for the extra steps to look at and for the example of what the output is supposed to look like. I have to step out for the evening, but I can post this bit of info now: my system uses the following version of systemd. systemd 249 (249.11-0ubuntu3.12) Regards |
Joined: 24 Jun 10 Posts: 42 Credit: 5,348,959 RAC: 18,397 |
It's all working now. I'm not too sure why, but maybe a reboot did something lol. Here is a work unit from LHC-Dev (I had it running a few of the Theory tasks over there): https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310417 I am getting the pause and resume now, with no errors. Also, this is the output in the stderr.txt of one of my Theory tasks from here (not dev), where I tried it as well, and you can see it is working there too.
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Starting runc container.
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] To get some details on systemd level run
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] systemctl status Theory_2743-2722097-13_1.scope
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 160 - pythia8 8.308 tune-A2 100000 13
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
07:40:32 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Pausing systemd unit Theory_2743-2722097-13_1.scope
07:43:01 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Resuming systemd unit Theory_2743-2722097-13_1.scope
Regards |
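The pause/resume entries quoted in this thread can also be pulled out of a stderr.txt automatically. Below is a minimal Python sketch, assuming the log wording stays exactly as shown in the posts ("Pausing systemd unit ..." and "Resuming systemd unit ..."); the function names are hypothetical and not part of cranky. It scans the whole text rather than line by line, so it also works on log text that was pasted as one long line.

```python
import re
from typing import List, Tuple

# Matches the cranky log wording quoted in this thread.
EVENT_RE = re.compile(r"\[INFO\] (Pausing|Resuming) systemd unit (\S+\.scope)")


def scope_events(stderr_text: str) -> List[Tuple[str, str]]:
    """Return (action, unit) pairs in the order they appear in the log."""
    return [(m.group(1), m.group(2)) for m in EVENT_RE.finditer(stderr_text)]


def currently_paused(stderr_text: str) -> bool:
    """True if the most recent pause/resume entry is a 'Pausing' one."""
    events = scope_events(stderr_text)
    return bool(events) and events[-1][0] == "Pausing"
```

A log with no pause/resume entries at all yields an empty list, which is what a task paused before the container started (or on a setup where suspend/resume is not working) would be expected to show.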
©2024 CERN