Message boards : Theory Application : All tasks finish with: finished with status code 1.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2450
Credit: 232,578,721
RAC: 131,333
Message 41515 - Posted: 10 Feb 2020, 21:15:05 UTC

All tasks seem to be successful at first sight but all of them report:
cranky-0.0.31: [INFO] Container 'runc' finished with status code 1

ppbar mb-inelastic 1800 - - pythia8 8.150 default
ppbar ue 1800 10 - pythia8 8.210 default-DL
ppbar jets 1960 64 - pythia6 6.427 pnocr
ppbar mb-inelastic 1960 - - pythia6 6.426 351
pp jets 7000 400 - pythia8 8.240 fischerPP1
pp w1j 7000 150 - pythia8 8.201 default-noFsr
pp mb-inelastic 13000 - - pythia8 8.230 default-noFsr
pp jets 7000 800 - pythia6 6.424 dwt
pp mb-inelastic 200 - - pythia6 6.428 373
ppbar jets 1960 100,1960,10,1960 - herwig7 7.1.1 softTune
... (and many more)
computezrmle
Message 41517 - Posted: 11 Feb 2020, 7:00:36 UTC

Meanwhile all tasks finish with:
cranky-0.0.31: [INFO] Container 'runc' finished with status code 1
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 380
Credit: 238,712
RAC: 0
Message 41519 - Posted: 11 Feb 2020, 9:09:24 UTC - in response to Message 41517.  

computezrmle wrote:
Meanwhile all tasks finish with:
cranky-0.0.31: [INFO] Container 'runc' finished with status code 1

A failed execution of a Theory job does not fail the BOINC task. The output of the failed job is sent back to MCPlots and the job is not re-run. You can see the failed jobs in MCPlots.
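
Because BOINC reports the task as successful, the job's real outcome only shows up in the log line quoted above. Here is a minimal sketch for pulling that status code out of a saved task log, assuming the line format stays as shown in this thread; the helper name cranky_status is hypothetical and not part of cranky.

```shell
#!/bin/sh
# Print the exit status that cranky reports for the 'runc' container.
# Reads log text on stdin and prints one number per matching line.
# cranky_status is a hypothetical helper name, not part of cranky itself.
cranky_status() {
    sed -n "s/.*Container 'runc' finished with status code \([0-9][0-9]*\).*/\1/p"
}

# Example with the line quoted in this thread:
echo "cranky-0.0.31: [INFO] Container 'runc' finished with status code 1" | cranky_status
```

Anything other than 0 means the Theory job itself failed, even though the BOINC task is shown as valid.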
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1330
Credit: 8,761,505
RAC: 5,803
Message 41521 - Posted: 11 Feb 2020, 9:45:07 UTC
Last modified: 11 Feb 2020, 10:06:36 UTC

Fail ratio > 90%


Ben Segal
Volunteer moderator
Project administrator

Joined: 1 Sep 04
Posts: 139
Credit: 2,579
RAC: 0
Message 41525 - Posted: 11 Feb 2020, 10:56:56 UTC - in response to Message 41521.  

There was a bug in the Pythia fix introduced yesterday. It is now fixed and the queue will be refilled.

Ben and Anton
computezrmle
Message 41526 - Posted: 11 Feb 2020, 11:08:39 UTC - in response to Message 41525.  

Yesterday herwig7 was also affected.
Crystal Pellet
Message 41527 - Posted: 11 Feb 2020, 12:56:20 UTC - in response to Message 41526.  

computezrmle wrote:
Yesterday herwig7 was also affected.

I suppose this was a coincidence during the Pythia errors.
There are a few herwig7 jobs that have not succeeded so far. They belong to the failed tasks I mentioned in the other thread, all with energy 200, regardless of the generator used.
pp mb-inelastic 200 - - herwig7 7.2.0 default
pp mb-inelastic 200 - - herwig7 7.2.0 softTune
pp mb-inelastic 200 - - herwig7 7.0.0 default
pp mb-inelastic 200 - - herwig7 7.0.0 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.1 default
pp mb-inelastic 200 - - herwig7 7.0.1 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.2 default
pp mb-inelastic 200 - - herwig7 7.0.2 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.3 default
pp mb-inelastic 200 - - herwig7 7.0.3 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.4 default
pp mb-inelastic 200 - - herwig7 7.0.4 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.1.0 default
pp mb-inelastic 200 - - herwig7 7.1.0 softTune
pp mb-inelastic 200 - - herwig7 7.1.1 default
pp mb-inelastic 200 - - herwig7 7.1.1 softTune
pp mb-inelastic 200 - - herwig7 7.1.3 default
pp mb-inelastic 200 - - herwig7 7.1.3 softTune
pp mb-inelastic 200 - - herwig7 7.1.4 default
pp mb-inelastic 200 - - herwig7 7.1.4 softTune
pp mb-inelastic 200 - - herwig7 7.1.5 default
pp mb-inelastic 200 - - herwig7 7.1.5 softTune
pp mb-inelastic 200 - - herwig7 7.1.6 default
pp mb-inelastic 200 - - herwig7 7.1.6 softTune
Harri Liljeroos
Joined: 28 Sep 04
Posts: 684
Credit: 44,404,965
RAC: 16,259
Message 41533 - Posted: 11 Feb 2020, 15:33:44 UTC

The tasks I still have (Pythia and Herwig) seem to run OK, but the BOINC ready-to-send queue is empty for Theory.
maeax

Joined: 2 May 07
Posts: 2158
Credit: 162,606,373
RAC: 123,635
Message 47194 - Posted: 27 Aug 2022, 10:12:06 UTC

11:59:22 CEST +02:00 2022-08-27: cranky-0.0.32: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293: applying cgroup configuration for process caused \"mountpoint for cgroup not found\""
11:59:22 CEST +02:00 2022-08-27: cranky-0.0.32: [INFO] Container 'runc' finished with status code 1.
maeax

Message 47237 - Posted: 7 Sep 2022, 13:06:25 UTC

@Laurence wrote:
Suspend/Resume
The Suspend/Resume does not work out of the box. It needs a cgroup to be created for each slot and this requires a cgroup with permissions for the user boinc. This can be provided by adding a PreStart script for boinc-client systemd. Download two files with wget:

sudo wget http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup -O /sbin/create-boinc-cgroup
sudo wget http://lhcathome.cern.ch/lhcathome/download/boinc-client.service -O /etc/systemd/system/boinc-client.service

Then run the following commands to pick up the changes:

sudo systemctl daemon-reload
sudo systemctl restart boinc-client

This will only suspend the application in memory. To suspend the application to disk so that it will survive the client exiting requires the container checkpointing feature. However, this is not currently available for Linux containers.

Please post to the message boards if there are any issues.

boinc-client.service is running, but the Theory task stopped with code 1:

Loaded: loaded (/usr/lib/systemd/system/boinc-client.service; disabled; ve>
Active: active (running) since Wed 2022-09-07 14:50:19 CEST; 2min 0s ago
Docs: man:boinc(1)
Main PID: 321273 (boinc)
Tasks: 2 (limit: 60228)
Memory: 5.0M
CPU: 371ms
CGroup: /system.slice/boinc-client.service
└─321273 /usr/bin/boinc

Sep 07 14:50:20 RYZENMPSQ boinc[321273]: 07-Sep-2022 14:50:20 [---] Checking pr>
Sep 07 14:50:20 RYZENMPSQ boinc[321273]: 07-Sep-2022 14:50:20 [---] Using proxy

These are the contents of the unit file /usr/lib/systemd/system/boinc-client.service:

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
Wants=vboxdrv.service
After=vboxdrv.service network-online.target

[Service]
Type=simple
ProtectHome=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true #Block X11 idle detection

[Install]
WantedBy=multi-user.target

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10813499
computezrmle
Message 47238 - Posted: 7 Sep 2022, 15:04:08 UTC

Theory native requires cgroups v1.
Most recent Linux distributions have switched to cgroups v2, so Theory native will fail on them.

Although it's not recommended, the following kernel parameter can be used to switch back to cgroups v1, but it may break services that require cgroups v2:
systemd.unified_cgroup_hierarchy=0
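
To see which hierarchy a host is currently using, one common check is the filesystem type mounted at /sys/fs/cgroup. The following is only a sketch, assuming the usual layout where a pure cgroups v2 host reports cgroup2fs and a v1 (or hybrid) host reports tmpfs, and assuming GNU stat; the function name cgroup_mode is illustrative.

```shell
#!/bin/sh
# Map the filesystem type found at /sys/fs/cgroup to a cgroup mode.
# Assumption: "cgroup2fs" means unified v2, "tmpfs" means v1 (legacy or hybrid).
# cgroup_mode is a hypothetical helper name.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -f inspects the filesystem; %T prints its type name (GNU coreutils).
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "none")
echo "cgroup mode: $(cgroup_mode "$fstype")"
```

If this reports v2, the "mountpoint for cgroup not found" error shown earlier in the thread is the expected result for Theory native.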
maeax

Message 47239 - Posted: 8 Sep 2022, 5:47:59 UTC - in response to Message 47238.  

This cgroup problem is a question for Laurence.
I have no interest in setting this parameter.
Thank you for the info.
maeax

Message 47240 - Posted: 9 Sep 2022, 8:40:21 UTC - in response to Message 47239.  

Status of boinc-client.service, now running an ATLAS task:
boinc-client.service - Berkeley Open Infrastructure Network Computing Client
Loaded: loaded (/usr/lib/systemd/system/boinc-client.service; disabled; ve>
Active: active (running) since Wed 2022-09-07 18:54:26 CEST; 1 day 14h ago
Docs: man:boinc(1)
Main PID: 2758 (boinc)
Tasks: 43 (limit: 60228)
Memory: 2.9G
CPU: 9h 15min 53.707s
CGroup: /system.slice/boinc-client.service
├─ 2758 /usr/bin/boinc
├─210347 ../../projects/lhcathome.cern.ch_lhcathome/wrapper_26015_>
├─210349 /bin/bash run_atlas --nthreads 1
├─210350 /bin/bash run_atlas --nthreads 1
├─210353 awk "{ print strftime(\"[%Y-%m-%d %H:%M:%S]\"), \$0; fflu>
├─210851 "Apptainer runtime parent"
├─210864 /usr/bin/sh start_atlas.sh
├─210904 /usr/bin/time -o /var/lib/boinc/slots/0/MmhMDmDYhq1nsSi4a>
├─210905 /bin/bash ./runpilot2-wrapper.sh -q BOINC_MCORE -j manage>
├─215405 /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/pytho>
├─218469 /bin/bash -c "export PANDA_RESOURCE='BOINC_MCORE';export >
├─218472 /bin/bash -c "cd /var/lib/boinc/slots/0/PanDA_Pilot-55857>
├─222293 prmon --pid 218469 --filename memory_monitor_output.txt ->
├─223356 python /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasOf>
├─223595 /bin/sh ./runwrapper.EVNTtoHITS.sh
computezrmle
Message 47874 - Posted: 20 Mar 2023, 7:05:24 UTC - in response to Message 47810.  

Quote from Message 47810:
how do I apply the "systemd.unified_cgroup_hierarchy=0" workaround to my Ubuntu 22.04.2?

The Arch Linux wiki has a fairly complete page explaining how to pass kernel parameters:
https://wiki.archlinux.org/title/Kernel_parameters

As already mentioned:
Changing the default from cgroups v2 back to v1 may break other applications.
Hence, do it at your own risk.
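
For Ubuntu specifically, kernel parameters are usually set through the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by update-grub and a reboot. The sketch below assumes a stock GRUB setup and GNU sed, and only edits a scratch copy; the helper add_kernel_param is hypothetical.

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already there. add_kernel_param is a hypothetical helper.
# It does nothing if the file has no GRUB_CMDLINE_LINUX_DEFAULT line.
add_kernel_param() {
    file=$1
    param=$2
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0    # already set, nothing to do
    fi
    # Insert the parameter just before the closing double quote (GNU sed -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Try it on a scratch copy first; nothing here touches the real system.
tmp=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$tmp"
add_kernel_param "$tmp" "systemd.unified_cgroup_hierarchy=0"
cat "$tmp"

# On the real system (assumed: Ubuntu with GRUB), after backing up the file:
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   ... apply the same edit to /etc/default/grub ...
#   sudo update-grub
#   sudo reboot
#   cat /proc/cmdline    # the parameter should now be listed
```

Removing the parameter from the same line and running update-grub again returns the host to its default cgroups v2 setup.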