All tasks finish with: finished with status code 1.
Author | Message |
---|---|
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,451,734 RAC: 66,740 |
All tasks seem to be successful at first sight, but all of them report:
cranky-0.0.31: [INFO] Container 'runc' finished with status code 1
ppbar mb-inelastic 1800 - - pythia8 8.150 default
ppbar ue 1800 10 - pythia8 8.210 default-DL
ppbar jets 1960 64 - pythia6 6.427 pnocr
ppbar mb-inelastic 1960 - - pythia6 6.426 351
pp jets 7000 400 - pythia8 8.240 fischerPP1
pp w1j 7000 150 - pythia8 8.201 default-noFsr
pp mb-inelastic 13000 - - pythia8 8.230 default-noFsr
pp jets 7000 800 - pythia6 6.424 dwt
pp mb-inelastic 200 - - pythia6 6.428 373
ppbar jets 1960 100,1960,10,1960 - herwig7 7.1.1 softTune
... (and many more) |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,451,734 RAC: 66,740 |
Meanwhile all tasks finish with: cranky-0.0.31: [INFO] Container 'runc' finished with status code 1 |
Send message Joined: 20 Jun 14 Posts: 380 Credit: 238,712 RAC: 0 |
"Meanwhile all tasks finish with:"
A failed execution of a Theory job does not fail the BOINC task. The output of the failed job is sent back to MCPlots and the job is not re-run. You can see the failed jobs in MCPlots. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,539,339 RAC: 5,065 |
Fail ratio > 90% |
Send message Joined: 1 Sep 04 Posts: 140 Credit: 2,579 RAC: 0 |
There was a bug in the Pythia fix introduced yesterday - now fixed and the queue will be refilled. Ben and Anton |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,451,734 RAC: 66,740 |
Yesterday herwig7 was also affected. |
Send message Joined: 14 Jan 10 Posts: 1429 Credit: 9,539,339 RAC: 5,065 |
"Yesterday herwig7 was also affected."
I suppose this was a coincidence with the Pythia errors. A few herwig7 jobs have had no success so far; they belong to the failed tasks I mentioned in the other thread, which have energy 200 regardless of the generator used.
pp mb-inelastic 200 - - herwig7 7.2.0 default
pp mb-inelastic 200 - - herwig7 7.2.0 softTune
pp mb-inelastic 200 - - herwig7 7.0.0 default
pp mb-inelastic 200 - - herwig7 7.0.0 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.1 default
pp mb-inelastic 200 - - herwig7 7.0.1 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.2 default
pp mb-inelastic 200 - - herwig7 7.0.2 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.3 default
pp mb-inelastic 200 - - herwig7 7.0.3 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.0.4 default
pp mb-inelastic 200 - - herwig7 7.0.4 UE-MMHT
pp mb-inelastic 200 - - herwig7 7.1.0 default
pp mb-inelastic 200 - - herwig7 7.1.0 softTune
pp mb-inelastic 200 - - herwig7 7.1.1 default
pp mb-inelastic 200 - - herwig7 7.1.1 softTune
pp mb-inelastic 200 - - herwig7 7.1.3 default
pp mb-inelastic 200 - - herwig7 7.1.3 softTune
pp mb-inelastic 200 - - herwig7 7.1.4 default
pp mb-inelastic 200 - - herwig7 7.1.4 softTune
pp mb-inelastic 200 - - herwig7 7.1.5 default
pp mb-inelastic 200 - - herwig7 7.1.5 softTune
pp mb-inelastic 200 - - herwig7 7.1.6 default
pp mb-inelastic 200 - - herwig7 7.1.6 softTune |
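Job lists like the one above can be tallied with a few lines of Python. This is only a sketch: the field names (`beam`, `process`, `energy`, `params`, `extra`, `generator`, `version`, `tune`) are assumptions inferred from the column positions in this thread, not an official MCPlots schema, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Job(NamedTuple):
    # Field names are guesses based on the column order seen in this thread.
    beam: str
    process: str
    energy: str
    params: str
    extra: str
    generator: str
    version: str
    tune: str


def parse_jobs(text: str) -> List[Job]:
    """Parse one job descriptor per non-empty line (8 whitespace-separated fields)."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        jobs.append(Job(*fields))
    return jobs


def count_by_generator_and_energy(jobs: List[Job]) -> Counter:
    """Count jobs per (generator, energy) pair."""
    pairs: List[Tuple[str, str]] = [(j.generator, j.energy) for j in jobs]
    return Counter(pairs)
```

Feeding the herwig7 list above into `parse_jobs` and then `count_by_generator_and_energy` would show whether the failures cluster at one energy or one generator.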
Send message Joined: 28 Sep 04 Posts: 736 Credit: 49,884,924 RAC: 35,291 |
Those tasks that I still have (Pythia & Herwig) seem to run OK, but the BOINC ready-to-send queue is empty for Theory. |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
11:59:22 CEST +02:00 2022-08-27: cranky-0.0.32: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293: applying cgroup configuration for process caused \"mountpoint for cgroup not found\""
11:59:22 CEST +02:00 2022-08-27: cranky-0.0.32: [INFO] Container 'runc' finished with status code 1. |
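When checking many task logs, the exit status and the cgroup error can be picked out automatically. The sketch below assumes the log wording stays exactly as quoted in this thread; the function names are hypothetical and not part of cranky or BOINC.

```python
import re
from typing import Optional

# Matches lines such as:
# cranky-0.0.32: [INFO] Container 'runc' finished with status code 1.
STATUS_RE = re.compile(
    r"Container '(?P<name>[^']+)' finished with status code (?P<code>\d+)"
)


def container_status(log_text: str) -> Optional[int]:
    """Return the last container exit status found in the log, or None."""
    matches = STATUS_RE.findall(log_text)  # list of (name, code) tuples
    if not matches:
        return None
    return int(matches[-1][1])


def cgroup_mount_missing(log_text: str) -> bool:
    """True if the log contains the runc cgroup mountpoint error."""
    return "mountpoint for cgroup not found" in log_text
```

A non-zero status together with `cgroup_mount_missing(...)` returning `True` points at the container setup rather than at the event generator.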
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
@Laurence wrote:
Suspend/Resume
The Suspend/Resume does not work out of the box. It needs a cgroup to be created for each slot and this requires a cgroup with permissions for the user boinc. This can be provided by adding a PreStart script for boinc-client systemd.
Download two files with wget:
sudo wget http://lhcathome.cern.ch/lhcathome/download/create-boinc-cgroup -O /sbin/create-boinc-cgroup
sudo wget http://lhcathome.cern.ch/lhcathome/download/boinc-client.service -O /etc/systemd/system/boinc-client.service
Then run the following commands to pick up the changes:
sudo systemctl daemon-reload
sudo systemctl restart boinc-client
This will only suspend the application in memory. To suspend the application to disk so that it will survive the client exiting requires the container checkpointing feature. However, this is not currently available for Linux containers. Please post to the message boards if there are any issues.
boinc-client.service is running, but the Theory task stopped with code 1:
Loaded: loaded (/usr/lib/systemd/system/boinc-client.service; disabled; ve>
Active: active (running) since Wed 2022-09-07 14:50:19 CEST; 2min 0s ago
Docs: man:boinc(1)
Main PID: 321273 (boinc)
Tasks: 2 (limit: 60228)
Memory: 5.0M
CPU: 371ms
CGroup: /system.slice/boinc-client.service
└─321273 /usr/bin/boinc
Sep 07 14:50:20 RYZENMPSQ boinc[321273]: 07-Sep-2022 14:50:20 [---] Checking pr>
Sep 07 14:50:20 RYZENMPSQ boinc[321273]: 07-Sep-2022 14:50:20 [---] Using proxy
These are the contents of /usr/lib/systemd/system/boinc-client.service:
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
Wants=vboxdrv.service
After=vboxdrv.service network-online.target
[Service]
Type=simple
ProtectHome=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true #Block X11 idle detection
[Install]
WantedBy=multi-user.target
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10813499 |
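One detail worth checking in the status output above: the `Loaded:` line names /usr/lib/systemd/system/boinc-client.service, while the quoted instructions install the unit to /etc/systemd/system/. systemd gives a unit file in /etc/systemd/system precedence over a same-named file in /usr/lib/systemd/system, so this may indicate that the downloaded unit was not in effect at that time. The sketch below extracts the loaded path from `systemctl status` text; the function names are hypothetical, and it assumes the `Loaded: loaded (<path>; ...)` layout shown in this thread.

```python
import re
from typing import Optional

# Captures the unit file path from a line such as:
# Loaded: loaded (/usr/lib/systemd/system/boinc-client.service; disabled; ve>
LOADED_RE = re.compile(r"Loaded:\s+loaded\s+\((?P<path>[^;)]+)")


def loaded_unit_path(status_text: str) -> Optional[str]:
    """Return the unit file path reported by `systemctl status`, or None."""
    match = LOADED_RE.search(status_text)
    return match.group("path").strip() if match else None


def uses_local_override(status_text: str) -> bool:
    """True if the active unit file lives under /etc/systemd/system/."""
    path = loaded_unit_path(status_text)
    return path is not None and path.startswith("/etc/systemd/system/")
```

The status text can be obtained with `systemctl status boinc-client` and passed in as a string.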
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,451,734 RAC: 66,740 |
Theory native requires cgroups v1. Since most recent Linux distributions have switched to cgroups v2, Theory native fails on them. Although it's not recommended, the following kernel parameter can be used to switch back to cgroups v1, but it may break services that require cgroups v2:
systemd.unified_cgroup_hierarchy=0 |
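To see which cgroup layout a host actually uses, one common check is whether the file `cgroup.controllers` exists at the root of /sys/fs/cgroup, which is the case on a unified (v2) hierarchy. The following is a minimal sketch under that assumption; the function name and the returned labels are made up for illustration.

```python
from pathlib import Path


def cgroup_layout(root: str = "/sys/fs/cgroup") -> str:
    """Guess the cgroup layout below `root`.

    Returns "v2" for a unified hierarchy, "hybrid" when a v2 tree is
    mounted at <root>/unified next to v1 controllers, "v1" when only
    per-controller directories are present, and "unknown" otherwise.
    """
    base = Path(root)
    if (base / "cgroup.controllers").is_file():
        return "v2"
    if (base / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    if base.is_dir() and any(entry.is_dir() for entry in base.iterdir()):
        return "v1"
    return "unknown"
```

Calling `cgroup_layout()` on the affected host and getting `"v2"` would be consistent with the "mountpoint for cgroup not found" error reported earlier in this thread.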
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
This cgroup problem is a question for Laurence; I have no interest in setting this parameter. Thank you for the info. |
Send message Joined: 2 May 07 Posts: 2245 Credit: 174,025,522 RAC: 9,726 |
boinc.service now for Atlas-Task:
boinc-client.service - Berkeley Open Infrastructure Network Computing Client
Loaded: loaded (/usr/lib/systemd/system/boinc-client.service; disabled; ve>
Active: active (running) since Wed 2022-09-07 18:54:26 CEST; 1 day 14h ago
Docs: man:boinc(1)
Main PID: 2758 (boinc)
Tasks: 43 (limit: 60228)
Memory: 2.9G
CPU: 9h 15min 53.707s
CGroup: /system.slice/boinc-client.service
├─ 2758 /usr/bin/boinc
├─210347 ../../projects/lhcathome.cern.ch_lhcathome/wrapper_26015_>
├─210349 /bin/bash run_atlas --nthreads 1
├─210350 /bin/bash run_atlas --nthreads 1
├─210353 awk "{ print strftime(\"[%Y-%m-%d %H:%M:%S]\"), \$0; fflu>
├─210851 "Apptainer runtime parent"
├─210864 /usr/bin/sh start_atlas.sh
├─210904 /usr/bin/time -o /var/lib/boinc/slots/0/MmhMDmDYhq1nsSi4a>
├─210905 /bin/bash ./runpilot2-wrapper.sh -q BOINC_MCORE -j manage>
├─215405 /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/pytho>
├─218469 /bin/bash -c "export PANDA_RESOURCE='BOINC_MCORE';export >
├─218472 /bin/bash -c "cd /var/lib/boinc/slots/0/PanDA_Pilot-55857>
├─222293 prmon --pid 218469 --filename memory_monitor_output.txt ->
├─223356 python /cvmfs/atlas.cern.ch/repo/sw/software/21.0/AtlasOf>
├─223595 /bin/sh ./runwrapper.EVNTtoHITS.sh |
Send message Joined: 15 Jun 08 Posts: 2549 Credit: 255,451,734 RAC: 66,740 |
"How do I apply the systemd.unified_cgroup_hierarchy=0 workaround to my Ubuntu 22.04.2?"
The Arch Linux wiki has a rather complete page that explains how to pass kernel parameters: https://wiki.archlinux.org/title/Kernel_parameters
As already mentioned, changing the default from cgroups v2 back to v1 may break other applications. Hence, do it at your own risk. |
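On an Ubuntu system that boots via GRUB (an assumption; other boot loaders are configured differently), kernel parameters are usually added to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, followed by `sudo update-grub` and a reboot. After the reboot, /proc/cmdline lists the parameters the running kernel was started with. The sketch below checks that file for a given parameter; the function names are hypothetical, and the simple whitespace split does not handle quoted values that contain spaces.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` in a kernel command line.

    Returns None if the parameter is absent and "" if it is present
    without a value. If it appears several times, the last one wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

For example, `kernel_param(read_cmdline(), "systemd.unified_cgroup_hierarchy")` returning `"0"` would confirm that the workaround is active on the running system.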
©2025 CERN