New tasks all failing?
Author | Message |
---|---|
Joined: 24 May 23 Posts: 52 Credit: 4,469,843 RAC: 0 |
They don't start. What's going on? -- Bye, Lem |
Joined: 24 May 23 Posts: 52 Credit: 4,469,843 RAC: 0 |
Started after almost an hour. -- Bye, Lem |
Joined: 4 Mar 17 Posts: 32 Credit: 12,193,862 RAC: 9,414 |
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6174 The powheg-box tasks (AFAIK right now the Theory_2773 ones) have 2 steps, and the first step does not print to the event log. At least the tasks run for under a day, so flying blind is not that problematic. |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,887,455 RAC: 54,539 |
+1 Looks like the 1st step generates the events and writes them to something like ..../cernvm/shared/tmp/tmp.PdgYzMJF37/run-main/pwgevents.lhe |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
"Looks like the 1st step generates the events ..." It not only looks like it, it tells you so during the first step: POWHEG: generating events |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,948,171 RAC: 82,479 |
Are the Herwig7 back? |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
"Are the Herwig7 back?" No, these are Theory simulation tasks from batch 2773 (revision) with the powheg-box generator, version r3744, and tune pthard2. |
Joined: 24 May 23 Posts: 52 Credit: 4,469,843 RAC: 0 |
"It not only looks like, it tells you during the first step: POWHEG: generating events" Sure, but yesterday there was nothing for an hour. 0% CPU: they looked like ATLAS tasks. Today they start immediately. -- Bye, Lem |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
"It not only looks like, it tells you during the first step: POWHEG: generating events" Returned one: 10 hours of generation of the 100,000 events and 1.5 hours to process those events. https://lhcathome.cern.ch/lhcathome/result.php?resultid=418770726 |
Joined: 17 Aug 17 Posts: 124 Credit: 10,856,563 RAC: 11,312 |
I decided to try Theory again on my Linux box and the tasks are all failing. Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=418956018 Any idea what might be causing it? |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,887,455 RAC: 54,539 |
Same as described here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6280&postid=51436 Hint: Next time you post a link, be nice and wrap it in URL tags. |
Joined: 17 Aug 17 Posts: 124 Credit: 10,856,563 RAC: 11,312 |
Thanks, I made the changes and it seems to be working now. I will keep an eye on the tasks. |
Joined: 17 Aug 17 Posts: 124 Credit: 10,856,563 RAC: 11,312 |
All was fine, but now my Theory and ATLAS tasks are failing again. This project requires more maintenance than a small child lol. Theory: https://lhcathome.cern.ch/lhcathome/result.php?resultid=419241776 ATLAS: https://lhcathome.cern.ch/lhcathome/result.php?resultid=419209996 |
Joined: 17 Aug 17 Posts: 124 Credit: 10,856,563 RAC: 11,312 |
I have changed the proxy to direct in the config file. It probes fine but still gives config errors?
:~$ sudo cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK
~$ sudo cvmfs_config chksetup
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas-condb.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/cernvm-prod.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/sft.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/alice.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io |
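Every `chksetup` warning in the post above names the same server host, while the probe of each repository reports OK. A minimal Python sketch for grouping such warnings by host and proxy is shown here. The function name `summarize_chksetup` and the regular expression are illustrative assumptions based only on the warning lines quoted in the post, not part of any CVMFS tooling. A count concentrated on one host may suggest that a single server is unreachable rather than the whole setup being broken, but that reading is an assumption, not a diagnosis.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed pattern, derived only from the warning lines quoted in the post.
ACCESS_RE = re.compile(r"Warning: failed to access (\S+) through proxy (\S+)")


def summarize_chksetup(output: str) -> Counter:
    """Count 'failed to access' warnings per (server host, proxy) pair."""
    failures = Counter()
    for match in ACCESS_RE.finditer(output):
        url, proxy = match.groups()
        host = urlsplit(url).hostname
        failures[(host, proxy)] += 1
    return failures


# Two of the warning lines from the post, used as sample input.
sample = (
    "Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/"
    "atlas.cern.ch/.cvmfspublished through proxy DIRECT\n"
    "Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io\n"
    "Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/"
    "grid.cern.ch/.cvmfspublished through proxy DIRECT\n"
)

for (host, proxy), count in summarize_chksetup(sample).items():
    print(f"{host} via {proxy}: {count} failed access(es)")
```

In practice the text would come from the saved output of `sudo cvmfs_config chksetup`; the Geo-API warnings are deliberately ignored by this sketch.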
Joined: 4 Mar 20 Posts: 14 Credit: 6,508,131 RAC: 5,638 |
Hi, I had a similar situation after a failed Theory task. CVMFS wasn't available any more. Had to reboot to get it working again. rgds, Anne.
<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
10:01:11 (3259305): wrapper (7.15.26016): starting
10:01:11 (3259305): wrapper (7.15.26016): starting
10:01:11 (3259305): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.1.4 ()
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Detected Theory App
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] This application must have permanent access to
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] online repositories via a local CVMFS service.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] It supports suspend/resume if a couple of
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] requirements are fulfilled.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Most important:
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - init process is systemd
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - cgroups v2 is enabled and 'freezer' is available
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - the user running this application is a member of the 'boinc' group
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - sudo is at least version 1.9.10
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - sudoer file provided by LHC@home is installed
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Checking local requirements.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.15p5.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found a local runc version 1.1.12-0ubuntu3.1.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Early shutdown initiated due to previous errors.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Cleanup will take a few minutes...
10:14:00 (3259305): cranky exited; CPU time 0.176978
10:14:00 (3259305): app exit status: 0xce
10:14:00 (3259305): called boinc_finish(195) |
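The stderr log above contains one `[ERROR]` line and several `[INFO]` lines with no message; a later post in this thread reads those blanks as missing CVMFS probe output. The Python sketch below picks both out of a saved stderr text. The function name `scan_cranky_log` and the line pattern are assumptions made for illustration, based only on the log lines shown in this thread, and are not part of cranky or BOINC.

```python
import re

# Assumed pattern for the "[LEVEL] message" tail of a cranky log line.
LEVEL_RE = re.compile(r"\[(INFO|ERROR)\][ \t]*(.*)$")


def scan_cranky_log(lines):
    """Return (error_messages, number_of_empty_info_lines)."""
    errors = []
    empty_info = 0
    for line in lines:
        match = LEVEL_RE.search(line)
        if match is None:
            continue  # wrapper lines carry no [LEVEL] tag
        level, message = match.groups()
        message = message.strip()
        if level == "ERROR":
            errors.append(message)
        elif not message:
            empty_info += 1
    return errors, empty_info


# A few lines copied from the log in the post above.
sample_lines = [
    "10:01:11 (3259305): wrapper (7.15.26016): starting",
    "10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.15p5.",
    "10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]",
    "10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]",
    "10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task.",
]

errors, empty_info = scan_cranky_log(sample_lines)
print("errors:", errors)
print("empty [INFO] lines:", empty_info)
```

A non-zero count of empty `[INFO]` lines next to a "Major requirements are missing" error matches the pattern reported in this thread, but other causes are possible.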
Joined: 17 Aug 17 Posts: 124 Credit: 10,856,563 RAC: 11,312 |
Cheers, changing the proxy to DIRECT only seems to have fixed it for now. Not sure why it worked before though. Maybe an update replaced the file? I'll keep an eye on it and reboot the box if it acts up again :) |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
@Anne Havinga: Is it possible to make your computer visible in the preferences? |
Joined: 4 Mar 20 Posts: 14 Credit: 6,508,131 RAC: 5,638 |
I just did. The one with the failed Theory task is: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10850852. And there was a strange failure with an ATLAS task on https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10819968. On this one I have another problem: despite allowing BOINC to use only 1 CPU, the ATLAS task takes both configured CPUs on this VM. |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
This aborted ATLAS task shows these lines, which point to a timestamp problem:
[2025-01-31 06:06:37] start_atlas.sh: line 350: date: command not found
[2025-01-31 06:06:37] + ACCOUNTING_ENDTIME=
[2025-01-31 06:06:37] ++ date -d '1970-01-01 UTC 1738273395 seconds' +%Y%m%d%H%M%SZ
The Theory tasks had a problem with your network at 2 Feb 2025, 9:32:22 UTC and errored out in the starting phase. After that time, the Theory tasks finished correctly. |
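The `date -d '1970-01-01 UTC 1738273395 seconds' +%Y%m%d%H%M%SZ` call quoted above turns a Unix epoch value into a compact timestamp string. A minimal Python sketch of the same conversion follows. The function name `accounting_timestamp` is hypothetical, and the sketch pins the result to UTC, whereas the shell command formats in whatever time zone the task environment uses, so the two agree only when that zone is UTC.

```python
from datetime import datetime, timezone


def accounting_timestamp(epoch_seconds: int) -> str:
    """Format seconds since 1970-01-01 UTC as YYYYmmddHHMMSSZ, in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%SZ")


# Epoch value taken from the log line quoted in the post.
print(accounting_timestamp(1738273395))  # 20250130214315Z
```

The epoch value in the log therefore corresponds to 30 Jan 2025, 21:43:15 UTC.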
Joined: 4 Mar 20 Posts: 14 Credit: 6,508,131 RAC: 5,638 |
@maeax I did notice that myself, of course. The point with the machine running the Theory task is that the CVMFS connection to the CERN servers was gone. The stderr.txt was identical to the stderr.txt from the Theory task Ryan Munro mentioned in message 51438. You can see no output from probing the CVMFS mountpoints, which is why I mentioned I had some similar failures. At 2 Feb 2025, 9:32:22 UTC I suspended the task that was still waiting and aborted the running task, as it would fail soon. Then I tried to get CVMFS running again, but without success. As this is a VM dedicated to running BOINC, the easiest way to resolve it was to reboot the VM. Thanks anyway. |
©2025 CERN