Message boards : Theory Application : New tasks all failing?

Lem Novantotto

Joined: 24 May 23
Posts: 48
Credit: 4,120,767
RAC: 155
Message 51386 - Posted: 12 Jan 2025, 23:37:26 UTC

They don't start. What's going on?
--
Bye, Lem
ID: 51386
Lem Novantotto

Joined: 24 May 23
Posts: 48
Credit: 4,120,767
RAC: 155
Message 51387 - Posted: 13 Jan 2025, 0:08:04 UTC - in response to Message 51386.  

Started after almost an hour.
--
Bye, Lem
ID: 51387
Toggleton

Joined: 4 Mar 17
Posts: 26
Credit: 11,230,406
RAC: 11,024
Message 51388 - Posted: 13 Jan 2025, 9:07:34 UTC

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6174
powheg-box tasks (AFAIK Theory_2773 right now) have 2 steps, and the first step does not print to the event log. At least the tasks take under a day, so flying blind is not that problematic.
ID: 51388
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51389 - Posted: 13 Jan 2025, 9:42:01 UTC - in response to Message 51388.  

+1

Looks like the 1st step generates the events and writes them to something like
..../cernvm/shared/tmp/tmp.PdgYzMJF37/run-main/pwgevents.lhe
ID: 51389
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1446
Credit: 9,710,908
RAC: 377
Message 51390 - Posted: 13 Jan 2025, 10:07:36 UTC - in response to Message 51389.  

Looks like the 1st step generates the events ...
It doesn't only look like it, it tells you so during the first step: POWHEG: generating events
ID: 51390
Erich56

Joined: 18 Dec 15
Posts: 1863
Credit: 131,945,348
RAC: 111,877
Message 51391 - Posted: 13 Jan 2025, 10:59:21 UTC

Are the Herwig7 tasks back?
ID: 51391
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1446
Credit: 9,710,908
RAC: 377
Message 51393 - Posted: 13 Jan 2025, 15:36:42 UTC - in response to Message 51391.  
Last modified: 13 Jan 2025, 15:48:48 UTC

Are the Herwig7 tasks back?
No, they are Theory simulation tasks from batch 2773 (revision) with the powheg-box generator, version r3744, and tune pthard2.
ID: 51393
Lem Novantotto

Joined: 24 May 23
Posts: 48
Credit: 4,120,767
RAC: 155
Message 51394 - Posted: 13 Jan 2025, 15:44:56 UTC - in response to Message 51390.  

Looks like the 1st step generates the events ...
It doesn't only look like it, it tells you so during the first step: POWHEG: generating events


Sure, but yesterday there was nothing for an hour. 0% CPU: they looked like ATLAS tasks. Today they start immediately.
--
Bye, Lem
ID: 51394
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1446
Credit: 9,710,908
RAC: 377
Message 51396 - Posted: 13 Jan 2025, 20:38:24 UTC - in response to Message 51390.  

Looks like the 1st step generates the events ...
It doesn't only look like it, it tells you so during the first step: POWHEG: generating events
One has returned: 10 hours to generate the 100,000 events and 1.5 hours to process them. https://lhcathome.cern.ch/lhcathome/result.php?resultid=418770726
ID: 51396
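As a rough worked example of the timing reported above, the figures from that post (100,000 events, 10 hours of generation, 1.5 hours of processing) can be turned into a generation rate and a share of total runtime. This is only a sketch; the function and variable names are illustrative and not part of any LHC@home tool.

```python
# Figures quoted in the post above; names are illustrative only.
EVENTS = 100_000
GENERATION_HOURS = 10.0
PROCESSING_HOURS = 1.5


def runtime_summary(events, generation_hours, processing_hours):
    """Return (events per second during generation, generation share of runtime)."""
    total_hours = generation_hours + processing_hours
    rate = events / (generation_hours * 3600)
    share = generation_hours / total_hours
    return rate, share


rate, share = runtime_summary(EVENTS, GENERATION_HOURS, PROCESSING_HOURS)
print(f"generation rate: {rate:.2f} events/s")
print(f"first step: {share:.0%} of total runtime")
```

With these numbers the first step runs at roughly 2.8 events per second and takes about 87% of the total runtime.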
Ryan Munro

Joined: 17 Aug 17
Posts: 89
Credit: 9,595,713
RAC: 10,084
Message 51435 - Posted: 21 Jan 2025, 10:08:37 UTC

I decided to try Theory again on my Linux box and the tasks are all failing. Example:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=418956018

Any idea what might be causing it?
ID: 51435
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2629
Credit: 268,527,686
RAC: 133,033
Message 51437 - Posted: 21 Jan 2025, 13:35:22 UTC - in response to Message 51435.  

Same as described here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6280&postid=51436

Hint:
Next time you post a link, be nice and wrap it in URL tags.
ID: 51437
Ryan Munro

Joined: 17 Aug 17
Posts: 89
Credit: 9,595,713
RAC: 10,084
Message 51438 - Posted: 21 Jan 2025, 16:48:46 UTC - in response to Message 51437.  

Thanks, I made the changes and it seems to be working now. I will keep an eye on the tasks.
ID: 51438
Ryan Munro

Joined: 17 Aug 17
Posts: 89
Credit: 9,595,713
RAC: 10,084
Message 51493 - Posted: 3 Feb 2025, 9:46:55 UTC

All was fine, but now my Theory and ATLAS tasks are failing again. This project requires more maintenance than a small child lol

https://lhcathome.cern.ch/lhcathome/result.php?resultid=419241776 Theory

https://lhcathome.cern.ch/lhcathome/result.php?resultid=419209996 Atlas
ID: 51493
Ryan Munro

Joined: 17 Aug 17
Posts: 89
Credit: 9,595,713
RAC: 10,084
Message 51494 - Posted: 3 Feb 2025, 10:04:44 UTC

I have changed the proxy to DIRECT in the config file. It probes fine, but chksetup still gives config warnings?

$ sudo cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK
$ sudo cvmfs_config chksetup
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas-condb.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/cernvm-prod.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/sft.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/alice.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
ID: 51494
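To see at a glance which servers the chksetup warnings above refer to, the "failed to access" lines can be grouped by host and proxy. The following is a minimal Python sketch, assuming the warning text has exactly the shape shown in that post; the regular expression and the function name are illustrative and not part of cvmfs_config.

```python
import re
from collections import defaultdict

# Matches the "failed to access" warnings in the shape shown in the post above.
ACCESS_WARNING = re.compile(
    r"Warning: failed to access https?://(?P<host>[^/:]+)(?::\d+)?"
    r"/cvmfs/(?P<repo>[^/]+)/\.cvmfspublished through proxy (?P<proxy>\S+)"
)


def group_access_warnings(output):
    """Map (host, proxy) -> list of repositories that could not be reached."""
    failures = defaultdict(list)
    for line in output.splitlines():
        match = ACCESS_WARNING.search(line)
        if match:
            failures[(match["host"], match["proxy"])].append(match["repo"])
    return dict(failures)


# Three lines copied from the output above, used as sample input.
sample = """\
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch/.cvmfspublished through proxy DIRECT
Warning: failed to use Geo-API with s1ihep-cvmfs.openhtc.io
Warning: failed to access http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch/.cvmfspublished through proxy DIRECT
"""
for (host, proxy), repos in group_access_warnings(sample).items():
    print(f"{host} via {proxy}: {', '.join(repos)}")
```

In the output above every warning names the same host. That would be consistent with one server being unreachable rather than every repository being broken, but this reading is an assumption, not a diagnosis.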
Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51495 - Posted: 3 Feb 2025, 13:18:47 UTC - in response to Message 51494.  
Last modified: 3 Feb 2025, 13:45:16 UTC

Hi, I had a similar situation after a failed Theory task. CVMFS wasn't available any more. Had to reboot to get it working again.
rgds, Anne.

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
10:01:11 (3259305): wrapper (7.15.26016): starting
10:01:11 (3259305): wrapper (7.15.26016): starting
10:01:11 (3259305): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.1.4 ()
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Detected Theory App
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] This application must have permanent access to
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] online repositories via a local CVMFS service.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] It supports suspend/resume if a couple of
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] requirements are fulfilled.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Most important:
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - init process is systemd
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - cgroups v2 is enabled and 'freezer' is available
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - the user running this application is a member of the 'boinc' group
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - sudo is at least version 1.9.10
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] - sudoer file provided by LHC@home is installed
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Checking local requirements.
10:01:11 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found Sudo-Version 1.9.15p5.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO]
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found a local runc version 1.1.12-0ubuntu3.1.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Early shutdown initiated due to previous errors.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Cleanup will take a few minutes...
10:14:00 (3259305): cranky exited; CPU time 0.176978
10:14:00 (3259305): app exit status: 0xce
10:14:00 (3259305): called boinc_finish(195)
ID: 51495
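The three numbers in "process exited with code 195 (0xc3, -61)" are the same value written as decimal, as hex, and as a signed 8-bit integer. A small Python sketch of that arithmetic, together with a helper that pulls the first [ERROR] line out of a log like the one above, might look as follows. Both function names are made up for illustration, and the code assumes an exit code in the range 0 to 255.

```python
def describe_exit_code(code):
    """Format an exit code (0..255) as decimal, hex, and signed 8-bit value."""
    signed = code - 256 if code > 127 else code
    return f"{code} (0x{code:x}, {signed})"


def first_error_line(log_text):
    """Return the first line containing the '[ERROR]' marker, or None."""
    for line in log_text.splitlines():
        if "[ERROR]" in line:
            return line.strip()
    return None


# Three lines copied from the stderr output above, used as sample input.
log = """\
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Found a local runc version 1.1.12-0ubuntu3.1.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [ERROR] Major requirements are missing. Can't run this task.
10:01:12 CET +01:00 2025-02-02: cranky-0.1.4: [INFO] Early shutdown initiated due to previous errors.
"""
print(describe_exit_code(195))
print(first_error_line(log))
```

For 195 the first function reproduces the string from the log, since 195 - 256 = -61.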
Ryan Munro

Joined: 17 Aug 17
Posts: 89
Credit: 9,595,713
RAC: 10,084
Message 51496 - Posted: 3 Feb 2025, 17:26:11 UTC - in response to Message 51495.  

Cheers, changing the proxy to DIRECT only seems to have fixed it for now. Not sure why it worked before, though. Maybe an update replaced the file?
I'll keep an eye on it and reboot the box if it acts up again :)
ID: 51496
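The thread does not name the config file or the exact setting. Assuming it is the standard CVMFS client parameter `CVMFS_HTTP_PROXY` in a shell-style file such as `/etc/cvmfs/default.local` (an assumption, not confirmed above), a small Python sketch for checking what such a file currently sets could look like this. The function name and the example content are hypothetical.

```python
def read_cvmfs_setting(config_text, key):
    """Return the last value assigned to `key` in shell-style KEY=VALUE text.

    Naive parser: everything after '#' is treated as a comment, so it is
    not suitable for values that themselves contain '#'.
    """
    value = None
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip().strip("'\"")
    return value


# Hypothetical file content; on a real host, read the file from disk instead.
example = '''
# CVMFS_HTTP_PROXY="http://proxy.example:3128"
CVMFS_HTTP_PROXY="DIRECT"
'''
print(read_cvmfs_setting(example, "CVMFS_HTTP_PROXY"))
```

If an update did replace the file, as suspected above, running a check like this before and after the update would show whether the value changed.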
maeax

Joined: 2 May 07
Posts: 2262
Credit: 175,581,097
RAC: 121
Message 51497 - Posted: 4 Feb 2025, 8:07:00 UTC

@Anne Havinga
Is it possible to make your Computer visible in Prefs?
ID: 51497
Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51498 - Posted: 4 Feb 2025, 14:54:58 UTC - in response to Message 51497.  
Last modified: 4 Feb 2025, 14:57:45 UTC

I just did.
The one with the failed Theory task is: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10850852
And there was a strange failure with an ATLAS task on https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10819968
On this one I have another problem: despite allowing BOINC to use only 1 CPU, the ATLAS task takes both configured CPUs on this VM.
ID: 51498
maeax

Joined: 2 May 07
Posts: 2262
Credit: 175,581,097
RAC: 121
Message 51499 - Posted: 4 Feb 2025, 18:59:19 UTC - in response to Message 51498.  

This aborted ATLAS task shows these lines, with a timestamp problem.
[2025-01-31 06:06:37] start_atlas.sh: line 350: date: command not found
[2025-01-31 06:06:37] + ACCOUNTING_ENDTIME=
[2025-01-31 06:06:37] ++ date -d '1970-01-01 UTC 1738273395 seconds' +%Y%m%d%H%M%SZ

Theory tasks had a problem with your network at 2 Feb 2025, 9:32:22 UTC and errored out in the starting phase.

After that time, Theory tasks finished correctly.
ID: 51499
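The trace quoted above shows the script calling `date -d '1970-01-01 UTC 1738273395 seconds' +%Y%m%d%H%M%SZ`, i.e. turning an epoch value into a compact timestamp. A sketch of the same conversion in Python is given below, assuming the result is wanted in UTC (the shell command would use the host's local time zone unless told otherwise). The function name is illustrative.

```python
from datetime import datetime, timezone


def format_accounting_time(epoch_seconds):
    """Format seconds since 1970-01-01 UTC as YYYYmmddHHMMSSZ, in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%SZ")


# Epoch value taken from the quoted log line.
print(format_accounting_time(1738273395))
```

For the value in the log this gives 20250130214315Z, that is 30 Jan 2025, 21:43:15 UTC.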
Anne Havinga

Joined: 4 Mar 20
Posts: 13
Credit: 5,760,632
RAC: 6,163
Message 51500 - Posted: 4 Feb 2025, 20:01:30 UTC - in response to Message 51499.  

@maeax
I did notice that myself, of course. The point with the machine running the Theory task is that the CVMFS connection to the CERN servers was gone. The stderr.txt was identical to the stderr.txt from the Theory task Ryan Munro mentioned in message 51438. You can see there is no output from probing the CVMFS mountpoints, which is why I mentioned I had some similar failures.
At 2 Feb 2025, 9:32:22 UTC I suspended the task that was still waiting and aborted the running task, as it would fail soon. Then I tried to get CVMFS running again, but without success. As this is a VM dedicated to running BOINC, the easiest way to resolve it was to reboot the VM.
Thanks anyway.
ID: 51500