Message boards : Theory Application : This gonna be long
Garrulus glandarius

Joined: 5 Apr 25
Posts: 82
Credit: 2,610,900
RAC: 9,474
Message 53360 - Posted: 2 Apr 2026, 9:16:59 UTC - in response to Message 53359.  

In reply to computezrmle's message of 2 Apr 2026:
As for your powheg-box

Those tasks don't update the log in the middle of processing but they usually succeed.
I suggest letting them run, even if they need a couple of days.


As for your pythia8

It reports "nevts=15000".
This indicates that the corresponding runspec didn't get enough results with "nevts=100000" in the last run(s).
Hence, nevts has been reduced (maybe more than once).
This can point to either
- complex calculations that need much more time than expected (=> further reduction of nevts)
- a bug that causes all tasks to hang (=> needs a code revision at CERN)

Feel free to either let it run for up to 10 days or cancel it.


Thanks a lot for the explanations! I'll let them both run, see what happens. Will remember to be patient with powheg-box tasks in the future.
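
For anyone who wants to check nevts without reading the log by eye, here is a minimal Python sketch. It assumes the bracketed runspec in the runRivet.log header has eleven space-separated fields with nevts second to last and the run number last, which matches the header lines posted in this thread. The field labels and the function names are my own guesses for illustration, not official mcplots names; the starting value of 100000 is taken from the quoted reply.

```python
import re

# Hypothetical labels for the runspec fields. Only the positions of
# "generator", "version", "nevts" and the trailing run number are
# inferred from the log headers posted in this thread.
FIELDS = ("mode", "beam", "process", "energy", "params", "specific",
          "generator", "version", "tune", "nevts", "seed")


def parse_runspec(header_line):
    """Return the bracketed runspec of a runRivet.log header as a dict."""
    match = re.search(r"\[(boinc [^\]]+)\]", header_line)
    if match is None:
        raise ValueError("no runspec found in line")
    tokens = match.group(1).split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    spec = dict(zip(FIELDS, tokens))
    spec["nevts"] = int(spec["nevts"])
    return spec


def was_reduced(spec, starting_nevts=100000):
    """True if nevts is below the starting value mentioned in the thread."""
    return spec["nevts"] < starting_nevts


header = ("===> [runRivet] Tue Apr 14 04:51:24 PM UTC 2026 "
          "[boinc pp z1j 8000 - - sherpa 2.2.9 default 2000 839]")
spec = parse_runspec(header)
print(spec["generator"], spec["version"], spec["nevts"], was_reduced(spec))
```

A reduced nevts only says that earlier attempts did not deliver enough results; it does not tell you which of the two causes above applies.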
ID: 53360
Garrulus glandarius

Joined: 5 Apr 25
Posts: 82
Credit: 2,610,900
RAC: 9,474
Message 53363 - Posted: 2 Apr 2026, 14:24:53 UTC - in response to Message 53360.  

The powheg-box task resumed writing to the log and is now at around 60k events, advancing quite fast.
ID: 53363
Glohr

Joined: 13 Jan 24
Posts: 48
Credit: 9,492,867
RAC: 18,964
Message 53417 - Posted: 15 Apr 2026, 9:00:37 UTC

Here's a sherpa job with characteristics that I can't remember seeing before, https://lhcathome.cern.ch/lhcathome/result.php?resultid=434823962
After about 16 hours running, runRivet.log is nearly 2 GB in size.
Task Properties:

Application: Theory Simulation 301.00 (vbox64_theory)
Name: Theory_2922-4899196-839
State: Running
Received: 4/14/2026 9:21:38 AM
Report deadline: 4/24/2026 9:21:37 AM
Estimated computation size: 3,600 GFLOPs
CPU time: 15:13:04
CPU time since checkpoint: 00:00:59
Elapsed time: 15:53:25
Estimated time remaining: 00:00:14
Fraction done: 99.974%
Virtual memory size: 50.67 MB
Working set size: 667.57 MB
Directory: slots/12
Process ID: 58344
Progress rate: 6.120% per hour
Executable: vboxwrapper_26210_windows_x86_64.exe
Application Name: Theory
Plan Class: vbox64_theory
Log excerpts:
===> [runRivet] Tue Apr 14 04:51:24 PM UTC 2026 [boinc pp z1j 8000 - - sherpa 2.2.9 default 2000 839]
[...]
Process_Group::Differential(): Cross section is 'nan'.
Phase_Space_Integrator::AddPoint(): value = nan. Skip.
TChannelWeight: bad momenta!!!! -1 - 1 (-1)
1: (45.6739,4.35103e-16,9.88856e-18,45.6739)
2: (45.6739,0,0,-45.6739)
3: (91.3479,8.52651e-14,-1.89759e-12,-3.61067e-09)
4: (3.61059e-09,-8.483e-14,1.8976e-12,3.61059e-09)
TChannelWeight: bad momenta!!!! -1 - 1 (-1)
1: (45.6739,4.35103e-16,9.88856e-18,45.6739)
2: (45.6739,0,0,-45.6739)
3: (91.3479,8.52651e-14,-1.89759e-12,-3.61067e-09)
4: (3.61059e-09,-8.483e-14,1.8976e-12,3.61059e-09)
TChannelWeight: bad momenta!!!! -1 - 1 (1)
1: (45.6739,4.35103e-16,9.88856e-18,45.6739)
2: (45.6739,0,0,-45.6739)
3: (3.61059e-09,-8.483e-14,1.8976e-12,3.61059e-09)
4: (91.3479,8.52651e-14,-1.89759e-12,-3.61067e-09)
TChannelWeight: bad momenta!!!! -1 - 1 (1)
1: (45.6739,4.35103e-16,9.88856e-18,45.6739)
2: (45.6739,0,0,-45.6739)
3: (3.61059e-09,-8.483e-14,1.8976e-12,3.61059e-09)
4: (91.3479,8.52651e-14,-1.89759e-12,-3.61067e-09)
[...]

There are more message blocks of that sort than I would care to count.
There doesn't seem to be much chance of a successful outcome, so I'll (figuratively) put it out of its misery. Since the number of events is only 2000, it seems that this isn't the first time this has failed.
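
For what it's worth, counting those blocks does not require opening a 2 GB file in an editor. Below is a minimal Python sketch that reads the log line by line and tallies the marker strings copied from the excerpt above. The function names and the idea of passing in a log path are my own for illustration; nothing here is part of the Theory application.

```python
from collections import Counter

# Marker substrings copied from the log excerpt in this post.
MARKERS = {
    "bad_momenta": "TChannelWeight: bad momenta",
    "nan_cross_section": "Cross section is 'nan'",
    "nan_point_skipped": "value = nan. Skip.",
}


def count_markers(lines):
    """Tally how many lines contain each marker substring."""
    counts = Counter()
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


def scan_log(path):
    """Stream a (possibly huge) log file without loading it into memory."""
    with open(path, "r", errors="replace") as handle:
        return count_markers(handle)


# Small in-memory example built from the excerpt above.
sample = [
    "Process_Group::Differential(): Cross section is 'nan'.",
    "Phase_Space_Integrator::AddPoint(): value = nan. Skip.",
    "TChannelWeight: bad momenta!!!! -1 - 1 (-1)",
    "1: (45.6739,4.35103e-16,9.88856e-18,45.6739)",
    "TChannelWeight: bad momenta!!!! -1 - 1 (1)",
]
counts = count_markers(sample)
print(dict(counts))
```

On a real host, `scan_log` would be pointed at wherever runRivet.log has been copied to; that location is not something I can state for every setup.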
ID: 53417
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2753
Credit: 303,957,530
RAC: 111,332
Message 53418 - Posted: 15 Apr 2026, 9:19:31 UTC - in response to Message 53417.  

Good decision to cancel that task.
According to mcplots none of them succeeded so far:
run 	events 	attempts 	success 	failure 	unknown
pp z1j 8000 - - sherpa 2.2.9 default 	0 	56 	0 	13	43
ID: 53418
TeeVeeEss

Joined: 3 Aug 11
Posts: 2
Credit: 773,488
RAC: 9,757
Message 53464 - Posted: 22 Apr 2026, 14:39:17 UTC

Another long runner, Theory_2922-4904088-748_1, https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=240311562
===> [runRivet] Sat Apr 18 15:21:12 UTC 2026 [boinc pp zinclusive 13000 - - sherpa 2.2.0 default 6000 748]

The status "Timed out - no response" is already reported, but runRivet.log still shows progress:
integration time:  ( 3d 22h 53m 5s elapsed / 2900d 1h 8m 52s left ) [14:32:52]   
7004.33 pb +- ( 3784.28 pb = 54.0277 % ) 83000000 ( 144430443 -> 57.5 % )
integration time:  ( 3d 22h 54m 31s elapsed / 2894d 22h 7m 30s left ) [14:34:19]   

Is there any scientific value in continuing to run this task, or should I abort it on my host?
ID: 53464
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2753
Credit: 303,957,530
RAC: 111,332
Message 53465 - Posted: 22 Apr 2026, 15:17:24 UTC - in response to Message 53464.  

The runspec tells you that #events had already been reduced from 100000 to 6000.
Nonetheless mcplots reports only lost tasks (9 of 9), which indicates they run much too long.
Your log snippet shows an estimated time left of around 2894 days (~7.9 years!) for the integration phase.
Even if you let it run that long mcplots will mark the task as unknown (=lost).
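
The arithmetic behind that estimate can be reproduced with a minimal Python sketch. It assumes the "integration time" lines always have the form shown in the log snippet, and it uses the 10-day limit mentioned elsewhere in this thread. The function and constant names are made up for illustration.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
DEADLINE_SECONDS = 10 * 86400  # 10-day limit mentioned in this thread


def to_seconds(duration):
    """Convert a string such as '3d 22h 54m 31s' to seconds."""
    pairs = re.findall(r"(\d+)\s*([dhms])", duration)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)


def parse_integration_time(line):
    """Return (elapsed, left) in seconds, or None if the line does not match."""
    match = re.search(r"\(\s*(.+?)\s+elapsed\s*/\s*(.+?)\s+left\s*\)", line)
    if match is None:
        return None
    return to_seconds(match.group(1)), to_seconds(match.group(2))


def cannot_finish_in_time(elapsed, left):
    """True if elapsed plus estimated remaining time exceeds the limit."""
    return elapsed + left > DEADLINE_SECONDS


line = ("integration time:  ( 3d 22h 54m 31s elapsed / "
        "2894d 22h 7m 30s left ) [14:34:19]")
elapsed, left = parse_integration_time(line)
years_left = left / 86400 / 365.25
print(f"{left / 86400:.1f} days left, about {years_left:.1f} years")
print("exceeds 10-day limit:", cannot_finish_in_time(elapsed, left))
```

The estimate printed by the generator fluctuates from line to line, so a single reading is only a rough guide; here it is off from the limit by more than two orders of magnitude, which leaves little room for doubt.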
ID: 53465
greg_be

Joined: 28 Dec 08
Posts: 353
Credit: 6,786,362
RAC: 1,468
Message 53466 - Posted: 22 Apr 2026, 15:36:44 UTC

Regarding Theory_2922-xxxxx tasks: BOINC Manager shows an estimated run time of just over an hour. I just aborted a task with 21 hours of runtime that was at 99.999% and not moving.
The elapsed-time counter increased each time it checkpointed, but the progress did not.

This is the second time now. What's going on?
One of my aborted tasks also failed on the hosts that got it before me and after me.
ID: 53466
Harri Liljeroos
Joined: 28 Sep 04
Posts: 804
Credit: 65,986,154
RAC: 28,212
Message 53467 - Posted: 22 Apr 2026, 17:59:52 UTC - in response to Message 53466.  

In reply to greg_be's message of 22 Apr 2026:
Regarding Theory_2922-xxxxx tasks: BOINC Manager shows an estimated run time of just over an hour. I just aborted a task with 21 hours of runtime that was at 99.999% and not moving.
The elapsed-time counter increased each time it checkpointed, but the progress did not.

This is the second time now. What's going on?
One of my aborted tasks also failed on the hosts that got it before me and after me.

The task progress shown in BOINC Manager has nothing to do with the actual task progress, because progress is not reported from VBox back to BOINC Manager. What you see is BOINC Manager's estimate based on previously run tasks. This can be wildly off because Theory tasks vary so much in run time.
ID: 53467
TeeVeeEss

Joined: 3 Aug 11
Posts: 2
Credit: 773,488
RAC: 9,757
Message 53469 - Posted: 22 Apr 2026, 18:21:15 UTC - in response to Message 53465.  

In reply to computezrmle's message of 22 Apr 2026:
The runspec tells you that #events had already been reduced from 100000 to 6000.
Nonetheless mcplots reports only lost tasks (9 of 9), which indicates they run much too long.
Your log snippet shows an estimated time left of around 2894 days (~7.9 years!) for the integration phase.
Even if you let it run that long mcplots will mark the task as unknown (=lost).
Thanks for the explanation, I will abort the task.

One more question: if the #events in a task has already been reduced and the runtime is approaching the BOINC limit of 10 days, should it be aborted?
ID: 53469


©2026 CERN