Windows Theory Simulation v300.30 deadline miss
Author | Message |
---|---|
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
Windows 10, i7-4790K (2014), 32 GB RAM, M.2 2TB SSD, Gigabyte nVidia RTX 2060 mini OC 6GB, HP Elitedesk 800 G1 TWR. Using Squid http proxy.

I'm having problems with Theory Simulation v300.30 (vbox64_theory) windows_x86_64 tasks - they often overrun. When a task is downloaded, the BOINC client reports a "Remaining" time of about 5 hours. When the task starts running, the "Remaining" time is adjusted to show 9+ days. Sometimes the task completes before this time. Often it carries on running past the deadline.

It's happening right now, for example. The local BOINC client shows -

- Project: LHC@home
- Status: Running
- Elapsed: 8d 00:07:18
- Remaining (estimated): 2d 00:01:30
- Deadline: 29/08/2024
- Application: Theory Simulation v300.30 (vbox64_theory)
- Name: Theory_2743-2802274-370_0

It's still running.

The LHC@home current tasks webpage -

- Task: 413632263
- Work unit: 224936523
- Computer: 10730901
- Sent: 19 Aug 2024
- Time reported or deadline: 30 Aug 2024
- Status: Timed out - no response
- Run time (sec): 0.00
- CPU time (sec): 0.00
- Credit: ---
- Application: Theory Simulation v300.30 (vbox64_theory) windows_x86_64

Clicking on the "Task" -

- Name: Theory_2743-2802274-370_0
- Workunit: 224936523
- Created: 19 Aug 2024, 9:22:34 UTC
- Sent: 19 Aug 2024, 16:08:06 UTC
- Report deadline: 30 Aug 2024, 16:08:06 UTC
- Received: ---
- Server state: Over
- Outcome: No reply
- Client state: New
- Exit status: 0 (0x00000000)
- Computer ID: 10730901
- Run time: 0 sec
- CPU time: 0 sec
- Validate state: Initial
- Credit: 0.00
- Device peak FLOPS: 5.05 GFLOPS
- Application version: Theory Simulation v300.30 (vbox64_theory) windows_x86_64
- Stderr output: (blank) |
Joined: 28 Sep 04 Posts: 739 Credit: 50,636,662 RAC: 32,671 |
This is normal behavior for VirtualBox tasks. VirtualBox does not report the actual progress of the task back to BOINC Manager. Instead, BOINC shows a simulated progress for the task. The initial value (in your case 5 hours) is some kind of average from previously finished Theory tasks. When the actual runtime exceeds this value, BOINC switches its estimate to the cutoff time the server has given to the task (10 days for Theory tasks = 864000 s). The only place where you can monitor and estimate the actual task progress is inside a VirtualBox terminal. See more in Yeti's checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&sort_style=6&start=0 This was made primarily for ATLAS tasks but is mostly valid for Theory too. |
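
As a rough illustration of the arithmetic in the reply above, the sketch below converts BOINC Manager duration strings to seconds and compares elapsed plus remaining time with the 10-day cutoff (864000 s). The helper names `parse_duration` and `projected_total` are hypothetical, and the idea that the estimate tracks the cutoff comes only from the explanation in this thread, not from the BOINC source. The sample figures are the ones quoted in the first post.

```python
import re

# 10 days expressed in seconds, as quoted in the reply above.
CUTOFF_SECONDS = 10 * 24 * 3600  # 864000

def parse_duration(text):
    """Convert a duration such as '8d 00:07:18' or '05:00:00' to seconds."""
    match = re.fullmatch(r"(?:(\d+)d\s+)?(\d+):(\d{2}):(\d{2})", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def projected_total(elapsed, remaining):
    """Total run time implied by the client's elapsed and remaining columns."""
    return parse_duration(elapsed) + parse_duration(remaining)

# Figures from the first post: Elapsed 8d 00:07:18, Remaining 2d 00:01:30.
total = projected_total("8d 00:07:18", "2d 00:01:30")
print(total, total - CUTOFF_SECONDS)  # 864528 528
```

For that sample the projected total lands within about nine minutes of the 10-day cutoff, which is consistent with the "Remaining" column being derived from the cutoff and not from real progress inside the VM.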
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
Thank you, Harri - I'm happy to browse through that checklist.

Nonetheless, even though Windows can't tell what's going on inside a VM, it seems that BOINC can't tell when this task has exceeded its deadline. The task -

- Project: LHC@home
- Status: Running
- Elapsed: 8d 08:15:38
- Remaining (estimated): 1d 15:54:31
- Deadline: 29/08/2024
- Application: Theory Simulation v300.30 (vbox64_theory)
- Name: Theory_2743-2802274-370_0

is still running - way past its deadline. Should I "Abort" jobs that go past their deadline?

Maybe this sheds some light. Pressing ALT+F1 in the VM console of the above task showed some cvmfs and cranky output, and -

job: htmld=/shared/html/ job job: unpack exitcode=0
INFO: activated the work-around for ld: lrwxrwxrwx 1 0 0 15 Aug 19 21:02 /tmp/tmp.SAvOWJwc6Y/ld -> /usr/bin/ld.bfd
22:02:57 BST +01:00 2024-08-19: cranky: [INFO] ===> [runRivet] Mon Aug 19 21:02:56 UTC 2024 [boinc p p jets 8000 350 - pythia8 8.244 CP1-CR1 100000 370]

And, pressing ALT+F2, the last bit -

Pythia::next(): 82000 events have been generated
82000 events processed
dumping histograms...

So there's lots of work being done. (Rarely, a Theory task runs without registering any CPU activity in Windows Task Manager...)

Using my web browser to navigate to http://localhost:52218/ -

Test4Theory simulations
Waiting for some nice figures to show you. Please, reload again in a few minutes
Meanwhile you can check the logs (http://localhost:52218/logs/)

The logs (after the [runRivet] and PYTHIA initialisation sections) -

-------- End PYTHIA Event Listing -----------------------------------------------------------------------------------------------
Rivet.AnalysisHandler: INFO Only using nominal weight. Variation weights will be ignored.
0 events processed
PYTHIA Warning in StringFragmentation::fragmentToJunction: bad convergence junction rest frame
PYTHIA Error in StringFragmentation::fragment: stuck in joining
PYTHIA Error in Pythia::next: hadronLevel failed; try again
PYTHIA Warning in JunctionSplitting::SplitJunPairs: parallel junction state not allowed.
PYTHIA Warning in JunctionSplitting::CheckColours: Not possible to split junctions; making new colours
PYTHIA Warning in JunctionSplitting::CheckColours: Made a gluon colour singlet; redoing colours
PYTHIA Warning in SimpleSpaceShower::pT2nextQCD: weight above unity
100 events processed
dumping histograms...
PYTHIA Error in MiniStringFragmentation::fragment: no 1- or 2-body state found above mass threshold
PYTHIA Error in StringFragmentation::fragmentToJunction: caught in junction flavour loop
200 events processed
dumping histograms...
300 events processed
dumping histograms...
PYTHIA Warning in MiniStringFragmentation::ministring2two: random axis needed to break tie
400 events processed
dumping histograms...
...
900 events processed
dumping histograms...
PYTHIA Warning in SimpleSpaceShower::pT2nextQCD: small daughter PDF
...
1600 events processed
PYTHIA Warning in StringFragmentation::finalRegion: random axis needed to break tie
...
2100 events processed
PYTHIA Error in SimpleSpaceShower::pT2nearThreshold: stuck in loop
...
9800 events processed
PYTHIA Warning in MultipartonInteractions::pTnext: weight above unity
...
12500 events processed
PYTHIA Warning in Pythia::check: energy-momentum not quite conserved
...
13200 events processed
PYTHIA Warning in TauDecays::decay: unknown correlated tau production, assuming from unpolarized photon
...
17800 events processed
PYTHIA Error in BeamRemnants::setKinematics: kinematics construction failed
...
...
64700 events processed
PYTHIA Warning in Pythia::check: not quite matched particle energy/momentum/mass
...
Pythia::next(): 82000 events have been generated
82000 events processed
dumping histograms...
82100 events processed
82200 events processed
82300 events processed |
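
Since BOINC Manager cannot see real progress, one way to get a rough figure is to read the "events processed" lines in the log above. The sketch below is only an illustration: the function names are hypothetical, and treating the `100000` in the `[runRivet]` line as the total number of requested events is an assumption, not something this thread confirms.

```python
import re

def last_events_processed(log_text):
    """Return the most recent 'N events processed' count, or None if absent."""
    counts = [int(n) for n in re.findall(r"(\d+) events processed", log_text)]
    return counts[-1] if counts else None

def progress_fraction(log_text, total_events):
    """Fraction of events done, capped at 1.0; None when it cannot be computed."""
    done = last_events_processed(log_text)
    if done is None or total_events <= 0:
        return None
    return min(done / total_events, 1.0)

# Tail of the log quoted in the post above.
sample = """Pythia::next(): 82000 events have been generated
82000 events processed
dumping histograms...
82100 events processed
82200 events processed
82300 events processed"""

# ASSUMPTION: 100000 is the requested event count from the [runRivet] line.
TOTAL_EVENTS = 100000

print(progress_fraction(sample, TOTAL_EVENTS))
```

Under that assumption the task would be a little over four fifths done, which fits the observation that plenty of work is still being carried out.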
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
I'm speculating that the above warnings and errors are simulation telemetry and not program errors. But the task is just going to carry on 'til the end, oblivious that it's gone past its deadline - a waste of time... |
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
System: openSUSE Tumbleweed, Intel i7-4790K, 32GB, 2TB SSD. (The current date is 02/11/2024.)

The BOINC Manager shows these running tasks:

- Theory_2794-3266819-199_1: LHC@home, Running, elapsed 6d 09:32:46, remaining (estimated) 3d 11:55:01, deadline 01/11/2024, Theory Simulation 300.30 (vbox64_theory)
- Theory_2794-3257411-175_1: LHC@home, Running, elapsed 5d 23:40:10, remaining (estimated) 3d 21:55:48, deadline 01/11/2024, Theory Simulation 300.30 (vbox64_theory)

The LHC@home "All tasks" webpage (https://lhcathome.cern.ch/lhcathome/results.php?userid=95350) shows this -

- Task 415137523, work unit 225914767, computer 10860321: sent 22 Oct 2024, deadline 2 Nov 2024, Timed out - no response, run time 0.00 sec, CPU time 0.00 sec, credit ---, Theory Simulation v300.30 (vbox64_theory) x86_64-pc-linux-gnu
- Task 415142791, work unit 226066966, computer 10860321: sent 22 Oct 2024, reported 27 Oct 2024, Completed and validated, run time 311,018.69 sec, CPU time 235,746.40 sec, credit 719.95, Theory Simulation v300.30 (vbox64_theory) x86_64-pc-linux-gnu
- Task 415137790, work unit 225932422, computer 10860321: sent 22 Oct 2024, deadline 2 Nov 2024, Timed out - no response, run time 0.00 sec, CPU time 0.00 sec, credit ---, Theory Simulation v300.30 (vbox64_theory) x86_64-pc-linux-gnu

No doubt the timed-out tasks would eventually have provided a valid result if they'd had more time. Why not up the server's default response time from 10 to 20 days? |
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
This problem is not limited to Windows, so a more pertinent post on the subject is here - https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6240 |
Joined: 9 Feb 08 Posts: 54 Credit: 1,339,875 RAC: 3,250 |
Also on the subject of very long running Theory tasks, see "This gonna be long" - https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6251 And (currently) CMS and ATLAS have problems, but I'm getting a few Theory jobs that seem to be running. Have a little patience. They'll sort it out eventually. |
©2025 CERN