Message boards :
Theory Application :
(Native) Theory - Sherpa looooooong runners
Author | Message |
---|---|
Send message Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
===> [runRivet] Sun Mar 24 08:22:01 UTC 2019 [boinc pp jets 13000 250,-,4160 - sherpa 2.2.2 default 2000 34] Over 9 hours in and now in the 4th full optimization and integration phase. Suspicious is the time left going up and down. full optimization: ( 2h 19m 3s elapsed / 1h 16m 9s left ) [12:21:36] full optimization: ( 2h 22m 31s elapsed / 1h 8m 2s left ) [12:25:08] integration time: ( 2h 26m 1s elapsed / 1h 19s left ) [12:28:43] integration time: ( 2h 29m 35s elapsed / 52m 59s left ) [12:32:23] integration time: ( 2h 36m 41s elapsed / 39m 10s left ) [12:39:39] integration time: ( 2h 43m 48s elapsed / 26m 19s left ) [12:46:54] integration time: ( 2h 50m 52s elapsed / 21m 7s left ) [12:54:08] integration time: ( 2h 57m 59s elapsed / 15m 43s left ) [13:01:25] integration time: ( 3h 4m 44s elapsed / 1h 52m 1s left ) [13:08:19] integration time: ( 3h 10m 21s elapsed / 1h 31m 25s left ) [13:14:04] integration time: ( 3h 16m 1s elapsed / 1h 33m 27s left ) [13:19:51] integration time: ( 3h 22m 51s elapsed / 1h 15m 18s left ) [13:26:51] integration time: ( 3h 29m 58s elapsed / 1h 39m 58s left ) [13:34:09] integration time: ( 3h 37m 5s elapsed / 1h 29m 4s left ) [13:41:26] integration time: ( 3h 44m 16s elapsed / 1h 11m 7s left ) [13:48:47] integration time: ( 3h 51m 26s elapsed / 56m 44s left ) [13:56:07] integration time: ( 3h 58m 35s elapsed / 43m 58s left ) [14:03:25] integration time: ( 4h 5m 19s elapsed / 30m 44s left ) [14:10:20] integration time: ( 4h 11m 37s elapsed / 19m 48s left ) [14:16:48] integration time: ( 4h 17m 31s elapsed / 1h 28m 46s left ) [14:22:50] integration time: ( 4h 24m 30s elapsed / 1h 13m 7s left ) [14:29:59] integration time: ( 4h 32m 31s elapsed / 59m 12s left ) [14:38:10] integration time: ( 4h 40m 30s elapsed / 45m 39s left ) [14:46:20] integration time: ( 4h 48m 29s elapsed / 31m 42s left ) [14:54:31] integration time: ( 4h 56m 29s elapsed / 17m 39s left ) [15:02:42] integration time: ( 5h 4m 23s elapsed / 13m 19s left ) [15:10:47] 
integration time: ( 5h 12m 9s elapsed / 1h 28m 56s left ) [15:18:43] integration time: ( 5h 19m 15s elapsed / 1h 18m 45s left ) [15:26:01] integration time: ( 5h 27m 4s elapsed / 1h 3m 39s left ) [15:34:00] integration time: ( 5h 39m 24s elapsed / 4h 7m 9s left ) [15:46:43] integration time: ( 5h 52m 38s elapsed / 3h 59m 13s left ) [16:00:18] integration time: ( 6h 4m 9s elapsed / 3h 42m 11s left ) [16:12:10] integration time: ( 6h 15m 21s elapsed / 3h 24m 1s left ) [16:23:38] integration time: ( 6h 25m 13s elapsed / 3h 4m 58s left ) [16:33:43] integration time: ( 6h 34m 59s elapsed / 2h 47m 39s left ) [16:43:40] integration time: ( 6h 44m 51s elapsed / 2h 35m 21s left ) [16:53:43] integration time: ( 6h 53m 36s elapsed / 2h 16m 53s left ) [17:02:39] integration time: ( 7h 2m 28s elapsed / 2h 57m 35s left ) [17:11:40] integration time: ( 7h 14m 3s elapsed / 2h 40m 32s left ) [17:23:32] |
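The up-and-down pattern of the remaining-time estimate in the log above can be spotted automatically. The following is a minimal sketch, not part of any LHC@home tooling: it assumes progress lines have the `( ... elapsed / ... left )` shape quoted in this post, and the function names are hypothetical.

```python
import re

# Matches progress lines such as
# "integration time: ( 2h 57m 59s elapsed / 15m 43s left ) [13:01:25]"
PROGRESS_RE = re.compile(r"\(\s*([^()/]+?)\s+elapsed\s*/\s*([^()/]+?)\s+left\s*\)")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a duration such as '2d 21h 12m 39s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", duration))


def parse_progress(text):
    """Return (elapsed_s, left_s) for every progress entry found in the text."""
    return [(to_seconds(e), to_seconds(l)) for e, l in PROGRESS_RE.findall(text)]


def eta_jumps(pairs):
    """Return the entries whose 'left' estimate is higher than in the previous entry."""
    return [cur for prev, cur in zip(pairs, pairs[1:]) if cur[1] > prev[1]]


# Three consecutive entries copied from the log in this post.
sample = (
    "integration time: ( 2h 57m 59s elapsed / 15m 43s left ) [13:01:25] "
    "integration time: ( 3h 4m 44s elapsed / 1h 52m 1s left ) [13:08:19] "
    "integration time: ( 3h 10m 21s elapsed / 1h 31m 25s left ) [13:14:04]"
)
pairs = parse_progress(sample)
jumps = eta_jumps(pairs)
print(len(pairs), "entries,", len(jumps), "upward jump(s) in the estimate")
```

A jump by itself does not say whether the job will finish; it only marks the points where the estimate was reset upwards.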
Send message Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
The job I mentioned yesterday is still running, and as long as the remaining integration time is not only going up, there is hope. integration time: ( 21h 59m 41s elapsed / 2d 21h 12m 39s left ) [08:29:40] integration time: ( 22h 12m 5s elapsed / 2d 20h 32m 32s left ) [08:42:25] integration time: ( 22h 24m 29s elapsed / 2d 19h 46m 59s left ) [08:55:11] In the VBox version this task would already have been killed because of the 18-hour time limit. |
Send message Joined: 29 Jun 18 Posts: 6 Credit: 5,314,428 RAC: 707 |
I have what appears to be my first native Sherpa job. It has been running at 100% on one core for 1d 20h and is 11h from its deadline. I can't find a running.log on my system, but runRivet.log shows: ===> [runRivet] Sun Mar 24 06:22:54 UTC 2019 [boinc pp winclusive 7000 -,-,10 - sherpa 2.1.1 default 4000 34] The last lines of that file are: Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). Updating display... Display update finished (0 histograms, 0 events). and the last lines before those are: ... Initialized the Beam_Remnant_Handler. Hadron_Decay_Map::Read: Initializing HadronDecays.dat. This may take some time. Initialized the Hadron_Decay_Handler, Decay model = Hadrons Initialized the Soft_Photon_Handler. Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal) Starting the calculation at 06:23:26. Lean back and enjoy ... . Updating display... Display update finished (0 histograms, 0 events). ... I don't know anything about Sherpa jobs. Is it working or not? |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Display update finished (0 histograms, 0 events). runRivet.log is the right file, and the task itself is working, but the Sherpa job inside it has most certainly failed. The giveaway is that it's stuck at 0 histograms and 0 events. All you can do is abort the task. |
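The "stuck at 0 histograms and 0 events" check can be scripted. This is a minimal sketch that assumes the `Display update finished (...)` wording quoted in this thread; the function names are hypothetical. Note that it only shows the job has not produced events yet. Whether that means failure is a judgment call, since a Sherpa job also reports 0 events for as long as it is still in its integration phase.

```python
import re

# Matches "Display update finished (6 histograms, 800 events)."
DISPLAY_RE = re.compile(r"Display update finished \((\d+) histograms, (\d+) events\)")


def display_updates(text):
    """Return (histograms, events) for every display update found in the log text."""
    return [(int(h), int(e)) for h, e in DISPLAY_RE.findall(text)]


def no_events_yet(text):
    """True if at least one display update was logged and all of them report 0 events."""
    updates = display_updates(text)
    return bool(updates) and all(events == 0 for _, events in updates)


# Lines copied from the two logs quoted in this thread.
stalled = (
    "Display update finished (0 histograms, 0 events). Updating display... "
    "Display update finished (0 histograms, 0 events)."
)
running = (
    "Display update finished (0 histograms, 0 events). Updating display... "
    "Display update finished (6 histograms, 100 events)."
)
print(no_events_yet(stalled), no_events_yet(running))
```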
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
All you can do is abort the task. This is the simple answer, and it is not good for the science. I have a Sherpa that has been running for 10K minutes now, and it writes a new line to runRivet.log every minute. The Theory thread has a link to the Sherpa documentation. |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
All you can do is abort the task. Really? If the task fails (and his task most certainly will), it won't even upload a result. No result = no science. If he had aborted the task 24 hours ago he could have received a pythia, herwig, <whatever> that is far more likely to succeed and do some worthwhile science. It has been stated that they learn something even if the job fails. Really? What do they learn? All they learn is that the job failed. They can learn that fact from the 2 failures from the 2 wingmen. It's the principle of lost opportunity cost... every failed job is a lost opportunity to do some useful science. Have a Sherpa running for 10K Minutes now And show every minute a answer in runRivet.log. You got lucky. In Theory-Thread is a link for the Sherpa Documentation. Please post that link, I couldn't find it but I would like to read it. |
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
|
Send message Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
Still running: ===> [runRivet] Sun Mar 24 08:22:01 UTC 2019 [boinc pp jets 13000 250,-,4160 - sherpa 2.2.2 default 2000 34] and the time left is down to 0s again. 2.56608e-13 pb +- ( 5.25828e-15 pb = 2.04915 % ) 6820000 ( 108708125 -> 7.1 % ) integration time: ( 2d 47m 7s elapsed / 0s left ) [11:57:35] What's next? |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Still running: According to the Sherpa 2.1.0 manual, "Sherpa will then move on to integrate the other processes specified in the run card." And "When the integration is complete, the event generation will start." |
Send message Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
Suddenly it was over very quickly: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219941269 The whole runRivet.log available, but displaying here the last part after integration. 2.58077e-13 pb +- ( 5.15428e-15 pb = 1.99719 % ) 7100000 ( 112624285 -> 7.1 % ) integration time: ( 2d 3h 13m 47s elapsed / 0s left ) [14:27:14] 2_4__j__j__j__j__j__j__NQ_0-4 : 2.58077e-13 pb +- ( 5.15428e-15 pb = 1.99719 % ) exp. eff: 3.24641e-05 % reduce max for 2_4__j__j__j__j__j__j__NQ_0-4 to 0.218194 ( eps = 0.001 ) Output_Phase::Output_Phase(): Set output interval 1000000000 events. ---------------------------------------------------------- -- SHERPA generates events with the following structure -- ---------------------------------------------------------- Perturbative : Signal_Processes Perturbative : Hard_Decays Perturbative : Jet_Evolution:CSS Perturbative : Lepton_FS_QED_Corrections:Photons Perturbative : Multiple_Interactions:Amisic Perturbative : Minimum_Bias:Off Hadronization : Beam_Remnants Hadronization : Hadronization:Ahadic Hadronization : Hadron_Decays Analysis : HepMC2 --------------------------------------------------------- Event 1 ( 0s elapsed / 1m 59s left ) -> ETA: Tue Mar 26 14:29 XS = 13.1493 pb +- ( 13.1493 pb = 100 % ) #-------------------------------------------------------------------------- # FastJet release 3.0.3 # M. Cacciari, G.P. Salam and G. Soyez # A software package for jet finding and analysis at colliders # http://fastjet.fr # # Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package # for scientific work and optionally PLB641(2006)57 [hep-ph/0512210]. # # FastJet is provided without warranty under the terms of the GNU GPLv2. # It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code # and 3rd party plugin jet algorithms. See COPYING file for details. 
#-------------------------------------------------------------------------- Event 2 ( 0s elapsed / 2m 39s left ) -> ETA: Tue Mar 26 14:29 XS = 13.5889 pb +- ( 9.59415 pb = 70.6 % ) Event 3 ( 0s elapsed / 3m 6s left ) -> ETA: Tue Mar 26 14:30 XS = 14.4529 pb +- ( 8.32632 pb = 57.61 % ) Event 4 ( 0s elapsed / 3m 14s left ) -> ETA: Tue Mar 26 14:30 XS = 15.2066 pb +- ( 7.58387 pb = 49.87 % ) Event 5 ( 0s elapsed / 3m 7s left ) -> ETA: Tue Mar 26 14:30 XS = 17.012 pb +- ( 7.58477 pb = 44.58 % ) Event 6 ( 0s elapsed / 3m 12s left ) -> ETA: Tue Mar 26 14:30 XS = 16.4568 pb +- ( 6.69778 pb = 40.69 % ) Event 7 ( 0s elapsed / 3m 5s left ) -> ETA: Tue Mar 26 14:30 XS = 15.9021 pb +- ( 5.99205 pb = 37.68 % ) Event 8 ( 0s elapsed / 2m 56s left ) -> ETA: Tue Mar 26 14:30 XS = 14.4568 pb +- ( 5.09674 pb = 35.25 % ) Event 9 ( 0s elapsed / 2m 43s left ) -> ETA: Tue Mar 26 14:29 XS = 15.5956 pb +- ( 5.18239 pb = 33.22 % ) Event 10 ( 0s elapsed / 2m 39s left ) -> ETA: Tue Mar 26 14:29 XS = 15.7089 pb +- ( 4.95184 pb = 31.52 % ) Event 20 ( 1s elapsed / 2m 57s left ) -> ETA: Tue Mar 26 14:30 XS = 16.6757 pb +- ( 3.71556 pb = 22.28 % ) Event 30 ( 2s elapsed / 2m 48s left ) -> ETA: Tue Mar 26 14:30 XS = 13.714 pb +- ( 2.49638 pb = 18.2 % ) Event 40 ( 3s elapsed / 2m 47s left ) -> ETA: Tue Mar 26 14:30 XS = 12.8102 pb +- ( 2.0198 pb = 15.76 % ) Event 50 ( 4s elapsed / 2m 47s left ) -> ETA: Tue Mar 26 14:30 XS = 12.8264 pb +- ( 1.80881 pb = 14.1 % ) Event 60 ( 5s elapsed / 2m 50s left ) -> ETA: Tue Mar 26 14:30 XS = 13.1017 pb +- ( 1.68655 pb = 12.87 % ) Event 70 ( 6s elapsed / 2m 50s left ) -> ETA: Tue Mar 26 14:30 XS = 13.4696 pb +- ( 1.60514 pb = 11.91 % ) Event 80 ( 7s elapsed / 2m 53s left ) -> ETA: Tue Mar 26 14:30 XS = 13.0581 pb +- ( 1.45572 pb = 11.14 % ) Event 90 ( 8s elapsed / 2m 52s left ) -> ETA: Tue Mar 26 14:30 XS = 12.6936 pb +- ( 1.36468 pb = 10.75 % ) Event 100 ( 9s elapsed / 2m 52s left ) -> ETA: Tue Mar 26 14:30 XS = 12.6198 pb +- ( 1.2842 pb = 10.17 % ) 100 events 
processed dumping histograms... Updating display... Event 200 ( 18s elapsed / 2m 46s left ) -> ETA: Tue Mar 26 14:30 XS = 11.4647 pb +- ( 0.842384 pb = 7.34 % ) 200 events processed dumping histograms... Display update finished (6 histograms, 100 events). Event 300 ( 27s elapsed / 2m 37s left ) -> ETA: Tue Mar 26 14:30 XS = 11.5149 pb +- ( 0.700558 pb = 6.08 % ) 300 events processed dumping histograms... Event 400 ( 36s elapsed / 2m 27s left ) -> ETA: Tue Mar 26 14:30 XS = 12.0125 pb +- ( 0.624036 pb = 5.19 % ) 400 events processed dumping histograms... Event 500 ( 46s elapsed / 2m 18s left ) -> ETA: Tue Mar 26 14:30 XS = 12.0868 pb +- ( 0.5616 pb = 4.64 % ) 500 events processed dumping histograms... Event 600 ( 54s elapsed / 2m 7s left ) -> ETA: Tue Mar 26 14:30 XS = 12.4103 pb +- ( 0.524525 pb = 4.22 % ) 600 events processed dumping histograms... Event 700 ( 1m 4s elapsed / 1m 59s left ) -> ETA: Tue Mar 26 14:30 XS = 12.3402 pb +- ( 0.480327 pb = 3.89 % ) 700 events processed dumping histograms... Event 800 ( 1m 14s elapsed / 1m 51s left ) -> ETA: Tue Mar 26 14:30 XS = 12.2777 pb +- ( 0.44522 pb = 3.62 % ) 800 events processed dumping histograms... Updating display... Display update finished (6 histograms, 800 events). Event 900 ( 1m 23s elapsed / 1m 42s left ) -> ETA: Tue Mar 26 14:30 XS = 12.3343 pb +- ( 0.423254 pb = 3.43 % ) 900 events processed dumping histograms... Event 1000 ( 1m 33s elapsed / 1m 33s left ) -> ETA: Tue Mar 26 14:30 XS = 12.2405 pb +- ( 0.39885 pb = 3.25 % ) 1000 events processed dumping histograms... 1100 events processed 1200 events processed 1300 events processed 1400 events processed 1500 events processed Updating display... Display update finished (6 histograms, 1000 events). 1600 events processed 1700 events processed 1800 events processed 1900 events processed Event 2000 ( 184 s total ) = 939386 evts/day In Event_Handler::Finish : Summarizing the run may take some time. 
+------------------------------------------------------+ | | | Total XS is 12.2115 pb +- ( 0.280779 pb = 2.29 % ) | | | +------------------------------------------------------+ Return_Value::PrintStatistics(): Statistics { Generated events: 2000 New events { From "Jet_Evolution:CSS": 915 (8249) -> 11 % } Retried events { From "Beam_Remnants": 1 (2001) -> 0 % From "Jet_Evolution:CSS": 40 (8249) -> 0.4 % } Retried phases { From "Hadron_Decay_Handler::RejectExclusiveChannelsFromFragmentation": 415 (0) -> 415. } Retried methods { From "Decay_Channel::GenerateKinematics": 1 (485783) -> 0 % } } ------------------------------------------------------------------------ Please cite the publications listed in 'Sherpa_References.tex'. Extract the bibtex list by running 'get_bibtex Sherpa_References.tex' or email the file to 'slaclib2@slac.stanford.edu', subject 'generate'. ------------------------------------------------------------------------ Time: 2d 5h 18m 58s on Tue Mar 26 14:30:22 2019 (User: 2d 4h 47m 27s, System: 15m 10s, Children User: 0s, Children System: 0s) Thanks for using LHAPDF 6.1.6. Please make sure to cite the paper: Eur.Phys.J. C75 (2015) 3, 132 (http://arxiv.org/abs/1412.7420) 2000 events processed dumping histograms... Rivet.Analysis.Handler: INFO Finalising analyses Rivet.Analysis.CMS_2017_I1519995: WARN Skipping histo with null area /CMS_2017_I1519995/d03-x01-y01 Rivet.Analysis.CMS_2017_I1519995: WARN Skipping histo with null area /CMS_2017_I1519995/d04-x01-y01 Rivet.Analysis.CMS_2017_I1519995: WARN Skipping histo with null area /CMS_2017_I1519995/d06-x01-y01 Rivet.Analysis.Handler: INFO Processed 2000 events The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694) Generator run finished successfully Processing histograms... 
input = /shared/tmp/tmp.0FlqnMaf1B/flat output = /shared ./runRivet.sh: line 742: 202 Killed display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune" (wd: /shared) mc: CMS_2017_I1519995_d02-x01-y01.dat -> /shared/dat/pp/jets/dijet_chi/cms2017-m4200/13000/sherpa/2.2.2/default.dat data: REF_CMS_2017_I1519995_d02-x01-y01.dat -> /shared/dat/pp/jets/dijet_chi/cms2017-m4200/13000/CMS_2017_I1519995.dat Disk usage: 33836 Kb CPU usage: 193426 s Clean tmp ... Run finished successfully |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Suddenly it was over very quickly: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219941269 Wonderful! Your point is? |
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
2 days 5 hours of runtime: yes, we need some time for these "small" tasks. Congratulations, Crystal. I also have one Sherpa with... |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
2 days 5 hours runtime- yes we need some time for this "small" tasks. Good point. Modifying my watchdog script to abort sherpa if it's configured for more than 2K events. Or maybe the limit should be 4K events? Or maybe allow the user to select the limit. |
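The event-count rule proposed above could look roughly like the sketch below. It is not the actual watchdog script, and the helper names are hypothetical. It assumes the `[boinc ...]` header fields are beam, process, energy, params, a `-` placeholder, generator, version and tune (the order `runRivet.sh` prints elsewhere in this thread), followed by the number of events. That last assumption matches the 2000-event job logged in this thread but is not documented here.

```python
import re

# Matches the job header written at the top of runRivet.log, for example
# "===> [runRivet] Sun Mar 24 08:22:01 UTC 2019 [boinc pp jets 13000 250,-,4160 - sherpa 2.2.2 default 2000 34]"
HEADER_RE = re.compile(r"===> \[runRivet\] .*?\[boinc ([^\]]+)\]")


def parse_header(line):
    """Return the job description as a dict, or None if the line is not a job header."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    fields = match.group(1).split()
    if len(fields) != 10:
        return None
    return {
        "beam": fields[0],
        "process": fields[1],
        "energy": fields[2],
        "params": fields[3],
        "generator": fields[5],
        "version": fields[6],
        "tune": fields[7],
        "events": int(fields[8]),  # assumption: second-to-last field is the event count
    }


def should_abort(job, max_events=2000):
    """Apply the proposed rule: abort Sherpa jobs configured for more than max_events."""
    return job is not None and job["generator"] == "sherpa" and job["events"] > max_events


# The two job headers quoted in this thread.
job_2k = parse_header(
    "===> [runRivet] Sun Mar 24 08:22:01 UTC 2019 "
    "[boinc pp jets 13000 250,-,4160 - sherpa 2.2.2 default 2000 34]"
)
job_4k = parse_header(
    "===> [runRivet] Sun Mar 24 06:22:54 UTC 2019 "
    "[boinc pp winclusive 7000 -,-,10 - sherpa 2.1.1 default 4000 34]"
)
print(should_abort(job_2k), should_abort(job_4k))
```

Keeping `max_events` as a parameter leaves the limit user-selectable, which is the third option raised in the post.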
Send message Joined: 14 Jan 10 Posts: 1411 Credit: 9,433,926 RAC: 11,615 |
Out of 1,063,722 jobs (all kinds) from batch 2279, share of jobs by run time: 66.9350% under 2 hrs; 26.6682% 2 to 6 hrs; 5.6714% 6 to 12 hrs; 0.7038% 12 to 18 hrs; 0.0172% 18 to 24 hrs; 0.0043% over 24 hrs |
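To put these percentages into job counts, here is a small worked calculation. It uses only the figures in this post; the bucket labels and function names are made up for illustration, and the counts are rounded estimates because the percentages themselves are rounded.

```python
TOTAL_JOBS = 1_063_722

# (run-time bucket, share of all jobs in percent), as posted for batch 2279
RUN_TIME_SHARES = [
    ("under 2 hrs", 66.9350),
    ("2 to 6 hrs", 26.6682),
    ("6 to 12 hrs", 5.6714),
    ("12 to 18 hrs", 0.7038),
    ("18 to 24 hrs", 0.0172),
    ("over 24 hrs", 0.0043),
]


def tail_share(buckets, start):
    """Sum the percentages of all buckets from index `start` onwards."""
    return sum(percent for _, percent in buckets[start:])


def estimated_jobs(percent, total=TOTAL_JOBS):
    """Convert a percentage of the batch into an approximate number of jobs."""
    return round(total * percent / 100)


over_18h = tail_share(RUN_TIME_SHARES, 4)
over_24h = tail_share(RUN_TIME_SHARES, 5)
print(f"over 18 hrs: {over_18h:.4f}% of jobs, about {estimated_jobs(over_18h)} jobs")
print(f"over 24 hrs: {over_24h:.4f}% of jobs, about {estimated_jobs(over_24h)} jobs")
```

By these numbers, jobs that ran past the 18-hour mark make up about 0.02% of the batch, roughly 230 jobs out of more than a million.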
Send message Joined: 2 May 07 Posts: 2228 Credit: 173,798,559 RAC: 18,443 |
Bronco, stop writing your point of view about Sherpa. You stand alone with this theory!! |
Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
Bronco, I don't think you really understand my theory. Until you understand you should stop telling me when/how to post. If that doesn't suit you then take a long hard suck on my ass. |
Send message Joined: 11 Jan 12 Posts: 5 Credit: 177,593 RAC: 0 |
Two days ago I aborted two tasks that had been running for almost 3 days without completing. I wonder how long they would have taken to complete, if they ever would have. https://lhcathome.cern.ch/lhcathome/result.php?resultid=219678168 https://lhcathome.cern.ch/lhcathome/result.php?resultid=219671436 |
Send message Joined: 15 Jun 08 Posts: 2520 Credit: 251,915,653 RAC: 128,265 |
Two days ago I have aborted two tasks which were running for almost 3 days without completion. I wonder how long these could take to complete if it would ever happen. Theory native tasks must not be interrupted (client restart, reboot, etc.). If they are, they will always start again from scratch: 13:55:32 (24586): wrapper (7.15.26016): starting 13:55:32 (24586): wrapper (7.15.26016): starting 19:17:11 (12073): wrapper (7.15.26016): starting 19:17:11 (12073): wrapper (7.15.26016): starting 19:22:57 (12311): wrapper (7.15.26016): starting 19:22:57 (12311): wrapper (7.15.26016): starting |
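Counting restarts from a task's stderr output can be done along these lines. This is a minimal sketch with hypothetical function names. It assumes the `wrapper (...): starting` line format quoted above, and it assumes that two identical lines (same time, same process ID) belong to a single start.

```python
import re

# Matches "13:55:32 (24586): wrapper (7.15.26016): starting"
START_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \((\d+)\): wrapper \([^)]*\): starting")


def wrapper_starts(text):
    """Return (time, pid) for each distinct wrapper start, in order of first appearance."""
    seen = []
    for start in START_RE.findall(text):
        # Assumption: a repeated (time, pid) pair is the same start logged twice.
        if start not in seen:
            seen.append(start)
    return seen


# The stderr lines quoted in this post.
stderr = (
    "13:55:32 (24586): wrapper (7.15.26016): starting "
    "13:55:32 (24586): wrapper (7.15.26016): starting "
    "19:17:11 (12073): wrapper (7.15.26016): starting "
    "19:17:11 (12073): wrapper (7.15.26016): starting "
    "19:22:57 (12311): wrapper (7.15.26016): starting "
    "19:22:57 (12311): wrapper (7.15.26016): starting"
)
starts = wrapper_starts(stderr)
print(len(starts), "starts, so", len(starts) - 1, "restart(s)")
```

More than one distinct start means the job began again from scratch at least once, so the run time shown for the task covers more than one attempt.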
Send message Joined: 11 Jan 12 Posts: 5 Credit: 177,593 RAC: 0 |
Two days ago I have aborted two tasks which were running for almost 3 days without completion. I wonder how long these could take to complete if it would ever happen. Even with these reboots, the WUs had been running uninterrupted for a long time before I aborted them: the 1st one for 2d 17h, the 2nd for 2d 9h. This machine is usually up for a few months; I only reboot it after a kernel update. |
Send message Joined: 24 Nov 06 Posts: 76 Credit: 7,953,478 RAC: 106 |
So we should abort long running tasks? I can do that. What is the cut-off line? 3 hours? 5? Or? |
©2024 CERN