How long may Native-Theory tasks run?

Yeti (Volunteer moderator) · Joined: 2 Sep 04 · Posts: 455 · Credit: 201,312,639 · RAC: 5,052
Message 48150 - Posted: 30 May 2023, 9:09:45 UTC - Last modified: 30 May 2023, 9:10:03 UTC
I have enabled Native-Theory on my Native-ATLAS clients and see widely varying runtimes. Runtimes from 00:20 to 02:45 hours seem to be fine, but sometimes I see runtimes of 20:00 hours or more, sometimes at 99% CPU usage, sometimes with no CPU usage at all. Can I tell whether the tasks are alive and doing fine, or should I abort them if they run longer than XX:00 hours?
Supporting BOINC, a great concept!

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 48151 - Posted: 30 May 2023, 9:29:44 UTC - in response to Message 48150.
Open this page:
http://mcplots-dev.cern.ch/production.php?view=control
Follow the link in the "coverage" column of the current revision (currently 2390):
http://mcplots-dev.cern.ch/production.php?view=revision&rev=2390
It takes a while to load, so be patient (... more patient). The page you get includes a runtime histogram.
Theory native logs can be checked directly, e.g. for a task running in slot 0:
    .../slots/0/cernvm/shared/runRivet.log
Runtimes can range from a few minutes to a couple of days. Long runtimes don't necessarily indicate an error.

maeax · Joined: 2 May 07 · Posts: 2244 · Credit: 173,988,818 · RAC: 7,494
Message 48153 - Posted: 30 May 2023, 10:56:41 UTC - in response to Message 48150.
> Can I see, if the tasks are alive and doing fine or should I abort them if longer than XX:00 Hours?
I have one that has now been running for 6 days (41,000 of 49,000 events finished; the maximum for Theory is 10 days).

Henry Nebrensky · Joined: 13 Jul 05 · Posts: 169 · Credit: 15,000,737 · RAC: 0
Message 48154 - Posted: 30 May 2023, 12:38:05 UTC - in response to Message 48151.
> Theory native logs can be checked, e.g for a task running in slot 0 .../slots/0/cernvm/shared/runRivet.log
If you "head" runRivet.log, it tells you the code in use and how many events that specific task has to generate:
    [boinc pp jets 8000 170,-,2960 - pythia8 8.301 dire-default 57000 482]
57k in this case. If you then "tail" the log, you can see how far it has got and whether it's making progress...
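To make the "head" step concrete, here is a minimal Python sketch that splits the bracketed job description into named fields. The field names are an assumption: they are borrowed from the "Input parameters:" line of the complete log posted later in this thread (mode=boinc beam=pp process=z1j ... nevts=100000 seed=30), whose values line up one-to-one with the bracketed fields. The function names are hypothetical, not part of any Theory tooling.

```python
# Field names assumed from the "Input parameters:" line of a runRivet.log
# (mode=boinc beam=pp process=z1j ... nevts=100000 seed=30).
FIELDS = ["mode", "beam", "process", "energy", "params", "specific",
          "generator", "version", "tune", "nevts", "seed"]


def parse_job_spec(line: str) -> dict:
    """Split the last [...] group of a line into the named fields above."""
    inner = line[line.rindex("[") + 1:line.rindex("]")]
    values = inner.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {inner!r}")
    return dict(zip(FIELDS, values))


def job_spec_from_log(path: str) -> dict:
    """Rough equivalent of `head -n 1 runRivet.log`, followed by parse_job_spec()."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return parse_job_spec(log.readline())


if __name__ == "__main__":
    spec = parse_job_spec(
        "[boinc pp jets 8000 170,-,2960 - pythia8 8.301 dire-default 57000 482]")
    print(spec["generator"], spec["version"], "events to generate:", spec["nevts"])
    # prints: pythia8 8.301 events to generate: 57000
```

Taking the last bracketed group means the same function also works on a full first line such as `===> [runRivet] <date> [boinc ...]`, where the `[runRivet]` tag comes first.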

Aurum · Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0
Message 48576 - Posted: 19 Sep 2023, 11:45:07 UTC - in response to Message 48154. Last modified: 19 Sep 2023, 11:51:57 UTC
> > Theory native logs can be checked, e.g for a task running in slot 0 .../slots/0/cernvm/shared/runRivet.log
> If you "head" the runRivet.log, it will tell you the code in use and how many events that specific task is to generate: [boinc pp jets 8000 170,-,2960 - pythia8 8.301 dire-default 57000 482] 57k in this case. If you then "tail" the log you can see how far it's got and if it's making progress...
Thanks for that explanation. That's a lot of work to expect of a BOINC user just to decide whether the WU will ever finish. The problem is that I have many tasks whose progress is reported as over 98%, yet the end of the WU's runRivet.log says it has completed 63000 out of 100000 events. That should be displayed to us as 63% progress, not 98.563%. If progress were reported accurately, folks would let the tasks run. But when they see a task seemingly stall at over 98% for many hours, they assume something is wrong and abort the WU. Hopefully CERN will fix this progress-reporting bug soon. Expect many aborted tasks in the meantime.
The other problem is that these Theory tasks don't checkpoint. I, for one, am on time-of-use (TOU) electricity service, and my electric rate increases 10x during peak hours. If I can't suspend and resume from a checkpoint, the task will get aborted when I do my daily TOU shutdown.

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 48577 - Posted: 19 Sep 2023, 12:00:43 UTC - in response to Message 48576.
This is not a bug, so CERN will never "fix" it. What you are comparing is BOINC's progress estimate with the logfile entries of a family of scientific apps. Most of them, but not all, print the number of processed events to the logfile. Since the majority of Theory tasks finish within a couple of hours or even faster, the best you can do is be patient.

Aurum · Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0
Message 48579 - Posted: 19 Sep 2023, 12:15:49 UTC - in response to Message 48577.
It is a bug, a thoughtless, inconsiderate bug that could be fixed. Patience would be idiotic and wasteful. You clearly did not understand my comments about wasting expensive electricity.

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 48580 - Posted: 19 Sep 2023, 12:31:50 UTC - in response to Message 48579.
As I said: CERN will not solve this. If you still think it is a bug, describe it clearly and open an issue on GitHub.
Besides that, it would be easy to run a one-liner like this:
    find /your/boinc/working/dir/slots -type f -name "runRivet.log" -mmin +180 | xargs -I {} ls -hal {}
This prints all candidates where Theory has not updated runRivet.log within the last 180 minutes (=> might hang). Then inspect just those candidates. With a few more lines, it checks your whole server farm from your desktop.
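For hosts without GNU find, or as a starting point for the "few lines more", the same check can be sketched in Python. This is only a rough equivalent of the one-liner, not an official tool: the slots path is the same placeholder, and the function and script names are hypothetical. Like the one-liner, it only flags candidates, since a log can legitimately stay quiet during a long setup phase.

```python
import time
from pathlib import Path


def stale_logs(slots_dir: str, max_age_min: float = 180.0) -> list:
    """Return runRivet.log files below slots_dir whose last modification is older
    than max_age_min minutes (roughly `find ... -name runRivet.log -mmin +180`)."""
    cutoff = time.time() - max_age_min * 60
    return sorted(p for p in Path(slots_dir).rglob("runRivet.log")
                  if p.is_file() and p.stat().st_mtime < cutoff)


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python3 stale_logs.py /your/boinc/working/dir/slots
    for slots in sys.argv[1:]:
        for path in stale_logs(slots):
            hours = (time.time() - path.stat().st_mtime) / 3600
            print(f"{hours:6.1f} h since last update: {path}")
```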

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49861 - Posted: 2 Apr 2024, 6:54:07 UTC - Last modified: 2 Apr 2024, 6:54:40 UTC
I have one Theory unit that's been running for three and a half days. I just left it to do its thing. Today I got curious, and this is what's in the runRivet.log:

    HERWIGPP=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt
    Run herwig++ 2.5.1 ...
    generatorExecString = /cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt/bin/Herwig++ read -r /cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt/share/Herwig++/HerwigDefaults.rpo /shared/tmp/tmp.IPuslKhFRO/generator.params
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    >>>>>>>>> ThePEG - Toolkit for HEP Event Generation - version 1.7.1 <<<<<<<<<<
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    No more warnings of this kind will be reported.

There is nothing after this. The work unit name is Theory_2743-2857700-30.
This looks like a dead unit to me, but I'm hardly an expert. I've been very careful not to pause or restart the unit in any way, and I haven't been fiddling about with system-installed packages or filesystems lately, so I don't know what might cause this. Other units are completing successfully while this one just sits there. Should I just let it run or kill it?

maeax · Joined: 2 May 07 · Posts: 2244 · Credit: 173,988,818 · RAC: 7,494
Message 49863 - Posted: 2 Apr 2024, 7:28:10 UTC - in response to Message 49861.
It's like fog. You can cancel it or wait for the hard stop after 10 days ;-)

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49864 - Posted: 2 Apr 2024, 7:32:42 UTC - in response to Message 49863.
I don't follow. Other Theory tasks I have running are processing events normally and showing them in their respective logs, but this one is different. Does this indicate a failure, so that I should abort the task, or is it just another normal variation I haven't happened to see before?

maeax · Joined: 2 May 07 · Posts: 2244 · Credit: 173,988,818 · RAC: 7,494
Message 49867 - Posted: 2 Apr 2024, 8:04:54 UTC - in response to Message 49864. Last modified: 2 Apr 2024, 8:06:14 UTC
http://mcplots-dev.cern.ch/production.php?view=revision&rev=2743
Theory has hundreds of tasks running with difficult parameters. Because of the revision, mcplots-dev must be opened afresh from its default homepage.

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 49868 - Posted: 2 Apr 2024, 8:47:36 UTC - in response to Message 49861.
This works for most (but not all!) Theory native tasks:
1. Get the "last modified" time of runRivet.log.
2. Check the first line of runRivet.log. There you find the start time and the number of events to be processed (here 100000, the second-to-last field in the brackets):
    ===> [runRivet] Sun Mar 31 17:37:53 UTC 2024 [boinc pp jets 8000 100 - pythia8 8.212 tune-AU2ct10 100000 34]
3. Locate the last line that looks something like this:
    74100 events processed
4. Calculate the estimated remaining time from those values.
Ignore the BOINC progress estimate. It can't look into the logs.
If there are no "processed" lines at all, or no new lines for many hours, then the task has most likely got stuck. => abort it
Pitfalls:
- most, but not all, tasks run 100000 events
- certain tasks run through a very long setup phase to configure the environment => you will see no "processed" lines for many hours, but then they appear rapidly
- in rare cases you get a scientific app that does not print a single "processed" line. Those logs look different from most regular ones.
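Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under several assumptions: the first line has the `===> [runRivet] <date> [...]` layout shown in step 2, the date is printed in English with a UTC zone, the event count is the second-to-last bracketed field, and progress lines contain `<N> events processed`. It also assumes a constant event rate, which the pitfalls show is not always true. The function name is hypothetical.

```python
import os
import re
import sys
from datetime import datetime, timezone

# "===> [runRivet] Sun Mar 31 17:37:53 UTC 2024 [boinc pp jets ... 100000 34]"
HEADER_RE = re.compile(r"\[runRivet\] (.+?) \[(.+?)\]")
# Progress lines assumed to look like "74100 events processed"
PROCESSED_RE = re.compile(r"(\d+) events processed")


def estimate_remaining(log_text: str, last_modified: datetime):
    """Return (processed, total, remaining_seconds), or None if there is no progress yet.

    The rate is taken as processed / (last_modified - start), i.e. assumed constant.
    last_modified must be timezone-aware (UTC).
    """
    header = HEADER_RE.search(log_text)
    if header is None:
        raise ValueError("no '[runRivet] <date> [...]' header found")
    # Assumes an English date such as "Sun Mar 31 17:37:53 UTC 2024"
    start = datetime.strptime(header.group(1), "%a %b %d %H:%M:%S UTC %Y")
    start = start.replace(tzinfo=timezone.utc)
    total = int(header.group(2).split()[-2])  # nevts; the last field is the seed

    counts = PROCESSED_RE.findall(log_text)
    elapsed = (last_modified - start).total_seconds()
    if not counts or int(counts[-1]) == 0 or elapsed <= 0:
        return None  # setup phase, an app that never reports, or stuck: see the pitfalls
    processed = int(counts[-1])
    rate = processed / elapsed  # events per second
    return processed, total, (total - processed) / rate


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. .../slots/0/cernvm/shared/runRivet.log
        with open(path, encoding="utf-8", errors="replace") as log:
            text = log.read()
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        result = estimate_remaining(text, mtime)
        if result is None:
            print(f"{path}: no 'events processed' lines yet")
        else:
            done, total, secs = result
            print(f"{path}: {done}/{total} events, about {secs / 3600:.1f} h remaining")
```

Using the file's modification time rather than the current time measures the rate up to the last log line. A large gap between that time and now is the separate "no new lines for many hours" signal.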

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49869 - Posted: 2 Apr 2024, 8:51:06 UTC - in response to Message 49868.
This is the complete log file:

    ===> [runRivet] Fri Mar 29 15:11:45 UTC 2024 [boinc pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760 100000 30]
    Setting environment...
    INFO: uname: Linux runc 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
    INFO: /etc/redhat-release: cat: /etc/redhat-release: No such file or directory
    MCGENERATORS=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators
    g++ = /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-8a51a/x86_64-centos7/bin/g++
    g++ version = 11.2.0
    RIVET=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt
    YODA=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/yoda/1.9.10/x86_64-centos7-gcc11-opt
    Rivet version = rivet v3.1.10
    RIVET_ANALYSIS_PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt/lib/Rivet:/shared/analyses
    RIVET_DATA_PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt/share/Rivet:/shared/analyses
    GSL=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/GSL/2.7/x86_64-centos7-gcc11-opt
    HEPMC=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/HepMC/2.06.11/x86_64-centos7-gcc11-opt
    FASTJET=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/fastjet/3.4.1/x86_64-centos7-gcc11-opt
    PYTHON=/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/Python/3.9.12/x86_64-centos7-gcc11-opt
    Input parameters: mode=boinc beam=pp process=z1j energy=8000 params=- specific=- generator=herwig++ version=2.5.1 tune=LHC-UE-EE-2-2760 nevts=100000 seed=30
    Prepare temporary directories and files ...
    workd=/shared
    tmpd=/shared/tmp/tmp.IPuslKhFRO
    tmp_params=/shared/tmp/tmp.IPuslKhFRO/generator.params
    tmp_hepmc=/shared/tmp/tmp.IPuslKhFRO/generator.hepmc
    tmp_yoda=/shared/tmp/tmp.IPuslKhFRO/generator.yoda
    tmp_jobs=/shared/tmp/tmp.IPuslKhFRO/jobs.log
    tmpd_flat=/shared/tmp/tmp.IPuslKhFRO/flat
    tmpd_dump=/shared/tmp/tmp.IPuslKhFRO/dump
    tmpd_rivetdb=/shared/tmp/tmp.IPuslKhFRO/rivetdb.map
    Prepare Rivet parameters ...
    Total histograms selected: 1
    analysesNames=ATLAS_2019_I1744201
    Total analyses selected: 1
    analysesBaseNames=ATLAS_2019_I1744201
    Total base analyses selected: 1
    Unpack data histograms...
    dataFiles = /cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt/share/Rivet/ATLAS_2019_I1744201.yoda.gz
    output = /shared/tmp/tmp.IPuslKhFRO/flat
    make: Entering directory `/shared/rivetvm'
    g++ yoda2flat-split.cc -o yoda2flat-split.exe -Wfatal-errors -Wl,-rpath /cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/yoda/1.9.10/x86_64-centos7-gcc11-opt/lib `/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/yoda/1.9.10/x86_64-centos7-gcc11-opt/bin/yoda-config --cppflags --libs`
    make: Leaving directory `/shared/rivetvm'
    Total histograms unpacked=20 / selected=1 complete ./REF_ATLAS_2019_I1744201_d02-x01-y01.dat
    Building rivetvm ...
    make: Entering directory `/shared/rivetvm'
    g++ rivetvm.cc -o rivetvm.exe -DNDEBUG -Wfatal-errors -Wl,-rpath /cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt/lib -Wl,-rpath /cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/HepMC/2.06.11/x86_64-centos7-gcc11-opt/lib `/cvmfs/sft.cern.ch/lcg/releases/LCG_104d_ATLAS_10/MCGenerators/rivet/3.1.10/x86_64-centos7-gcc11-opt/bin/rivet-config --cppflags --ldflags --libs` -lHepMC
    make: Leaving directory `/shared/rivetvm'
    Run herwig++ 2.5.1 and Rivet ...
    generatorExecString = ./rungen.sh boinc pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760 100000 30 /shared/tmp/tmp.IPuslKhFRO/generator.hepmc
    rivetExecString = /shared/rivetvm/rivetvm.exe -a ATLAS_2019_I1744201 -i /shared/tmp/tmp.IPuslKhFRO/generator.hepmc -o /shared/tmp/tmp.IPuslKhFRO/flat -H /shared/tmp/tmp.IPuslKhFRO/generator.yoda -d /shared/tmp/tmp.IPuslKhFRO/dump
    INFO: (display) T4T_DISPLAY=
    INFO: (display) datdir=/shared/tmp/tmp.IPuslKhFRO/dump
    INFO: (display) vars=pp z1j 8000 - herwig++ 2.5.1 LHC-UE-EE-2-2760
    INFO: display service switched off
    ===> [rungen] Fri Mar 29 15:11:56 UTC 2024 [boinc pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760 100000 30 /shared/tmp/tmp.IPuslKhFRO/generator.hepmc]
    Setting environment for herwig++ 2.5.1 ...
    tree = hepmc2.06.05
    tag = grep: /etc/redhat-release: No such file or directory
    MCGENERATORS=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05
    LCG_PLATFORM=x86_64-slc5-gcc43-opt
    g++ = /shared/tmp/tmp.eldt4Q5K7G/g++
    g++ version = 4.3.6
    g++ orig = /cvmfs/sft.cern.ch/lcg/external/gcc/4.3.6/x86_64-slc5/bin/g++
    AGILE=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/agile/1.4.0/x86_64-slc5-gcc43-opt
    HEPMC=/cvmfs/sft.cern.ch/lcg/external/HepMC/2.06.05/x86_64-slc5-gcc43-opt
    AGILE_GEN_PATH=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05
    LHAPDF=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/lhapdf/5.8.9/x86_64-slc5-gcc43-opt
    grep: /etc/redhat-release: No such file or directory
    INFO: EL9/CC7 compat: herwig++ - added work-around for missing libraries:
    -rwxr-xr-x 1 0 0 7504 Mar 29 15:11 empty.so
    lrwxrwxrwx 1 0 0 8 Mar 29 15:11 libreadline.so.5 -> empty.so
    lrwxrwxrwx 1 0 0 8 Mar 29 15:11 libtermcap.so.2 -> empty.so
    /shared
    Input parameters: mode=boinc beam=pp process=z1j energy=8000 params=- specific=- generator=herwig++ version=2.5.1 tune=LHC-UE-EE-2-2760 nevts=100000 seed=30 outfile=/shared/tmp/tmp.IPuslKhFRO/generator.hepmc
    Prepare temporary directories and files ...
    workd=/shared
    tmpd=/shared/tmp/tmp.IPuslKhFRO
    tmp_params=/shared/tmp/tmp.IPuslKhFRO/generator.params
    Decoding parameters of generator...
    pTmin = 0
    pTmax = 8000
    mHatMin = 0
    mHatMax = 8000
    processCode=z1j
    beam1=p+
    beam2=p+
    beam energy = 4000.
    INFO: steering file template = configuration/herwig++-z1j.params
    INFO: cache is not active, CACHE=
    Prepare herwig++ 2.5.1 parameters ...
    => /shared/tmp/tmp.IPuslKhFRO/generator.params :
    # based on example from Herwig++ 2.4.2 distribution:
    # share/Herwig++/TVT.in
    # Run options:
    cd /Herwig/Generators
    set LHCGenerator:NumberOfEvents 100000
    set LHCGenerator:RandomNumberGenerator:Seed 30
    set LHCGenerator:DebugLevel 0
    set LHCGenerator:PrintEvent 1
    set LHCGenerator:MaxErrors 100000
    # redirect all log output to stdout
    set LHCGenerator:UseStdout true
    # do output to a HepMC file
    cd /Herwig/Generators
    insert LHCGenerator:AnalysisHandlers 0 /Herwig/Analysis/HepMCFile
    set /Herwig/Analysis/HepMCFile:PrintEvent 1000000
    set /Herwig/Analysis/HepMCFile:Format GenEvent
    set /Herwig/Analysis/HepMCFile:Filename /shared/tmp/tmp.IPuslKhFRO/generator.hepmc
    # set /Herwig/Analysis/HepMCFile:Units GeV_mm
    # Beam parameters:
    set LHCGenerator:EventHandler:LuminosityFunction:Energy 8000
    set LHCGenerator:EventHandler:BeamA /Herwig/Particles/p+
    set LHCGenerator:EventHandler:BeamB /Herwig/Particles/p+
    set LHCGenerator:MaxErrors -1
    # Process setup
    # Z+1jet production
    cd /Herwig/MatrixElements
    insert SimpleQCD:MatrixElements[0] MEZJet
    DISABLEREADONLY
    newdef MEZJet:ZDecay ChargedLeptons
    ## Set cuts
    ## Use this for hard leading-jets in a certain pT window
    set /Herwig/Cuts/JetKtCut:MinKT 0GeV # minimum jet pT
    set /Herwig/Cuts/JetKtCut:MaxKT 8000GeV # maximum jet pT
    #
    ## Use this for a certain mHat window
    #set /Herwig/Cuts/QCDCuts:MHatMin 0GeV # minimum jet mHat
    #set /Herwig/Cuts/QCDCuts:MHatMax 8000GeV # maximum jet mHat
    # Make particles with ctau > 10 mm stable:
    set /Herwig/Decays/DecayHandler:MaxLifeTime 10mm
    set /Herwig/Decays/DecayHandler:LifeTimeOption Average
    # tune 'LHC-UE-EE-2-2760' parameters: -------------------
    #%tuneFile%
    # Based on LHC tune example from Herwig++ 2.5.1 distribution
    # share/Herwig++/LHC-UE-EE-2.in
    ##################################################
    # Override default MPI parameters
    ##################################################
    # Colour reconnection settings
    set /Herwig/Hadronization/ColourReconnector:ColourReconnection Yes
    set /Herwig/Hadronization/ColourReconnector:ReconnectionProbability 0.55
    # Colour Disrupt settings
    set /Herwig/Partons/RemnantDecayer:colourDisrupt 0.15
    # inverse hadron radius
    set /Herwig/UnderlyingEvent/MPIHandler:InvRadius 1.1
    ## for \sqrt(s) = 2760 GeV
    # Min KT parameter
    set /Herwig/UnderlyingEvent/KtCut:MinKT 3.31
    # This should always be 2MinKT!!
    set /Herwig/UnderlyingEvent/UECuts:MHatMin 6.62
    # MPI model settings
    set /Herwig/UnderlyingEvent/MPIHandler:softInt Yes
    set /Herwig/UnderlyingEvent/MPIHandler:twoComp Yes
    set /Herwig/UnderlyingEvent/MPIHandler:DLmode 3
    # ---------------------------------------------
    set /Herwig/UnderlyingEvent/MPIHandler:IdenticalToUE -1
    # Run generator
    cd /Herwig/Generators
    run TVT LHCGenerator
    --------------------------------------
    HERWIGPP=/cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt
    Run herwig++ 2.5.1 ...
    generatorExecString = /cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt/bin/Herwig++ read -r /cvmfs/sft.cern.ch/lcg/external/MCGenerators_hepmc2.06.05/herwig++/2.5.1/x86_64-slc5-gcc43-opt/share/Herwig++/HerwigDefaults.rpo /shared/tmp/tmp.IPuslKhFRO/generator.params
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    >>>>>>>>> ThePEG - Toolkit for HEP Event Generation - version 1.7.1 <<<<<<<<<<
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    An event exception of type ThePEG::Exception occurred while generating event number 1: Failed to generate the shower after 100 attempts in Evolver::showerHardProcess() The event will be discarded.
    No more warnings of this kind will be reported.

It appears it never got the first event running, for some reason.

maeax · Joined: 2 May 07 · Posts: 2244 · Credit: 173,988,818 · RAC: 7,494
Message 49871 - Posted: 2 Apr 2024, 9:20:01 UTC - in response to Message 49869.
> [boinc pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760 100000 30]
Have you searched mcplots for this task? Is it successful for other volunteers?

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49872 - Posted: 2 Apr 2024, 9:40:42 UTC - in response to Message 49871.
    Keyword: pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760 (matched 1 of 202704 rows)

    run                                              events  attempts  success  failure  unknown
    pp z1j 8000 - - herwig++ 2.5.1 LHC-UE-EE-2-2760       0         1        0        0        1

It appears nobody else has run this.

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49873 - Posted: 2 Apr 2024, 9:52:25 UTC
OK, poking around, I checked stderr.txt and found this:
    07:33:48 AEDT +11:00 2024-04-01: cranky-0.1.4: [INFO] Pausing container Theory_2743-2857700-30_0.
Apparently something DID cause it to pause at some point, and I don't have resume capability (wrong sudo version; I tried installing the latest version, but every unit that ran from then on crashed, so I rolled it back). Since that makes it likely that I'm the one who broke it, I've aborted the unit.
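A pause like this is easy to miss. The Python sketch below lists "Pausing container" lines from task stderr files. It assumes, without confirmation from this thread, that each task's stderr.txt sits directly in its BOINC slot directory and that cranky keeps the wording seen in the line above. Only the pause message is matched: what cranky logs on resume is not shown here, so the sketch does not try to pair pauses with resumes. The function and script names are hypothetical.

```python
from pathlib import Path


def pause_events(stderr_path: str) -> list:
    """Return the lines of a task's stderr.txt that report cranky pausing its container."""
    text = Path(stderr_path).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if "Pausing container" in line]


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python3 pause_check.py /your/boinc/working/dir/slots
    for slots in sys.argv[1:]:
        # Assumed layout: one stderr.txt per slot directory
        for stderr in sorted(Path(slots).glob("*/stderr.txt")):
            for line in pause_events(str(stderr)):
                print(f"{stderr}: {line}")
```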

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 49874 - Posted: 2 Apr 2024, 9:53:00 UTC - in response to Message 49869.
A simple estimate: it started Fri Mar 29 15:11:45 UTC 2024 and hasn't processed a single event. By now it should have processed >35000 events to finish before the 10-day deadline. So, does it appear to hang? Yes, of course it hangs. => cancel it.
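This back-of-the-envelope check can be written out as a short Python sketch. It assumes a steady event rate over the whole 10-day limit mentioned in this thread; the function name is hypothetical, and the two timestamps are the task's start time and the time of this post, both in UTC.

```python
from datetime import datetime, timedelta


def events_needed_by(start: datetime, now: datetime, nevts: int,
                     limit: timedelta = timedelta(days=10)) -> int:
    """Events a task must have processed by `now` to finish `nevts` events within
    `limit`, assuming a constant event rate from the start time onwards."""
    fraction_of_limit = (now - start) / limit  # timedelta / timedelta -> float
    return int(nevts * fraction_of_limit)


if __name__ == "__main__":
    needed = events_needed_by(datetime(2024, 3, 29, 15, 11, 45),  # task start (UTC)
                              datetime(2024, 4, 2, 9, 53, 0),     # this post (UTC)
                              nevts=100000)
    print(needed)  # 37786, versus 0 events actually processed
```

About 3.8 of the 10 days had elapsed, so at a steady pace roughly 37,800 of the 100,000 events should have been done, which matches the ">35000" estimate. A log that never gets past event 1 is hopelessly behind.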

computezrmle (Volunteer moderator, Volunteer developer, Volunteer tester, Help desk expert) · Joined: 15 Jun 08 · Posts: 2549 · Credit: 255,281,461 · RAC: 57,759
Message 49875 - Posted: 2 Apr 2024, 10:01:51 UTC - in response to Message 49873.
Right. Your system can't use the modern cgroups v2 method (sudo is not recent enough), nor is it fully configured for pause/resume with the old cgroups v1-based method. The log entry just shows that cranky received a pause signal from BOINC.

Dark Angel · Joined: 7 Aug 11 · Posts: 105 · Credit: 25,514,918 · RAC: 23,961
Message 49876 - Posted: 2 Apr 2024, 10:03:24 UTC - in response to Message 49874.
As I said, it's been aborted now. I had only left it alone because I've had others that ran long but were otherwise working normally. I don't make a habit of digging through workunit logs without cause.

LHC@home