61) Message boards : Theory Application : Issues Native Theory application (Message 38447)
Posted 26 Mar 2019 by bronco
Post:
[quote]I avoid sixtrack. It is too easy, and requires no special software. Anyone can run it, so I let them.[/quote]
Good point. It illustrates the opportunity cost concept very well... when you crunch sixtrack, you lose the opportunity to crunch a task that many other volunteers cannot crunch.
62) Message boards : CMS Application : New Version v49.00 (Message 38444)
Posted 26 Mar 2019 by bronco
Post:
This new version updates the CVMFS cache to reduce the downloads each time the VM starts.

Nice! Native CMS by August?
63) Message boards : Theory Application : Issues Native Theory application (Message 38443)
Posted 26 Mar 2019 by bronco
Post:
Hoping to add native CMS to the mix someday.

Exactly, but I have to keep my health up to live that long.

Optimism promotes longevity and it tastes better than liver, kale and fat-free ice cream.
If you haven't already done so, snag some sixtrack tasks and see how nicely they play with native ATLAS and native Theory. With "switch between tasks every..." set to 2080 minutes and a sane task cache, nothing gets preempted, which avoids ATLAS restarting from 0 events. Very nice!!
64) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 38442)
Posted 26 Mar 2019 by bronco
Post:
Suddenly it was over very quickly: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219941269
The whole runRivet.log is available, but I'm showing here only the last part, after integration.
2.58077e-13 pb +- ( 5.15428e-15 pb = 1.99719 % ) 7100000 ( 112624285 -> 7.1 % )
integration time:  ( 2d 3h 13m 47s elapsed / 0s left ) [14:27:14]   
2_4__j__j__j__j__j__j__NQ_0-4 : 2.58077e-13 pb +- ( 5.15428e-15 pb = 1.99719 % )  exp. eff: 3.24641e-05 %
  reduce max for 2_4__j__j__j__j__j__j__NQ_0-4 to 0.218194 ( eps = 0.001 ) 
Output_Phase::Output_Phase(): Set output interval 1000000000 events.
----------------------------------------------------------
-- SHERPA generates events with the following structure --
----------------------------------------------------------
Perturbative       : Signal_Processes
Perturbative       : Hard_Decays
Perturbative       : Jet_Evolution:CSS
Perturbative       : Lepton_FS_QED_Corrections:Photons
Perturbative       : Multiple_Interactions:Amisic
Perturbative       : Minimum_Bias:Off
Hadronization      : Beam_Remnants
Hadronization      : Hadronization:Ahadic
Hadronization      : Hadron_Decays
Analysis           : HepMC2
---------------------------------------------------------
  Event 1 ( 0s elapsed / 1m 59s left ) -> ETA: Tue Mar 26 14:29  
  XS = 13.1493 pb +- ( 13.1493 pb = 100 % )  
#--------------------------------------------------------------------------
#                         FastJet release 3.0.3
#                 M. Cacciari, G.P. Salam and G. Soyez                  
#     A software package for jet finding and analysis at colliders      
#                           http://fastjet.fr                           
#                                                                       
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].   
#								      	   
# FastJet is provided without warranty under the terms of the GNU GPLv2.
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
  Event 2 ( 0s elapsed / 2m 39s left ) -> ETA: Tue Mar 26 14:29  
  XS = 13.5889 pb +- ( 9.59415 pb = 70.6 % )  
  Event 3 ( 0s elapsed / 3m 6s left ) -> ETA: Tue Mar 26 14:30  
  XS = 14.4529 pb +- ( 8.32632 pb = 57.61 % )  
  Event 4 ( 0s elapsed / 3m 14s left ) -> ETA: Tue Mar 26 14:30  
  XS = 15.2066 pb +- ( 7.58387 pb = 49.87 % )  
  Event 5 ( 0s elapsed / 3m 7s left ) -> ETA: Tue Mar 26 14:30  
  XS = 17.012 pb +- ( 7.58477 pb = 44.58 % )  
  Event 6 ( 0s elapsed / 3m 12s left ) -> ETA: Tue Mar 26 14:30  
  XS = 16.4568 pb +- ( 6.69778 pb = 40.69 % )  
  Event 7 ( 0s elapsed / 3m 5s left ) -> ETA: Tue Mar 26 14:30  
  XS = 15.9021 pb +- ( 5.99205 pb = 37.68 % )  
  Event 8 ( 0s elapsed / 2m 56s left ) -> ETA: Tue Mar 26 14:30  
  XS = 14.4568 pb +- ( 5.09674 pb = 35.25 % )  
  Event 9 ( 0s elapsed / 2m 43s left ) -> ETA: Tue Mar 26 14:29  
  XS = 15.5956 pb +- ( 5.18239 pb = 33.22 % )  
  Event 10 ( 0s elapsed / 2m 39s left ) -> ETA: Tue Mar 26 14:29  
  XS = 15.7089 pb +- ( 4.95184 pb = 31.52 % )  
  Event 20 ( 1s elapsed / 2m 57s left ) -> ETA: Tue Mar 26 14:30  
  XS = 16.6757 pb +- ( 3.71556 pb = 22.28 % )  
  Event 30 ( 2s elapsed / 2m 48s left ) -> ETA: Tue Mar 26 14:30  
  XS = 13.714 pb +- ( 2.49638 pb = 18.2 % )  
  Event 40 ( 3s elapsed / 2m 47s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.8102 pb +- ( 2.0198 pb = 15.76 % )  
  Event 50 ( 4s elapsed / 2m 47s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.8264 pb +- ( 1.80881 pb = 14.1 % )  
  Event 60 ( 5s elapsed / 2m 50s left ) -> ETA: Tue Mar 26 14:30  
  XS = 13.1017 pb +- ( 1.68655 pb = 12.87 % )  
  Event 70 ( 6s elapsed / 2m 50s left ) -> ETA: Tue Mar 26 14:30  
  XS = 13.4696 pb +- ( 1.60514 pb = 11.91 % )  
  Event 80 ( 7s elapsed / 2m 53s left ) -> ETA: Tue Mar 26 14:30  
  XS = 13.0581 pb +- ( 1.45572 pb = 11.14 % )  
  Event 90 ( 8s elapsed / 2m 52s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.6936 pb +- ( 1.36468 pb = 10.75 % )  
  Event 100 ( 9s elapsed / 2m 52s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.6198 pb +- ( 1.2842 pb = 10.17 % )  
100 events processed
dumping histograms...
Updating display...
  Event 200 ( 18s elapsed / 2m 46s left ) -> ETA: Tue Mar 26 14:30  
  XS = 11.4647 pb +- ( 0.842384 pb = 7.34 % )  
200 events processed
dumping histograms...
Display update finished (6 histograms, 100 events).
  Event 300 ( 27s elapsed / 2m 37s left ) -> ETA: Tue Mar 26 14:30  
  XS = 11.5149 pb +- ( 0.700558 pb = 6.08 % )  
300 events processed
dumping histograms...
  Event 400 ( 36s elapsed / 2m 27s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.0125 pb +- ( 0.624036 pb = 5.19 % )  
400 events processed
dumping histograms...
  Event 500 ( 46s elapsed / 2m 18s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.0868 pb +- ( 0.5616 pb = 4.64 % )  
500 events processed
dumping histograms...
  Event 600 ( 54s elapsed / 2m 7s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.4103 pb +- ( 0.524525 pb = 4.22 % )  
600 events processed
dumping histograms...
  Event 700 ( 1m 4s elapsed / 1m 59s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.3402 pb +- ( 0.480327 pb = 3.89 % )  
700 events processed
dumping histograms...
  Event 800 ( 1m 14s elapsed / 1m 51s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.2777 pb +- ( 0.44522 pb = 3.62 % )  
800 events processed
dumping histograms...
Updating display...
Display update finished (6 histograms, 800 events).
  Event 900 ( 1m 23s elapsed / 1m 42s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.3343 pb +- ( 0.423254 pb = 3.43 % )  
900 events processed
dumping histograms...
  Event 1000 ( 1m 33s elapsed / 1m 33s left ) -> ETA: Tue Mar 26 14:30  
  XS = 12.2405 pb +- ( 0.39885 pb = 3.25 % )  
1000 events processed
dumping histograms...
1100 events processed
1200 events processed
1300 events processed
1400 events processed
1500 events processed
Updating display...
Display update finished (6 histograms, 1000 events).
1600 events processed
1700 events processed
1800 events processed
1900 events processed
  Event 2000 ( 184 s total ) = 939386 evts/day                    
In Event_Handler::Finish : Summarizing the run may take some time.
+------------------------------------------------------+
|                                                      |
|  Total XS is 12.2115 pb +- ( 0.280779 pb = 2.29 % )  |
|                                                      |
+------------------------------------------------------+
Return_Value::PrintStatistics(): Statistics {
  Generated events: 2000
  New events {
    From "Jet_Evolution:CSS": 915 (8249) -> 11 %
  }
  Retried events {
    From "Beam_Remnants": 1 (2001) -> 0 %
    From "Jet_Evolution:CSS": 40 (8249) -> 0.4 %
  }
  Retried phases {
    From "Hadron_Decay_Handler::RejectExclusiveChannelsFromFragmentation": 415 (0) -> 415.
  }
  Retried methods {
    From "Decay_Channel::GenerateKinematics": 1 (485783) -> 0 %
  }
}
------------------------------------------------------------------------
Please cite the publications listed in 'Sherpa_References.tex'.
  Extract the bibtex list by running 'get_bibtex Sherpa_References.tex'
  or email the file to 'slaclib2@slac.stanford.edu', subject 'generate'.
------------------------------------------------------------------------
Time: 2d 5h 18m 58s on Tue Mar 26 14:30:22 2019
 (User: 2d 4h 47m 27s, System: 15m 10s, Children User: 0s, Children System: 0s)
Thanks for using LHAPDF 6.1.6. Please make sure to cite the paper:
  Eur.Phys.J. C75 (2015) 3, 132  (http://arxiv.org/abs/1412.7420)
2000 events processed
dumping histograms...
Rivet.Analysis.Handler: INFO  Finalising analyses
Rivet.Analysis.CMS_2017_I1519995: WARN  Skipping histo with null area /CMS_2017_I1519995/d03-x01-y01
Rivet.Analysis.CMS_2017_I1519995: WARN  Skipping histo with null area /CMS_2017_I1519995/d04-x01-y01
Rivet.Analysis.CMS_2017_I1519995: WARN  Skipping histo with null area /CMS_2017_I1519995/d06-x01-y01
Rivet.Analysis.Handler: INFO  Processed 2000 events

The MCnet usage guidelines apply to Rivet: see http://www.montecarlonet.org/GUIDELINES
Please acknowledge plots made with Rivet analyses, and cite arXiv:1003.0694 (http://arxiv.org/abs/1003.0694)

Generator run finished successfully

Processing histograms...
input  = /shared/tmp/tmp.0FlqnMaf1B/flat
output = /shared
./runRivet.sh: line 742:   202 Killed                  display_service $tmpd_dump "$beam $process $energy $params $generator $version $tune"  (wd: /shared)
mc:  CMS_2017_I1519995_d02-x01-y01.dat -> /shared/dat/pp/jets/dijet_chi/cms2017-m4200/13000/sherpa/2.2.2/default.dat
data:  REF_CMS_2017_I1519995_d02-x01-y01.dat -> /shared/dat/pp/jets/dijet_chi/cms2017-m4200/13000/CMS_2017_I1519995.dat

Disk usage: 33836 Kb

CPU usage: 193426 s

Clean tmp ...

Run finished successfully


Wonderful! Your point is?
65) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 38440)
Posted 26 Mar 2019 by bronco
Post:
Still running:
===> [runRivet] Sun Mar 24 08:22:01 UTC 2019 [boinc pp jets 13000 250,-,4160 - sherpa 2.2.2 default 2000 34]

and the time left is down to 0s again.
2.56608e-13 pb +- ( 5.25828e-15 pb = 2.04915 % ) 6820000 ( 108708125 -> 7.1 % )
integration time:  ( 2d 47m 7s elapsed / 0s left ) [11:57:35]
What's next?

According to the Sherpa 2.1.0 manual, "Sherpa will then move on to integrate the other processes specified in the run card."
And "When the integration is complete, the event generation will start."
66) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 38437)
Posted 26 Mar 2019 by bronco
Post:
All you can do is abort the task.

This is the simple answer, and it is not good for the science.

Really? If the task fails (and his task most certainly will), it won't even upload a result. No result = no science. If he had aborted the task 24 hours ago he could have received a pythia, herwig, <whatever> that is far more likely to succeed and do some worthwhile science.
It has been stated that they learn something even if the job fails. Really? What do they learn? All they learn is that the job failed. They can learn that fact from the 2 failures from the 2 wingmen.
It's the principle of opportunity cost... every failed job is a lost opportunity to do some useful science.

I have a Sherpa running for 10K minutes now, and it shows an update in runRivet.log every minute.
You got lucky.

In the Theory thread there is a link to the Sherpa documentation.
Please post that link, I couldn't find it but I would like to read it.
67) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 38434)
Posted 26 Mar 2019 by bronco
Post:
Display update finished (0 histograms, 0 events).

I don't know anything about sherpa jobs... is it working or not?

runRivet.log is the right file and the task itself is working. The sherpa job is most certainly a fail. The giveaway is that it's stuck at 0 histograms and 0 events.
All you can do is abort the task.
68) Message boards : Theory Application : Installation of CVMFS (Message 38411)
Posted 24 Mar 2019 by bronco
Post:
The server scheduler is sometimes confused about how to interpret the max # settings.
...

Yeah, I already suspected that. As soon as I changed it back, enough tasks came in.
Thanks for the app_config, but unfortunately, when I want to save the file with gedit in the project directory, Ubuntu doesn't let me access the 'projects' folder for some reason...

You need root privilege to create/edit that file; otherwise it's read-only. I usually do sudo nano ../app_config.xml. Nano is a minimal editor, but it works well enough for short files.
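For anyone hitting the same permissions wall, a minimal sketch: stage the file somewhere writable first, then copy it into the root-owned project directory. The project folder name below is a guess; check the actual name under your BOINC data directory.

```shell
# Stage a minimal app_config.xml in a writable location.
# (Content abbreviated; see the app_config thread for the full block.)
cat > /tmp/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>TheoryN</name>
    <max_concurrent>4</max_concurrent>
  </app>
</app_config>
EOF
echo "staged: $(basename /tmp/app_config.xml)"

# Copy it into the project directory (assumed folder name!) with root
# privilege, then tell the client to re-read it without a restart:
# sudo cp /tmp/app_config.xml /var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/
# boinccmd --read_cc_config
```

The copy and reload steps are commented out because they need root and a running client; run them manually once the staged file looks right.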
69) Message boards : Theory Application : Issues Native Theory application (Message 38403)
Posted 24 Mar 2019 by bronco
Post:
I am now getting both Native Theory and Native ATLAS with the appropriate preferences selected.
Everything is working well.

Yes, very well. I have a native ATLAS and a native Theory running concurrently on a host with only 4 GB RAM :)
Hoping to add native CMS to the mix someday.
70) Message boards : Theory Application : Issues Native Theory application (Message 38386)
Posted 23 Mar 2019 by bronco
Post:
No result so far. Sherpa 1.4.3 has now been running for 25 hours and I'm hoping it comes to a good end.
runRivet.log is growing every minute (255 KByte), nevts=1.000.
I'll let you know the result.

Knocked out of the database because of the deadline:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=109355962
But... still running, 80 hours so far, with a 777.8 KByte runRivet.log :-))
@Bronco wrote: Go Sherpa go!

Now I see I was wrong :-(
Die, sherpa, die!!
Setting my watchdog script to immediately abort any sherpa less than version 5.0.0.
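A hedged sketch of what such a watchdog could look like. The slot paths, the job-header format in runRivet.log, and the version threshold are assumptions taken from this thread; adjust them for your own setup before use.

```shell
#!/bin/bash
# Watchdog sketch: flag native Theory slots running a Sherpa older than
# a threshold version, so the task can be aborted early.

MIN_VERSION="5.0.0"

# True (exit 0) if $1 sorts strictly below $2 as dotted version strings.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

for log in /var/lib/boinc/slots/*/cernvm/shared/runRivet.log; do
    [ -f "$log" ] || continue
    # Assumed job header format: "[boinc pp jets 13000 ... sherpa 2.2.2 default ...]"
    ver=$(grep -oE 'sherpa [0-9]+\.[0-9]+\.[0-9]+' "$log" | head -n1 | awk '{print $2}')
    [ -n "$ver" ] || continue
    if version_lt "$ver" "$MIN_VERSION"; then
        echo "$log: sherpa $ver < $MIN_VERSION, candidate for abort"
        # Look up the task name for this slot, then e.g.:
        # boinccmd --task https://lhcathome.cern.ch/lhcathome/ <task_name> abort
    fi
done
```

The abort is left commented out on purpose; since a version below 5.0.0 matches every Sherpa in the wild at the time of writing, you would really be aborting all sherpas.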
71) Message boards : Theory Application : Issues Native Theory application (Message 38365)
Posted 21 Mar 2019 by bronco
Post:
Maybe you are the first one with a success ;)

@maeax
I think CP is sherpa shaming you ;)
Don't give in, keep the faith
Go, sherpa, go!!
72) Message boards : Theory Application : Issues Native Theory application (Message 38356)
Posted 21 Mar 2019 by bronco
Post:
No result so far. Sherpa 1.4.3 has now been running for 25 hours and I'm hoping it comes to a good end.
runRivet.log is growing every minute (255 KByte), nevts=1.000.
I'll let you know the result.

Death by graceful shutdown must end!
Sherpa lives matter too!!

2515 minutes runtime, Bronco!
43 hours, so far.
424.9 kByte of output in runRivet.log.
Deadline: 21 March 2019 20:10:08 UTC.
Edit: This is not a problem of native Theory!

Yes, I understand. Sherpa has earned a bad reputation, perhaps unfairly. Maybe it just needs more time than it is allowed in the VBox tasks. I will repeat your test next time I get a sherpa. Thank you for pointing me in the right direction :)
Go, sherpa, go!!
73) Message boards : Theory Application : Issues Native Theory application (Message 38351)
Posted 20 Mar 2019 by bronco
Post:
No result so far. Sherpa 1.4.3 has now been running for 25 hours and I'm hoping it comes to a good end.
runRivet.log is growing every minute (255 KByte), nevts=1.000.
I'll let you know the result.

Death by graceful shutdown must end!
Sherpa lives matter too!!
74) Message boards : Theory Application : Issues Native Theory application (Message 38339)
Posted 20 Mar 2019 by bronco
Post:
The task is set to use 2 CPUs by default, but barely over 1 is used, and the reported run time exactly equals the CPU time. To the second, on every task. At most I see 1.5 cores, when the task is really short: 6 min run time, 8 min CPU time.
Don't trust the values reported in the results, especially when they are equal.
Example, your task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914
It reported 51m 1s, which is exactly the CPU time reported at the end of the result: 06:18:02 (32596): cranky exited; CPU time 3061.446043,
but when you calculate the job finish time minus the job start time (the job should have run in one flow)
06:18:02 (32596): cranky exited; CPU time 3061.446043
05:38:19 (32596): wrapper (7.15.26016): starting

you'll find the elapsed time is 2383 seconds, so either 1 CPU is used at far over 100% or 2 CPUs are partially used.
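For what it's worth, that arithmetic checks out; a quick shell calculation (GNU date assumed; the timestamps and CPU time are copied from the log lines above):

```shell
# Wall-clock span between "wrapper ... starting" and "cranky exited":
start=$(date -u -d "1970-01-01 05:38:19" +%s)
end=$(date -u -d "1970-01-01 06:18:02" +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"    # 2383

# Average cores in use = CPU time / elapsed wall-clock time:
awk -v cpu=3061.446043 -v el="$elapsed" \
    'BEGIN { printf "avg cores: %.2f\n", cpu / el }'
```

Roughly 1.28 cores on average for a task that requested 2, consistent with the "barely over 1 is used" observation.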


I am looking at BOINCTasks history for the correct run/CPU times, and my statement is still true. At best 1.5 threads are used, but the average is around 1.1 threads per task. I set it to use 1 per task with app_config so the CPU would be fully used.

I had the same problem. I traced the cause to an error in my app_config.xml. Maybe you made the same error I did?
To avoid adding "noise" to this thread I posted a proper block for native Theory in a separate thread at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4975
75) Message boards : Theory Application : app_config.xml block for native Theory (Message 38338)
Posted 20 Mar 2019 by bronco
Post:
Should look similar to the following,
<app>
    <name>TheoryN</name>
    <max_concurrent>4</max_concurrent>
</app>
<app_version>
    <app_name>TheoryN</app_name>
    <plan_class>native_theory</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <cmdline>--nthreads 1</cmdline>
    <!-- RAM formula: RAM = ? -->
    <cmdline>--memory_size_mb 400</cmdline>
</app_version>
76) Message boards : Theory Application : Issues Native Theory application (Message 38320)
Posted 19 Mar 2019 by bronco
Post:
Tried "sudo touch /var/lib/boinc/slots/0/cernvm/shared/shutdown" to gracefully kill a hung sherpa job. The command succeeded but the task didn't shut down.
Is graceful shutdown not an option with native Theory or am I creating the file in the wrong folder?
77) Message boards : Theory Application : Issues Native Theory application (Message 38297)
Posted 19 Mar 2019 by bronco
Post:
... I don't like the CPU's being used only partially but I'll ignore it if they promise there won't be any sherpa jobs.
Promise: sherpa's will come. I've one running at the moment ;)

Damn! Guess I have to modify my watchdog script again.
78) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 38295)
Posted 19 Mar 2019 by bronco
Post:
Does it make sense to try all that on a box with 8Gig of RAM?

It feels as if it is generally quite fruitless to try with that amount of RAM, specifically if you try running other projects/stuff on the box as well.

I used to do it on 8GB. At first I got a lot of invalids. Then I boosted the "switch between tasks every..." setting to 24 hours and got a considerably higher success rate (but not 100%). I also learned to schedule OS updates and reboots around the ATLAS tasks so as not to suspend them. In theory ATLAS VBox tasks should not be bothered by suspending/resuming but in practice I found they are. YMMV.
79) Message boards : Theory Application : Issues Native Theory application (Message 38288)
Posted 19 Mar 2019 by bronco
Post:
The task is set to use 2 CPUs by default, but barely over 1 is used, and the reported run time exactly equals the CPU time. To the second, on every task. At most I see 1.5 cores, when the task is really short: 6 min run time, 8 min CPU time.
Don't trust the values reported in the results, especially when they are equal.
Example, your task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=219459914
It reported 51m 1s, which is exactly the CPU time reported at the end of the result: 06:18:02 (32596): cranky exited; CPU time 3061.446043,
but when you calculate the job finish time minus the job start time (the job should have run in one flow)
06:18:02 (32596): cranky exited; CPU time 3061.446043
05:38:19 (32596): wrapper (7.15.26016): starting

you'll find the elapsed time is 2383 seconds, so either 1 CPU is used at far over 100% or 2 CPUs are partially used.

OK, it makes sense now with respect to the numbers adding up correctly. I don't like the CPUs being used only partially, but I'll ignore it if they promise there won't be any sherpa jobs.
80) Message boards : Theory Application : Issues Native Theory application (Message 38285)
Posted 19 Mar 2019 by bronco
Post:
The setup went without error, thanks Ivan for the great directions.
The directions are actually by Laurence ;)

2 X rivetvm.exe, 1 at 65% CPU, 1 at ~45% CPU
2 X pythia8.exe, 1 at ~80% CPU, 1 at ~65% CPU
Wahoo!! Very nice to see pythia running native, but I was hoping to see it using closer to 100% CPU?

Each job involves a lot of processes. Each job needs 1 rivetvm.exe plus a generator, e.g. pythia8, agile-runmc (= pythia6), sherpa, herwig, etc.
So you have to sum one generator process with its rivetvm, and you'll see that together they are >100%, which is what happens when you have idle CPUs.

But BOINC manager shows 2 X 2-CPU tasks = 4 CPU's in use, in other words no idle CPU's. Also, the task run times are nearly equal to the task CPU times when I would expect CPU time to be a little less than double the run time.

