Message boards : Theory Application : New Native Theory Version 1.1

Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 52,457,949
RAC: 23,953
Message 39058 - Posted: 5 Jun 2019, 20:25:37 UTC

I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes.
Are you in some testing phase???
Unchain this thing and feed the beast.
ID: 39058
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39059 - Posted: 5 Jun 2019, 21:23:27 UTC - in response to Message 39058.  

It is probably because you are doing SixTrack. They have put a higher priority on that.
Just do ATLAS and Theory, and you will get more.
ID: 39059
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 52,457,949
RAC: 23,953
Message 39060 - Posted: 5 Jun 2019, 21:51:48 UTC - in response to Message 39059.  

I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day.
So far for me ATLAS runs best as 8C WUs. But NT is inefficient with more than 1C. Since CERN has chosen to make Max # CPUs a global parameter instead of splitting it out separately for each project, the two work to opposite ends.
I've tried letting my WUs run down so I'm short and then checking only 1C NT, but the most I get is 2 WUs in the morning, and if I behave myself they give me 4 in the evening.
Since ST is only 1C I can check ST & ATLAS 8C at the same time and it works ok.
ID: 39060
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39061 - Posted: 5 Jun 2019, 22:56:18 UTC - in response to Message 39060.  

I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day.

OK. I wasn't sure which planet it came from.

Here is what I set for the "school" location. You can of course do different locations for the different projects and set them as you want for each location:

Resource share 100
Use CPU (checked)
Use ATI GPU (not checked)
Run test applications? (checked)
Run only the selected applications
SixTrack: no
sixtracktest: no
CMS Simulation: no
Theory Simulation: no
ATLAS Simulation: yes
Theory Native: yes
If no work for selected applications is available, accept work from other applications? no
Max # jobs No limit
Max # CPUs 8


Then, I use this app_config.xml in the program folder to set native Theory to one CPU core, and native ATLAS to two CPU cores.
It works for me, though I am only running on an i7-4790 with 8 logical cores (4 physical), and don't know how it would do on more.

<app_config>

  <app>
    <name>ATLAS</name>
    <report_results_immediately/>
  </app>

  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline> 
  </app_version>

  <app>
    <name>TheoryN</name>
    <report_results_immediately/>
  </app>

 <app_version>
   <app_name>TheoryN</app_name>
   <plan_class>native_theory</plan_class>
   <avg_ncpus>1</avg_ncpus>
 </app_version>

</app_config>
ID: 39061
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,500
RAC: 104,443
Message 39062 - Posted: 6 Jun 2019, 2:32:08 UTC - in response to Message 39058.  

I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes.
Are you in some testing phase???
Unchain this thing and feed the beast.

This thread is not for small talk.
Read all the info about it and you will know why only two tasks run at one time in Linux native!
ID: 39062
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 52,457,949
RAC: 23,953
Message 39068 - Posted: 6 Jun 2019, 11:19:06 UTC - in response to Message 39062.  

Yes you're right maeax, NT is small potatoes. I've turned it off until it fixes its issues.
ID: 39068
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39069 - Posted: 6 Jun 2019, 12:20:25 UTC - in response to Message 39062.  

Read all the info about it and you will know why only two tasks run at one time in Linux native!

With the setup I have just posted, I typically run four native Theory (1 core per work unit) and two native ATLAS (2 cores per work unit), to fill up the 8 cores on my machine.
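
The core arithmetic behind that mix can be checked with a short script. This is only a minimal sketch, assuming Python 3 and its standard xml.etree module; the function names are illustrative and not part of BOINC, and the XML is a trimmed copy of the app_config.xml posted earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the app_config.xml posted earlier in the thread.
APP_CONFIG = """
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline>
  </app_version>
  <app_version>
    <app_name>TheoryN</app_name>
    <plan_class>native_theory</plan_class>
    <avg_ncpus>1</avg_ncpus>
  </app_version>
</app_config>
"""


def cores_per_task(xml_text):
    """Map each app_name to the avg_ncpus value configured for it."""
    root = ET.fromstring(xml_text)
    return {
        version.findtext("app_name"): float(version.findtext("avg_ncpus"))
        for version in root.iter("app_version")
    }


def cores_used(xml_text, task_counts):
    """Total cores claimed by a hypothetical mix of running tasks."""
    per_task = cores_per_task(xml_text)
    return sum(per_task[name] * count for name, count in task_counts.items())


if __name__ == "__main__":
    mix = {"TheoryN": 4, "ATLAS": 2}  # the mix described in this post
    print(cores_used(APP_CONFIG, mix))  # 4*1 + 2*2
```

The script only reads the configuration back; how many tasks the scheduler actually sends is a separate matter.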

It is hazardous to predict the behavior of the LHC scheduler. You have to either try it yourself, or consult with the Romulans.
ID: 39069
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,500
RAC: 104,443
Message 39102 - Posted: 11 Jun 2019, 8:36:15 UTC
Last modified: 11 Jun 2019, 9:32:51 UTC

I have a Sherpa task with more than 240 hours of run time. I saved it and started it once more to see why it never ends:
integration time: ( 2m 45s elapsed / 5s left )
3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) 310000 ( 359309 -> 86.5 % )
integration time: ( 2m 50s elapsed / 0s left )
2_4__e-__e+__jLight__jLight__jLight__jLight : 3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) exp. eff: 0.344254 %
reduce max for 2_4__e-__e+__jLight__jLight__jLight__jLight to 0.781627 ( eps = 0.001 )
Single_Process::CalculateTotalXSec(): Calculate xs for '2_2__e-__e+__b__bb' (Internal)
Starting the calculation. Lean back and enjoy ... .
3.2401 pb +- ( 0.0337927 pb = 1.04295 % ) 5000 ( 5000 -> 100 % )
full optimization: ( 0s elapsed / 1s left )
3.2394 pb +- ( 0.0239453 pb = 0.739188 % ) 10000 ( 10000 -> 100 % )
full optimization: ( 0s elapsed / 1s left )
3.23307 pb +- ( 0.0194224 pb = 0.60074 % ) 15000 ( 15000 -> 100 % )
full optimization: ( 0s elapsed / 1s left )
3.23459 pb +- ( 0.01685 pb = 0.520932 % ) 20000 ( 20000 -> 100 % )
full optimization: ( 0s elapsed / 1s left )

Is it possible to eliminate this process with > 100 %, since it never ends?
I saw this in COMAX.
In native there is no time limit. Only the volunteer can stop this task.

Edit:
12:19:53 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] Running Container 'runc'.
12:19:54 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] ===> [runRivet] Sat Jun 1 10:19:54 UTC 2019 [boinc ee zhad 197 - - sherpa 1.3.0 default 1000 64]
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
10:11:36 CEST +02:00 2019-06-11: cranky-0.0.29: [INFO] Detected TheoryN App
ID: 39102
zepingouin
Joined: 7 Jan 07
Posts: 41
Credit: 15,959,427
RAC: 271
Message 39103 - Posted: 11 Jun 2019, 17:39:23 UTC - in response to Message 39102.  

On a never-ending TheoryN task, I got "Y out of bounds !" repeatedly in runRivet.log:

+----------------------------------+
|                                  |
|      CCC  OOO  M   M I X   X     |
|     C    O   O MM MM I  X X      |
|     C    O   O M M M I   X       |
|     C    O   O M   M I  X X      |
|      CCC  OOO  M   M I X   X     |
|                                  |
+==================================+
|  Color dressed  Matrix Elements  |
|     http://comix.freacafe.de     |
|   please cite  JHEP12(2008)039   |
+----------------------------------+
Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 229284 kB, 11s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 229680 kB, 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Soft_Photon_Handler.
Hadron_Decay_Map::Read:   Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
Starting the calculation. Lean back and enjoy ... .
Updating display...
Display update finished (0 histograms, 0 events).
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
Channel_Elements::GenerateYBackward(2.1181500151722e-10,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-5.10672,^H}):  Y out of bounds !
   ymin, ymax vs. y : -10 10 vs. 10
Setting y to upper bound ymax=10
Updating display...
Display update finished (0 histograms, 0 events).
Channel_Elements::GenerateYForward(0.65305151839935,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,7.31876,^H}):  Y out of bounds !
   ymin, ymax vs. y : -0.21305 0.21305 vs. -0.21305
Setting y to lower bound  ymin=-0.21305
Channel_Elements::GenerateYForward(0.0374227,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-0.581824,^H}):  Y out of bounds !
etc ...
ID: 39103
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 39131 - Posted: 15 Jun 2019, 11:40:49 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=230970330
3 days after the server marked the above task, a Sherpa, "abandoned", it was still running on my host.
It's not the first one, I've had several others in the past month.
Congrats LHC devs, you've taken Sherpa long runners and made them into forever runners.

So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.
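
The watchdog itself is not shown in the thread, so the following is only a minimal sketch, in Python 3, of the decision rule described above. The Task class and both function names are hypothetical. It also assumes that the generator is the 7th field of the trailing "[boinc ...]" block, which holds for the runRivet lines quoted in this thread but is not documented here.

```python
from dataclasses import dataclass

MAX_HOURS = 20.0                  # duration limit mentioned in the post
REJECTED_GENERATORS = {"sherpa"}  # generators rejected outright


@dataclass
class Task:
    runrivet_line: str    # the "[runRivet]" line from the task's log
    elapsed_hours: float  # how long the task has been running


def generator_of(runrivet_line):
    """Return the generator name from a runRivet log line, or None.

    Assumption: the generator is the 7th field of the trailing
    "[boinc ...]" block, as in the log lines quoted in this thread.
    """
    start = runrivet_line.rfind("[boinc ")
    end = runrivet_line.find("]", start)
    if start == -1 or end == -1:
        return None
    fields = runrivet_line[start + 1:end].split()
    return fields[6] if len(fields) > 6 else None


def should_abort(task):
    """True if the task uses a rejected generator or has run too long."""
    generator = generator_of(task.runrivet_line)
    if generator is not None and generator.lower() in REJECTED_GENERATORS:
        return True
    return task.elapsed_hours > MAX_HOURS
```

A real watchdog would still have to find the running tasks and act on the decision through the BOINC client; that part is left out here.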
ID: 39131
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,500
RAC: 104,443
Message 39132 - Posted: 15 Jun 2019, 12:03:41 UTC

This Sherpa
20:37:26 CEST +02:00 2019-06-10: cranky-0.0.29: [INFO] ===> [runRivet] Mon Jun 10 18:37:26 UTC 2019 [boinc pp jets 8000 150,-,2360 - sherpa 2.2.5 default 4000 67]

finished after 2 days and 22 Hours - not all are bad:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231900388
ID: 39132
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 39133 - Posted: 15 Jun 2019, 14:30:55 UTC - in response to Message 39132.  


finished after 2 days and 22 Hours - not all are bad:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231900388

True, not all are bad. The problem, at least for me, is that the forever runners pile up in a big log jam and block the long runners that have a chance at succeeding. That can't be good for the science.

I guess if I were a "serious" cruncher I would have time to manually pick the log jam apart (with assistance from MC Production) but cherry picking is boring as well as tedious.

If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

Depending on how lucky they feel, watchdog users will be able to opt to accept Sherpa but set max task duration to as many days as they want or allow them to run forever. Maybe having such options available will get the number of TheoryN users up into the 50's. ~35 is abysmal.
ID: 39133
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 39134 - Posted: 15 Jun 2019, 17:00:52 UTC - in response to Message 39133.  

If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

I think they should do that. We shouldn't be putting them out of a job.
ID: 39134
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 39137 - Posted: 17 Jun 2019, 8:00:28 UTC - in response to Message 39131.  

So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

I will look into this. Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.
ID: 39137
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 39138 - Posted: 17 Jun 2019, 15:25:10 UTC - in response to Message 39137.  

So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

I will look into this.

Don't bother. The solution is nigh.

Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.

What for? Crystal Pellet and a few others have been posting links and log snippets for 10 years and all that's got us is devs have turned long runners into forever runners. That's not progress. That's regress.
ID: 39138
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,901,592
RAC: 138,070
Message 39146 - Posted: 18 Jun 2019, 7:27:58 UTC

ID: 39146
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,500
RAC: 104,443
Message 39478 - Posted: 1 Aug 2019, 14:43:19 UTC
Last modified: 1 Aug 2019, 14:45:11 UTC

<message>
Disk usage limit exceeded
</message>
13:55:58 CEST +02:00 2019-08-01: cranky-0.0.29: [INFO] ===> [runRivet] Thu Aug 1 11:55:58 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.6.1.atlas default 100000 84]
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=119890616
ID: 39478
m
Joined: 6 Sep 08
Posts: 116
Credit: 10,925,485
RAC: 2,376
Message 39485 - Posted: 2 Aug 2019, 19:53:11 UTC

I would like to run LinuxN tasks on Ubuntu 14.04.6 (kernel 3.13).

There were no errors setting things up, but tasks fail like this:

19:13:41 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Detected TheoryN App
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking CVMFS.
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
19:13:43 BST +01:00 2019-08-02: cranky-0.0.29: [ERROR] 'runc -v' failed.
19:13:44 (3159): cranky exited; CPU time 0.008713
19:13:44 (3159): app exit status: 0xce
19:13:44 (3159): called boinc_finish(195)

The installed libseccomp version is:-

libseccomp2 2.1.1-1ubuntu1-trusty5

I have tried temporarily installing the "seccomp" package with no effect.

Temporarily installing the "lxc" and "lxcfs" packages also had no effect.

Any ideas welcome
ID: 39485
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,901,592
RAC: 138,070
Message 39487 - Posted: 2 Aug 2019, 21:36:21 UTC - in response to Message 39485.  

IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.


Step 1

You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux.


Step 2

Save the lib in an arbitrary folder and within this folder create a symlink to the lib.

Example:
/usr/local/lib64/runc_for_lhc/
lrwxrwxrwx 1 root root 19 4. Dez 2017 libseccomp.so.2 -> libseccomp.so.2.3.1
-rwxr-xr-x 1 root root 259K 10. Mai 2017 libseccomp.so.2.3.1


Step 3

Create a config file /etc/ld.so.conf.d/runc.conf where you configure the path to your new libs:
/usr/local/lib64/runc_for_lhc

Activate this config running "sudo ldconfig".


Step 4

Check which libs will be used

ldd /cvmfs/grid.cern.ch/vc/containers/runc
linux-vdso.so.1 (0x00007ffdeeb76000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6408915000)
libseccomp.so.2 => /usr/local/lib64/runc_for_lhc/libseccomp.so.2 (0x00007f64086d4000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f64084d0000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6408122000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6408b33000)
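
The check in Step 4 can also be scripted. A minimal sketch, assuming Python 3 and ldd output in the "name => path (address)" form shown above; the function name is illustrative, and the sample text is the output quoted in this post.

```python
SAMPLE_LDD_OUTPUT = """\
linux-vdso.so.1 (0x00007ffdeeb76000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6408915000)
libseccomp.so.2 => /usr/local/lib64/runc_for_lhc/libseccomp.so.2 (0x00007f64086d4000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f64084d0000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6408122000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6408b33000)
"""


def resolved_libs(ldd_output):
    """Map each library name to the path ldd resolved it to.

    Lines without "=>" (the vDSO and the dynamic loader) are skipped.
    A library reported as "not found" maps to None.
    """
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        target = rest.strip()
        if not target or target.startswith("not found"):
            libs[name.strip()] = None
        else:
            libs[name.strip()] = target.split()[0]
    return libs


if __name__ == "__main__":
    print(resolved_libs(SAMPLE_LDD_OUTPUT).get("libseccomp.so.2"))
```

If the path printed for libseccomp.so.2 is not the folder configured in Step 3, the new library is not the one being picked up.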
ID: 39487
m
Joined: 6 Sep 08
Posts: 116
Credit: 10,925,485
RAC: 2,376
Message 39490 - Posted: 3 Aug 2019, 11:36:10 UTC - in response to Message 39487.  

IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.

OK that's helpful, thanks, and for the instructions, too. In the meantime I had found and installed v2.2.3-2 (from trusty backports) which didn't help.

You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux

I've got the .deb archive for 2.4.1 (which appears to be available as a security update for 16.04) but can't get a usable library file out of it. It's 41 bytes and shows up as a "broken link". I've clearly done something wrong somewhere.
It looks like moving to OS version 16.04 might be needed, but that means much downloading... I don't want that at the moment, so the hunt goes on.
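
A small file that shows up as a broken link may simply be a symbolic link whose target was not extracted alongside it. A minimal sketch, assuming Python 3 on Linux, of how that could be checked; the function name and the labels it returns are illustrative and not part of any packaging tool.

```python
import os


def describe_library_file(path):
    """Classify a path as 'file', 'link', 'broken link' or 'missing'.

    Returns a (state, target) pair; target is the link target for
    symbolic links and None otherwise.
    """
    if os.path.islink(path):
        target = os.readlink(path)
        # os.path.exists follows the link, so False means it dangles.
        state = "link" if os.path.exists(path) else "broken link"
        return state, target
    if os.path.isfile(path):
        return "file", None
    return "missing", None
```

If the result is a broken link, the target it names is the file that still has to be obtained and placed where the link points.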
ID: 39490