Message boards : Theory Application : New Native Theory Version 1.1
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes. Are you in some testing phase??? Unchain this thing and feed the beast.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

It is probably because you are doing SixTrack. They have put a higher priority on that. Just do ATLAS and Theory, and you will get more.
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day. So far for me ATLAS runs best as 8C WUs, but NT is inefficient with more than 1C. Since CERN has chosen to make MAX#CPUs a global parameter instead of splitting it out separately for each project, they work to opposite ends. I've tried letting my WUs run down so I'm short and then checking only 1C NT, but the most I get is 2 WUs in the morning, and if I behave myself they give me 4 in the evening. Since ST is only 1C, I can check ST & ATLAS 8C at the same time and it works ok.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day.

OK. I wasn't sure which planet it came from. Here is what I set for the "school" location. You can of course do different locations for the different projects and set them as you want for each location:

Resource share 100

Then, I use this app_config.xml in the project folder to set native Theory to one CPU core, and native ATLAS to two CPU cores. It works for me, though I am only running on an i7-4790 with 8 cores, and don't know how it would do on more.

<app_config>
  <app>
    <name>ATLAS</name>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline>
  </app_version>
  <app>
    <name>TheoryN</name>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>TheoryN</app_name>
    <plan_class>native_theory</plan_class>
    <avg_ncpus>1</avg_ncpus>
  </app_version>
</app_config>
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

> I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes.

This thread is not for small talk. Read all the info about it and you will know why only two tasks run at one time in Linux native!
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

Yes, you're right maeax, NT is small potatoes. I've turned it off until it fixes its issues.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> Read all Info's About it and you know why only two Tasks running at one time in Linux-native!

With the setup I have just posted, I typically run four native Theory (1 core per work unit) and two native ATLAS (2 cores per work unit), to fill up the 8 cores on my machine. It is hazardous to predict the behavior of the LHC scheduler. You have to either try it yourself, or consult with the Romulans.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

Have a Sherpa with more than 240 hours running; saved it and started it once more to see why there is no end:

integration time: ( 2m 45s elapsed / 5s left )
3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) 310000 ( 359309 -> 86.5 % ) integration time: ( 2m 50s elapsed / 0s left )
2_4__e-__e+__jLight__jLight__jLight__jLight : 3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) exp. eff: 0.344254 %
reduce max for 2_4__e-__e+__jLight__jLight__jLight__jLight to 0.781627 ( eps = 0.001 )
Single_Process::CalculateTotalXSec(): Calculate xs for '2_2__e-__e+__b__bb' (Internal)
Starting the calculation. Lean back and enjoy ... .
3.2401 pb +- ( 0.0337927 pb = 1.04295 % ) 5000 ( 5000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.2394 pb +- ( 0.0239453 pb = 0.739188 % ) 10000 ( 10000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.23307 pb +- ( 0.0194224 pb = 0.60074 % ) 15000 ( 15000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.23459 pb +- ( 0.01685 pb = 0.520932 % ) 20000 ( 20000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )

Is it possible to eliminate this process at > 100 %, because it finds no end? Saw this in COMIX. In native there is no time limit. Only the volunteer can stop this task.

Edit:
12:19:53 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] Running Container 'runc'.
12:19:54 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] ===> [runRivet] Sat Jun 1 10:19:54 UTC 2019 [boinc ee zhad 197 - - sherpa 1.3.0 default 1000 64]
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
10:11:36 CEST +02:00 2019-06-11: cranky-0.0.29: [INFO] Detected TheoryN App
Joined: 7 Jan 07 · Posts: 41 · Credit: 16,104,141 · RAC: 61

On a never-ending TheoryN, I got this in the runRivet.log, "Y out of bounds !" repeatedly:

+----------------------------------+
|                                  |
|     CCC  OOO  M   M  I  X   X    |
|    C    O   O MM MM  I   X X     |
|    C    O   O M M M  I    X      |
|    C    O   O M   M  I   X X     |
|     CCC  OOO  M   M  I  X   X    |
|                                  |
+==================================+
|  Color dressed  Matrix Elements  |
|     http://comix.freacafe.de     |
|   please cite  JHEP12(2008)039   |
+----------------------------------+

Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 229284 kB, 11s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 229680 kB, 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Soft_Photon_Handler.
Hadron_Decay_Map::Read: Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
Starting the calculation. Lean back and enjoy ... .
Updating display...
Display update finished (0 histograms, 0 events).
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
Channel_Elements::GenerateYBackward(2.1181500151722e-10,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-5.10672,^H}): Y out of bounds !
  ymin, ymax vs. y : -10 10 vs. 10
  Setting y to upper bound ymax=10
Updating display...
Display update finished (0 histograms, 0 events).
Channel_Elements::GenerateYForward(0.65305151839935,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,7.31876,^H}): Y out of bounds !
  ymin, ymax vs. y : -0.21305 0.21305 vs. -0.21305
  Setting y to lower bound ymin=-0.21305
Channel_Elements::GenerateYForward(0.0374227,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-0.581824,^H}): Y out of bounds !
etc ...
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

https://lhcathome.cern.ch/lhcathome/result.php?resultid=230970330

3 days after the server marked the above task, a Sherpa, "abandoned", it was still running on my host. It's not the first one; I've had several others in the past month. Congrats, LHC devs, you've taken Sherpa long runners and made them into forever runners.

So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

This Sherpa

20:37:26 CEST +02:00 2019-06-10: cranky-0.0.29: [INFO] ===> [runRivet] Mon Jun 10 18:37:26 UTC 2019 [boinc pp jets 8000 150,-,2360 - sherpa 2.2.5 default 4000 67]

finished after 2 days and 22 hours - not all are bad:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231900388
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

True, not all are bad. The problem, at least for me, is that the forever runners pile up in a big log jam and block the long runners that have a chance at succeeding. That can't be good for the science. I guess if I were a "serious" cruncher I would have time to manually pick the log jam apart (with assistance from MC Production), but cherry picking is boring as well as tedious. If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

Depending on how lucky they feel, watchdog users will be able to opt to accept Sherpa but set max task duration to as many days as they want, or allow them to run forever. Maybe having such options available will get the number of TheoryN users up into the 50s. ~35 is abysmal.
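The watchdog mentioned above is a volunteer-written script, not something LHC@home ships. A minimal sketch of its decision logic, under the assumptions that Sherpa jobs can be recognized by name and elapsed run time is known; the slot path and the boinccmd abort step in the comments are assumptions, not documented LHC@home tooling:

```shell
#!/bin/sh
# Sketch of the policy described above: reject every Sherpa job outright
# and cap any other Theory job at 20 hours of run time.

MAX_SECS=$((20 * 3600))   # 20-hour duration limit

# should_abort ELAPSED_SECS IS_SHERPA -> exit 0 (abort) or 1 (keep running)
should_abort() {
    elapsed=$1
    is_sherpa=$2
    [ "$is_sherpa" -eq 1 ] && return 0          # drop all Sherpa
    [ "$elapsed" -gt "$MAX_SECS" ] && return 0  # over the 20 h cap
    return 1
}

# Hypothetical main loop (destructive, so shown only as comments):
# for slot in /var/lib/boinc-client/slots/*/; do
#     grep -q 'sherpa' "$slot/runRivet.log" 2>/dev/null && is_sherpa=1 || is_sherpa=0
#     ... read elapsed time for the task from boinccmd --get_tasks ...
#     should_abort "$elapsed" "$is_sherpa" \
#         && boinccmd --task https://lhcathome.cern.ch/lhcathome/ "$task_name" abort
# done
```

Keeping the policy in one small function makes it easy to add the "accept Sherpa but cap at N days" option discussed above: only the two tests inside should_abort change.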
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

I think they should do that. We shouldn't be putting them out of a job.
Joined: 20 Jun 14 · Posts: 381 · Credit: 238,712 · RAC: 0

> So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

I will look into this. Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

Don't bother. The solution is nigh.

> So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

> Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.

What for? Crystal Pellet and a few others have been posting links and log snippets for 10 years, and all that's got us is devs have turned long runners into forever runners. That's not progress. That's regress.
Joined: 15 Jun 08 · Posts: 2568 · Credit: 258,722,989 · RAC: 119,238

Not all sherpas fail:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=233404083
https://lhcathome.cern.ch/lhcathome/result.php?resultid=233425927

I would give them a chance to finish.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

<message> Disk usage limit exceeded </message>

13:55:58 CEST +02:00 2019-08-01: cranky-0.0.29: [INFO] ===> [runRivet] Thu Aug 1 11:55:58 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.6.1.atlas default 100000 84]

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=119890616
Joined: 6 Sep 08 · Posts: 118 · Credit: 12,632,508 · RAC: 2,107

I would like to run TheoryN tasks on Ubuntu 14.04.6 (kernel 3.13). No errors setting things up, but tasks fail like this:

19:13:41 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Detected TheoryN App
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking CVMFS.
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
19:13:43 BST +01:00 2019-08-02: cranky-0.0.29: [ERROR] 'runc -v' failed.
19:13:44 (3159): cranky exited; CPU time 0.008713
19:13:44 (3159): app exit status: 0xce
19:13:44 (3159): called boinc_finish(195)

The installed libseccomp version is:
libseccomp2 2.1.1-1ubuntu1-trusty5

I have tried temporarily installing the "seccomp" package with no effect. Temporarily installing the "lxc" and "lxcfs" packages also had no effect. Any ideas welcome.
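The symbol lookup error means the runc binary shipped over CVMFS calls seccomp_version(), which newer libseccomp releases export and Trusty's 2.1.1 does not. Before swapping libraries around, it can help to check which file the soname actually resolves to and whether its version is new enough. A sketch of such a check; the 2.3.1 minimum is the figure suggested elsewhere in this thread, and the diagnostic pipeline in the comments is an illustration, not a tested recipe:

```shell
#!/bin/sh
# Decide whether an installed libseccomp version meets a required minimum.
REQUIRED="2.3.1"   # minimum suggested in this thread for the CVMFS runc

# version_ge A B -> exit 0 if dotted version A >= B (uses GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical usage: find the file behind the libseccomp.so.2 soname and
# compare its embedded version number against $REQUIRED:
#   lib=$(ldconfig -p | awk '/libseccomp\.so\.2 /{print $NF; exit}')
#   ver=$(readlink -f "$lib" | sed 's/.*libseccomp\.so\.//')
#   version_ge "$ver" "$REQUIRED" && echo "libseccomp $ver OK" \
#                                 || echo "libseccomp $ver too old"
```

On a Trusty box the check would report 2.1.1 as too old, which matches the undefined-symbol failure in the log above.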
Joined: 15 Jun 08 · Posts: 2568 · Credit: 258,722,989 · RAC: 119,238

IIRC runc requires at least libseccomp.so.2.3.1, which might not be present on older Linux systems.

Step 1
You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer Linux.

Step 2
Save the lib in an arbitrary folder and within this folder create a symlink to the lib. Example:

/usr/local/lib64/runc_for_lhc/
lrwxrwxrwx 1 root root   19  4. Dez 2017 libseccomp.so.2 -> libseccomp.so.2.3.1
-rwxr-xr-x 1 root root 259K 10. Mai 2017 libseccomp.so.2.3.1

Step 3
Create a config file /etc/ld.so.conf.d/runc.conf where you configure the path to your new libs:

/usr/local/lib64/runc_for_lhc

Activate this config by running "sudo ldconfig".

Step 4
Check which libs will be used:

ldd /cvmfs/grid.cern.ch/vc/containers/runc
    linux-vdso.so.1 (0x00007ffdeeb76000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6408915000)
    libseccomp.so.2 => /usr/local/lib64/runc_for_lhc/libseccomp.so.2 (0x00007f64086d4000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f64084d0000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6408122000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6408b33000)
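The steps above can be rolled into one small script. A sketch only, assuming you have already copied a suitable libseccomp from a newer distro; the directory, version number, and file names are just the examples from the post and should be adjusted to what you actually copied:

```shell
#!/bin/sh
# Install a newer libseccomp alongside the system one so the CVMFS runc
# can find it without touching the distro's own library.
LIBDIR="${LIBDIR:-/usr/local/lib64/runc_for_lhc}"   # example path from the post
LIBFILE="libseccomp.so.2.3.1"                       # example version from the post

install_lib() {
    # $1 = path to the libseccomp file copied from a newer Linux (Steps 1-2)
    mkdir -p "$LIBDIR"
    cp "$1" "$LIBDIR/$LIBFILE"
    ln -sf "$LIBFILE" "$LIBDIR/libseccomp.so.2"   # soname symlink the loader resolves
}

# Steps 3-4 need root and a real system, so they are left as comments:
#   echo "$LIBDIR" | sudo tee /etc/ld.so.conf.d/runc.conf
#   sudo ldconfig
#   ldd /cvmfs/grid.cern.ch/vc/containers/runc | grep libseccomp
```

The symlink is the important part: runc asks the loader for the soname libseccomp.so.2, not for the versioned file name, so without the link the new library is never picked up.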
Joined: 6 Sep 08 · Posts: 118 · Credit: 12,632,508 · RAC: 2,107

> IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.

OK, that's helpful, thanks, and for the instructions too. In the meantime I had found and installed v2.2.3-2 (from trusty backports), which didn't help.

> You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux

I've got the .deb archive for 2.4.1 (which appears to be available as a security update for 16.04) but can't get a usable library file out of it. It's 41 bytes and shows up as a "broken link", so I've clearly done something wrong somewhere. It looks like moving to OS version 16.04 might be needed, but that means much downloading... I don't want that at the moment, so the hunt goes on.
©2025 CERN