Message boards : Theory Application : New Native Theory Version 1.1
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes. Are you in some testing phase??? Unchain this thing and feed the beast.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

It is probably because you are doing SixTrack. They have put a higher priority on that. Just do ATLAS and Theory, and you will get more.
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day. So far for me ATLAS runs best as 8C WUs, but NT is inefficient with more than 1C. Since CERN has chosen to make MAX#CPUs a global parameter instead of splitting it out separately for each project, they work to opposite ends. I've tried letting my WUs run down so I'm short and then checking only 1C NT, but the most I get is 2 WUs in the morning, and if I behave myself they give me 4 in the evening. Since ST is only 1C, I can check ST & ATLAS 8C at the same time and it works ok.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> I can't check both Native Theory and ATLAS at the same time because ATLAS uses Reverse Romulan Logic where every day is Opposite Day.

OK. I wasn't sure which planet it came from. Here is what I set for the "school" location. You can of course do different locations for the different projects and set them as you want for each location:

Resource share 100

Then, I use this app_config.xml in the project folder to set native Theory to one CPU core, and native ATLAS to two CPU cores. It works for me, though I am only running on an i7-4790 with 8 cores, and don't know how it would do on more.

<app_config>
  <app>
    <name>ATLAS</name>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline>
  </app_version>
  <app>
    <name>TheoryN</name>
    <report_results_immediately/>
  </app>
  <app_version>
    <app_name>TheoryN</app_name>
    <plan_class>native_theory</plan_class>
    <avg_ncpus>1</avg_ncpus>
  </app_version>
</app_config>
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

> I only get 2 Native Theory WUs per computer. I guess if I'm good it allows me 2 extra sometimes.

This thread is not for small talk. Read all the info about it and you will know why only two tasks run at one time in Linux native!
Joined: 12 Jun 18 · Posts: 126 · Credit: 53,906,164 · RAC: 0

Yes, you're right maeax, NT is small potatoes. I've turned it off until it fixes its issues.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> Read all Info's About it and you know why only two Tasks running at one time in Linux-native!

With the setup I have just posted, I typically run four native Theory (1 core per work unit) and two native ATLAS (2 cores per work unit), to fill up the 8 cores on my machine. It is hazardous to predict the behavior of the LHC scheduler. You have to either try it yourself, or consult with the Romulans.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

Have a Sherpa with more than 240 hours running; saved it and started it once more to see why there is no end:

integration time: ( 2m 45s elapsed / 5s left )
3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) 310000 ( 359309 -> 86.5 % ) integration time: ( 2m 50s elapsed / 0s left )
2_4__e-__e+__jLight__jLight__jLight__jLight : 3.33798 pb +- ( 0.0147447 pb = 0.441724 % ) exp. eff: 0.344254 %
reduce max for 2_4__e-__e+__jLight__jLight__jLight__jLight to 0.781627 ( eps = 0.001 )
Single_Process::CalculateTotalXSec(): Calculate xs for '2_2__e-__e+__b__bb' (Internal)
Starting the calculation. Lean back and enjoy ... .
3.2401 pb +- ( 0.0337927 pb = 1.04295 % ) 5000 ( 5000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.2394 pb +- ( 0.0239453 pb = 0.739188 % ) 10000 ( 10000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.23307 pb +- ( 0.0194224 pb = 0.60074 % ) 15000 ( 15000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )
3.23459 pb +- ( 0.01685 pb = 0.520932 % ) 20000 ( 20000 -> 100 % ) full optimization: ( 0s elapsed / 1s left )

Is it possible to eliminate this process at > 100 %, because it finds no end? Saw this in COMIX. In native there is no time limit. Only the volunteer can stop this task.

Edit:
12:19:53 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] Running Container 'runc'.
12:19:54 CEST +02:00 2019-06-01: cranky-0.0.29: [INFO] ===> [runRivet] Sat Jun 1 10:19:54 UTC 2019 [boinc ee zhad 197 - - sherpa 1.3.0 default 1000 64]
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper (7.15.26016): starting
10:11:36 (20827): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
10:11:36 CEST +02:00 2019-06-11: cranky-0.0.29: [INFO] Detected TheoryN App
Joined: 7 Jan 07 · Posts: 41 · Credit: 16,104,141 · RAC: 61

On a never-ending TheoryN, I got this in the runRivet.log, "Y out of bounds !" repeatedly:

+----------------------------------+
|                                  |
|     CCC  OOO  M   M  I  X   X    |
|    C    O   O MM MM  I   X X     |
|    C    O   O M M M  I    X      |
|    C    O   O M   M  I   X X     |
|     CCC  OOO  M   M  I  X   X    |
|                                  |
+==================================+
|  Color dressed  Matrix Elements  |
|     http://comix.freacafe.de     |
|   please cite  JHEP12(2008)039   |
+----------------------------------+

Matrix_Element_Handler::BuildProcesses(): Looking for processes ............................................................................................ done ( 229284 kB, 11s ).
Matrix_Element_Handler::InitializeProcesses(): Performing tests .................................................................................... done ( 229680 kB, 0s ).
Initialized the Matrix_Element_Handler for the hard processes.
Initialized the Soft_Photon_Handler.
Hadron_Decay_Map::Read: Initializing HadronDecays.dat. This may take some time.
Initialized the Hadron_Decay_Handler, Decay model = Hadrons
Process_Group::CalculateTotalXSec(): Calculate xs for '2_2__j__j__e-__nu_eb' (Internal)
Starting the calculation. Lean back and enjoy ... .
Updating display...
Display update finished (0 histograms, 0 events).
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 0.0049, 4.9e+07, 4.9e+07 vs. 0.0049
Channel_Elements::GenerateYBackward(2.1181500151722e-10,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-5.10672,^H}): Y out of bounds !
  ymin, ymax vs. y : -10 10 vs. 10
  Setting y to upper bound ymax=10
Updating display...
Display update finished (0 histograms, 0 events).
Channel_Elements::GenerateYForward(0.65305151839935,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,7.31876,^H}): Y out of bounds !
  ymin, ymax vs. y : -0.21305 0.21305 vs. -0.21305
  Setting y to lower bound ymin=-0.21305
Channel_Elements::GenerateYForward(0.0374227,{-8.98847e+307,0,-8.98847e+307,0,0,^H},{-10,10,-0.581824,^H}): Y out of bounds !
etc ...
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

https://lhcathome.cern.ch/lhcathome/result.php?resultid=230970330

3 days after the server marked the above task, a Sherpa, "abandoned", it was still running on my host. It's not the first one; I've had several others in the past month. Congrats, LHC devs, you've taken Sherpa long runners and made them into forever runners.

So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

This Sherpa

20:37:26 CEST +02:00 2019-06-10: cranky-0.0.29: [INFO] ===> [runRivet] Mon Jun 10 18:37:26 UTC 2019 [boinc pp jets 8000 150,-,2360 - sherpa 2.2.5 default 4000 67]

finished after 2 days and 22 hours - not all are bad:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231900388
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

True, not all are bad. The problem, at least for me, is that the forever runners pile up in a big log jam and block the long runners that have a chance at succeeding. That can't be good for the science. I guess if I were a "serious" cruncher I would have time to manually pick the log jam apart (with assistance from MC Production), but cherry picking is boring as well as tedious. If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

Depending on how lucky they feel, watchdog users will be able to opt to accept Sherpa but set max task duration to as many days as they want, or allow them to run forever. Maybe having such options available will get the number of TheoryN users up into the 50s. ~35 is abysmal.
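The watchdog mentioned above is a volunteer-written script, not something LHC@home ships. A minimal sketch of its decision logic, under the assumptions that Sherpa jobs can be recognized by name and elapsed run time is known; the slot path and the boinccmd abort step in the comments are assumptions, not documented LHC@home tooling:

```shell
#!/bin/sh
# Sketch of the policy described above: reject every Sherpa job outright
# and cap any other Theory job at 20 hours of run time.

MAX_SECS=$((20 * 3600))   # 20-hour duration limit

# should_abort ELAPSED_SECS IS_SHERPA -> exit 0 (abort) or 1 (keep running)
should_abort() {
    elapsed=$1
    is_sherpa=$2
    [ "$is_sherpa" -eq 1 ] && return 0          # drop all Sherpa
    [ "$elapsed" -gt "$MAX_SECS" ] && return 0  # over the 20 h cap
    return 1
}

# Hypothetical main loop (destructive, so shown only as comments):
# for slot in /var/lib/boinc-client/slots/*/; do
#     grep -q 'sherpa' "$slot/runRivet.log" 2>/dev/null && is_sherpa=1 || is_sherpa=0
#     ... read elapsed time for the task from boinccmd --get_tasks ...
#     should_abort "$elapsed" "$is_sherpa" \
#         && boinccmd --task https://lhcathome.cern.ch/lhcathome/ "$task_name" abort
# done
```

Keeping the policy in one small function makes it easy to add the "accept Sherpa but cap at N days" option discussed above: only the two tests inside should_abort change.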
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> If I were a "serious" programmer I guess I would automate the "cherry picking via MC Production" with some script. Maybe one day.

I think they should do that. We shouldn't be putting them out of a job.
Joined: 20 Jun 14 · Posts: 381 · Credit: 238,712 · RAC: 0

> So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

I will look into this. Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0

Don't bother. The solution is nigh.

> So... watchdog script reconfigured to reject any and all Sherpa and limit task duration to 20 hours. Bad for the science? Maybe I'll care about that when LHC devs care about wasting my CPU time.

> Please continue to post links to tasks and log snippets regarding long running or looping jobs to Theory's endless looping thread.

What for? Crystal Pellet and a few others have been posting links and log snippets for 10 years, and all that's got us is devs have turned long runners into forever runners. That's not progress. That's regress.
Joined: 15 Jun 08 · Posts: 2568 · Credit: 258,722,989 · RAC: 119,238

Not all sherpas fail:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=233404083
https://lhcathome.cern.ch/lhcathome/result.php?resultid=233425927

I would give them a chance to finish.
Joined: 2 May 07 · Posts: 2257 · Credit: 174,366,760 · RAC: 20,027

<message> Disk usage limit exceeded </message>

13:55:58 CEST +02:00 2019-08-01: cranky-0.0.29: [INFO] ===> [runRivet] Thu Aug 1 11:55:58 UTC 2019 [boinc pp zinclusive 7000 -,-,50,130 - madgraph5amc 2.6.1.atlas default 100000 84]

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=119890616
Joined: 6 Sep 08 · Posts: 118 · Credit: 12,632,508 · RAC: 2,107

I would like to run TheoryN tasks on Ubuntu 14.04.6 (kernel 3.13). No errors setting things up, but tasks fail like this:

19:13:41 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Detected TheoryN App
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking CVMFS.
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
19:13:43 BST +01:00 2019-08-02: cranky-0.0.29: [ERROR] 'runc -v' failed.
19:13:44 (3159): cranky exited; CPU time 0.008713
19:13:44 (3159): app exit status: 0xce
19:13:44 (3159): called boinc_finish(195)

The installed libseccomp version is:
libseccomp2 2.1.1-1ubuntu1-trusty5

I have tried temporarily installing the "seccomp" package with no effect. Temporarily installing the "lxc" and "lxcfs" packages also had no effect. Any ideas welcome.
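The symbol lookup error means the runc binary shipped over CVMFS calls seccomp_version(), which newer libseccomp releases export and Trusty's 2.1.1 does not. Before swapping libraries around, it can help to check which file the soname actually resolves to and whether its version is new enough. A sketch of such a check; the 2.3.1 minimum is the figure suggested elsewhere in this thread, and the diagnostic pipeline in the comments is an illustration, not a tested recipe:

```shell
#!/bin/sh
# Decide whether an installed libseccomp version meets a required minimum.
REQUIRED="2.3.1"   # minimum suggested in this thread for the CVMFS runc

# version_ge A B -> exit 0 if dotted version A >= B (uses GNU sort -V)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical usage: find the file behind the libseccomp.so.2 soname and
# compare its embedded version number against $REQUIRED:
#   lib=$(ldconfig -p | awk '/libseccomp\.so\.2 /{print $NF; exit}')
#   ver=$(readlink -f "$lib" | sed 's/.*libseccomp\.so\.//')
#   version_ge "$ver" "$REQUIRED" && echo "libseccomp $ver OK" \
#                                 || echo "libseccomp $ver too old"
```

On a Trusty box the check would report 2.1.1 as too old, which matches the undefined-symbol failure in the log above.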
Joined: 15 Jun 08 · Posts: 2568 · Credit: 258,722,989 · RAC: 119,238

IIRC runc requires at least libseccomp.so.2.3.1, which might not be present on older Linux systems.

Step 1
You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer Linux.

Step 2
Save the lib in an arbitrary folder and within this folder create a symlink to the lib. Example:

/usr/local/lib64/runc_for_lhc/
lrwxrwxrwx 1 root root   19  4. Dez 2017 libseccomp.so.2 -> libseccomp.so.2.3.1
-rwxr-xr-x 1 root root 259K 10. Mai 2017 libseccomp.so.2.3.1

Step 3
Create a config file /etc/ld.so.conf.d/runc.conf where you configure the path to your new libs:

/usr/local/lib64/runc_for_lhc

Activate this config by running "sudo ldconfig".

Step 4
Check which libs will be used:

ldd /cvmfs/grid.cern.ch/vc/containers/runc
    linux-vdso.so.1 (0x00007ffdeeb76000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6408915000)
    libseccomp.so.2 => /usr/local/lib64/runc_for_lhc/libseccomp.so.2 (0x00007f64086d4000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f64084d0000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f6408122000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6408b33000)
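The steps above can be rolled into one small script. A sketch only, assuming you have already copied a suitable libseccomp from a newer distro; the directory, version number, and file names are just the examples from the post and should be adjusted to what you actually copied:

```shell
#!/bin/sh
# Install a newer libseccomp alongside the system one so the CVMFS runc
# can find it without touching the distro's own library.
LIBDIR="${LIBDIR:-/usr/local/lib64/runc_for_lhc}"   # example path from the post
LIBFILE="libseccomp.so.2.3.1"                       # example version from the post

install_lib() {
    # $1 = path to the libseccomp file copied from a newer Linux (Steps 1-2)
    mkdir -p "$LIBDIR"
    cp "$1" "$LIBDIR/$LIBFILE"
    ln -sf "$LIBFILE" "$LIBDIR/libseccomp.so.2"   # soname symlink the loader resolves
}

# Steps 3-4 need root and a real system, so they are left as comments:
#   echo "$LIBDIR" | sudo tee /etc/ld.so.conf.d/runc.conf
#   sudo ldconfig
#   ldd /cvmfs/grid.cern.ch/vc/containers/runc | grep libseccomp
```

The symlink is the important part: runc asks the loader for the soname libseccomp.so.2, not for the versioned file name, so without the link the new library is never picked up.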
Joined: 6 Sep 08 · Posts: 118 · Credit: 12,632,508 · RAC: 2,107

> IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.

OK, that's helpful, thanks, and for the instructions too. In the meantime I had found and installed v2.2.3-2 (from trusty backports), which didn't help.

> You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux

I've got the .deb archive for 2.4.1 (which appears to be available as a security update for 16.04) but can't get a usable library file out of it. It's 41 bytes and shows up as a "broken link", so I've clearly done something wrong somewhere. It looks like moving to OS version 16.04 might be needed, but that means much downloading... I don't want that at the moment, so the hunt goes on.
©2025 CERN