Message boards : Number crunching : Boinc memory estimate and LHC Settings

csbyseti
Joined: 6 Jul 17
Posts: 22
Credit: 29,322,273
RAC: 6,916
Message 34977 - Posted: 12 Apr 2018, 16:27:21 UTC

I used ATLAS for some months until the upload problems occurred.
It was no problem to run 5 ATLAS WUs with 3 threads each on a computer with 32 GB RAM.
After I restarted LHC@home it was not possible (with the same app_config.xml) to run more than 4 WUs at the same time.
Reason: BOINC used a value of 9100 MB for the memory estimate -> not enough memory for the next active task.
When I reduce the "maximum number of CPUs" to 3 in the LHC@home settings, BOINC uses the correct 5300 MB value for the memory estimate of every WU.

But now I get only 3 WUs instead of the 8 set in "maximum number of work" in the LHC@home settings.
So the memory estimate in BOINC is fixed, but BOINC does not fetch enough WUs to run 5 at the same time.

Running a CPU at 50-80% load is wasting CPU power.
Running two different projects is no option because of the ugly BOINC scheduler.

Please fix this behavior.
ID: 34977
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1100
Credit: 6,866,700
RAC: 823
Message 34981 - Posted: 13 Apr 2018, 9:27:19 UTC - in response to Message 34977.  

I tried to reproduce this behavior, but could not replicate your setup exactly, because I already had 4 Theory tasks running.
But you're right. There is something wrong with the preference settings.
With 4 Theory tasks running and max jobs set to 8, I got only 3 new ATLAS tasks.
Changing the max for jobs: same result, no new tasks.
ID: 34981
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 478,781,594
RAC: 269,237
Message 34999 - Posted: 13 Apr 2018, 19:13:59 UTC
Last modified: 13 Apr 2018, 19:15:31 UTC

With no limit for tasks I get 3 ATLAS tasks. I use the same 3-core setting to ensure that the memory estimates are correct.

Before, I could run 12x2 tasks concurrently, but now the limit is 8 jobs.

I think you would now see a 5x3 = 15 limit on a 16-core machine.

I gave up messing with the settings; ATLAS doesn't work well with other projects running at the same time. Therefore I just don't contribute much to ATLAS; if they change the settings, I will do more.

If you want to run more, you must run multiple instances of BOINC.
ID: 34999
csbyseti

Joined: 6 Jul 17
Posts: 22
Credit: 29,322,273
RAC: 6,916
Message 35024 - Posted: 16 Apr 2018, 18:20:07 UTC - in response to Message 34999.  

I'll try to show this behavior more clearly:
Goal: 5 WUs with 3 CPUs each active.
Part of app_config.xml:
<app_name>ATLAS</app_name>
<version_num>100</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>3.000000</avg_ncpus>
<max_ncpus>3.000000</max_ncpus>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<api_version>7.7.0</api_version>
<cmdline>--memory_size_mb 5300</cmdline>
<dont_throttle/>
<is_wrapper/>
<needs_network/>
BOINC uses these values for the VMs.

Starting preference settings:
maximum number of work: 7
maximum number of cpus: 4
--> only 4 active WUs, no spare WU in the pipeline
After switching "maximum number of cpus" to 5,
one more WU was downloaded and started!!

16.04.2018 19:01:31 | LHC@home | [mem_usage] 9EmLDmoJHSsnlyackoJh5iwnABFKDmABFKDmLbmWDmABFKDmrxkvgm_0: WS 5.28MB, smoothed 6200.00MB, swap 135.92MB, 0.00 page faults/sec, user CPU 30.422, kernel CPU 27963.031
16.04.2018 19:01:31 | LHC@home | [mem_usage] bs0MDmwBHSsnyYickojUe11pABFKDmABFKDmz15UDmABFKDmwYmSkm_0: WS 5.45MB, smoothed 6200.00MB, swap 135.14MB, 0.00 page faults/sec, user CPU 25.297, kernel CPU 19489.547
16.04.2018 19:01:31 | LHC@home | [mem_usage] 3TLODmenHSsnlyackoJh5iwnABFKDmABFKDm4F1WDmABFKDmhJR8bn_0: WS 5.64MB, smoothed 6200.00MB, swap 136.42MB, 0.00 page faults/sec, user CPU 23.578, kernel CPU 15982.234
16.04.2018 19:01:31 | LHC@home | [mem_usage] 3hkNDmq0ISsnyYickojUe11pABFKDmABFKDmZkwVDmABFKDm0ln2Jo_0: WS 5.64MB, smoothed 6200.00MB, swap 136.80MB, 0.00 page faults/sec, user CPU 21.266, kernel CPU 11155.875
16.04.2018 19:01:31 | LHC@home | [mem_usage] 8pmMDmJVDSsnlyackoJh5iwnABFKDmABFKDmAW5UDmABFKDmv1aqLn_1: WS 9.58MB, smoothed 7100.00MB, swap 124.29MB, 0.00 page faults/sec, user CPU 1.734, kernel CPU 3.703
16.04.2018 19:01:31 | | [mem_usage] BOINC totals: WS 44.24MB, smoothed 31930.41MB, swap 1744.37MB, 0.00 page faults/sec
16.04.2018 19:01:31 | | [mem_usage] All others: WS 648.27MB, swap 7009.15MB, user 28834.250s, kernel 30899.547s
16.04.2018 19:01:31 | | [mem_usage] non-BOINC CPU usage: 0.53%

BOINC uses 6200 MB for the older WUs (max cpus setting of 4) and 7100 MB for the last WU (max cpus setting of 5) (don't forget: the VM actually uses 5300 MB).

And now one of the running WUs completes; the upload starts and, after the upload, the next download starts.

16.04.2018 19:24:40 | LHC@home | Finished download of RXBMDmCFKSsnlyackoJh5iwnABFKDmABFKDmMn2XDmABFKDmHwh6Ao_EVNT.13620778._000602.pool.root.1
16.04.2018 19:24:43 | | [mem_usage] enforce: available RAM 32714.59MB swap 40714.59MB
16.04.2018 19:24:43 | LHC@home | [cpu_sched_debug] enforce: task RXBMDmCFKSsnlyackoJh5iwnABFKDmABFKDmMn2XDmABFKDmHwh6Ao_0 can't run, too big 7100.00MB > 6850.85MB
16.04.2018 19:24:46 | LHC@home | [mem_usage] bs0MDmwBHSsnyYickojUe11pABFKDmABFKDmz15UDmABFKDmwYmSkm_0: WS 11.97MB, smoothed 6200.00MB, swap 135.25MB, 0.00 page faults/sec, user CPU 27.922, kernel CPU 23663.031
16.04.2018 19:24:46 | LHC@home | [mem_usage] 3TLODmenHSsnlyackoJh5iwnABFKDmABFKDm4F1WDmABFKDmhJR8bn_0: WS 11.13MB, smoothed 6200.00MB, swap 136.42MB, 0.00 page faults/sec, user CPU 26.766, kernel CPU 20132.906
16.04.2018 19:24:46 | LHC@home | [mem_usage] 3hkNDmq0ISsnyYickojUe11pABFKDmABFKDmZkwVDmABFKDm0ln2Jo_0: WS 10.73MB, smoothed 6200.00MB, swap 136.18MB, 0.00 page faults/sec, user CPU 24.219, kernel CPU 15326.422
16.04.2018 19:24:46 | LHC@home | [mem_usage] 8pmMDmJVDSsnlyackoJh5iwnABFKDmABFKDmAW5UDmABFKDmv1aqLn_1: WS 16.69MB, smoothed 7100.00MB, swap 136.03MB, 0.00 page faults/sec, user CPU 17.516, kernel CPU 2899.625
16.04.2018 19:24:46 | | [mem_usage] BOINC totals: WS 230.51MB, smoothed 25871.87MB, swap 1636.14MB, 0.00 page faults/sec
16.04.2018 19:24:46 | | [mem_usage] All others: WS 1776.46MB, swap 7308.82MB, user 28964.828s, kernel 31025.063s
16.04.2018 19:24:46 | | [mem_usage] non-BOINC CPU usage: 1.27%

The new WU also shows the 7100 MB value, and this exceeds the 32000 MB of memory.

This shows that the number of downloaded WUs depends only on the value of "maximum number of cpus" and not on "maximum number of work".
But "maximum number of cpus" also increases the amount of memory that BOINC uses in its memory calculation.

With this preference behaviour it's not possible to run more than 4 ATLAS WUs at the same time.

Sorry for the long text; I hope the problem is clearer now.
ID: 35024
rbpeake
Joined: 17 Sep 04
Posts: 79
Credit: 25,478,198
RAC: 30
Message 35025 - Posted: 16 Apr 2018, 19:43:56 UTC - in response to Message 35024.  

They should offer options beyond 8 for both the maximum number of CPUs and the maximum number of work units.

I can run 7 instances of 3-core ATLAS tasks on 21 cores, but I cannot run more than 8 2-core tasks on the same machine.
Regards,
Bob P.
ID: 35025
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35026 - Posted: 17 Apr 2018, 6:39:12 UTC - in response to Message 35024.  

Hello csbyseti

ATLAS is currently a bit reluctant to hand out more than 1 task per request.
Sometimes a couple of requests are necessary to fill the local buffers.
The large input files may play a role, but I'm not sure about that.

On the other hand, there are some measures you can take to make sure it's not your system causing the problems.


The snippet from your app_config.xml looks a bit malformed.
It contains a couple of elements that seem to be copied from client_state.xml but should not appear in app_config.xml.
The correct template can be found in the BOINC documentation:
http://boinc.berkeley.edu/wiki/client_configuration#Application_configuration

I would suggest creating a fresh app_config.xml that strictly follows the documented form.
You may focus on the following lines:
<avg_ncpus>3.000000</avg_ncpus>
<cmdline>--nthreads 3 --memory_size_mb 5300</cmdline>

<avg_ncpus>, --nthreads and the project's web preferences should be set to the same value.
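For reference, a minimal app_config.xml in the documented form might look like this (a sketch, not an official template - the app name and plan class are taken from the snippet quoted earlier in this thread, and the values assume the 3-core / 5300 MB setup discussed here):

```xml
<app_config>
    <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>vbox64_mt_mcore_atlas</plan_class>
        <avg_ncpus>3.0</avg_ncpus>
        <cmdline>--nthreads 3 --memory_size_mb 5300</cmdline>
    </app_version>
</app_config>
```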


You may also check your VirtualBox environment, as some (older) logs show unclean shutdowns:
2018-04-11 18:01:57 (12620): VM did not power off when requested.
2018-04-11 18:01:57 (12620): VM was successfully terminated.


Finally you may run a fresh set of VMs after a client restart or a reboot.
ID: 35026
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35149 - Posted: 3 May 2018, 8:41:04 UTC - in response to Message 34977.  

Dear colleagues!
Please explain some settings in the LHC@home preferences:
1. "Max # of jobs for this project" - does this mean a limit on the total number of downloaded tasks (running + waiting to run + ready to start)?
2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?
ID: 35149
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35150 - Posted: 3 May 2018, 9:09:05 UTC - in response to Message 35149.  
Last modified: 3 May 2018, 9:10:28 UTC

1. "Max # of jobs for this project" - does this mean a limit on the total number of downloaded tasks (running + waiting to run + ready to start)?

Yes (+ finished but not yet reported)


2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?

No.
It limits the number of CPU cores used by a multicore app.
Thus it also affects the RAM requirements calculated by the project server.
At the moment only ATLAS provides a multicore app.

Some users have reported unexpected behavior when they change those settings.
If you notice it too, please report it on the message board.
ID: 35150
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 1006
Credit: 47,311,807
RAC: 3,514
Message 35151 - Posted: 3 May 2018, 9:38:09 UTC - in response to Message 35150.  

We should have multicore CMS and Theory here in the near future. I alone have run thousands of them; the Theory version in particular runs fine. CMS has a problem now and then, but I think we may have that up and running again too.
Volunteer Mad Scientist For Life
ID: 35151
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35152 - Posted: 3 May 2018, 9:47:57 UTC - in response to Message 35150.  
Last modified: 3 May 2018, 9:49:41 UTC

computezrmle!
Thank you very much for such a fast reply!

2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?

No.
It limits the number of CPU cores used by a multicore app.


Dear Project team!

As you can see, the current wording may provoke misunderstandings.
Can you correct it (for example, to "Max # of CPUs for multicore applications")?
Or just add an explanation of what this setting means?

Also, some of the LHC vbox applications are a significant burden for the "typical/standard" volunteer's PC used at home or at work.
Is it possible to add one more setting that would allow limiting the maximum number of LHC tasks running at the same time?
Of course, this is not necessary at all for SixTrack...
But it is already necessary for LHCb, which eats 2 GB of RAM per task...
ID: 35152
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35153 - Posted: 3 May 2018, 10:09:44 UTC - in response to Message 35152.  

... Is it possible to add one more setting that would allow limiting the maximum number of LHC tasks running at the same time? ...

To limit the number of concurrently running instances of a distinct app you may use a local app_config.xml file.
See the BOINC documentation for a general overview:
http://boinc.berkeley.edu/wiki/client_configuration#Application_configuration

A simple example could look like this:

<app_config>
    <app>
      <name>Theory</name>
      <max_concurrent>2</max_concurrent>
    </app>
    <app>
      <name>LHCb</name>
      <max_concurrent>1</max_concurrent>
    </app>
    <project_max_concurrent>2</project_max_concurrent>
</app_config>
ID: 35153
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35154 - Posted: 3 May 2018, 11:39:01 UTC - in response to Message 35153.  

computezrmle
Thank You very much again!
Yes, of course - XML is always a solution.
<demagogy>
But for advanced users only - a single incorrect character in the XML can take you anywhere from "this just doesn't work" to "this ruined everything". Some time ago I had many hours of pain before I finally got optimized apps for SETI and Einstein working...
Also, what percentage of volunteers can be called "advanced"?
</demagogy>
In THIS situation:
1. YOUR code is not working.
2. MY code is not working either:
<app_config>
    <app>
      <name>Theory Simulation</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb Simulation</name>
      <max_concurrent>1</max_concurrent>
    </app>
	<report_results_immediately/>
	</app_config>

Maybe <name> is still incorrect?
Any ideas?
ID: 35154
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35155 - Posted: 3 May 2018, 12:41:28 UTC - in response to Message 35154.  

Yes, of course - XML is always a solution.

No, not always ;-)


<demagogy> But for advanced users only ... </demagogy>

<demagogy>
A volunteer who
- has contributed to the project for more than 10 years
- has collected more than 6 million credit points
- has posted roughly 100 comments
- has changed his local setup to run non-standard apps for different projects
- ...

is NOT a novice.

At least some knowledge can be expected regarding:
- how to correctly write/copy a very simple XML file
- where to place that file
- what to do afterwards, e.g. "reload config files", "restart BOINC client" or "reboot the computer"
</demagogy>


<switch_back_to_normal_mode/>

In THIS situation:
1. YOUR code is not working.
2. MY code is not working either:
<app_config>
    <app>
      <name>Theory Simulation</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb Simulation</name>
      <max_concurrent>1</max_concurrent>
    </app>
	<report_results_immediately/>
	</app_config>

Maybe, <name> is still incorrect?
Any ideas?

wrong: <name>Theory Simulation</name>
wrong: <name>LHCb Simulation</name>

correct: <name>Theory</name>
correct: <name>LHCb</name>
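Applying those corrected names to the file posted above would give (a sketch combining the two posts):

```xml
<app_config>
    <app>
      <name>Theory</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb</name>
      <max_concurrent>1</max_concurrent>
    </app>
    <report_results_immediately/>
</app_config>
```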

Hint 1: examine your client_state.xml.
Hint 2: only a novice would save changes to client_state.xml while the client is running.


It doesn't work?
Did you "reload config files" (should be enough in this case) or "restart BOINC client" or "reboot the computer"?
Did you examine your BOINC client log?
ID: 35155
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35157 - Posted: 3 May 2018, 15:54:21 UTC - in response to Message 35155.  
Last modified: 3 May 2018, 15:56:39 UTC

...is NOT a novice.

All these years my hosts processed SixTrack only - I hope this explains everything. ;-)
Hint1: examine your client_state.xml

Currently I have no idea which element is missing or incorrect there (between <project> and </project>).
Hint2: Only a novice may save client_state.xml while the client is running.

Excellent hint. :oD
It doesn't work?

Not yet, unfortunately.
Did you examine your BOINC client log?

??? cc_config.xml not found - using defaults ???
The file really is missing.
ID: 35157
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35158 - Posted: 3 May 2018, 16:36:31 UTC - in response to Message 35157.  

"Hint1" was given in relation to the correct application name as explained in the BOINC documentation.


The app_config.xml has to be stored in the folder:
...\base_of_your_BOINC_client\projects\lhcathome.cern.ch_lhcathome\

If you do a "reload config files", your client log must show a line similar to this:
Do 03 Mai 2018 18:23:06 CEST | LHC@home | Found app_config.xml

Options like <max_concurrent> become active immediately after the reload.
This can be checked (counted) in the BOINC manager's task list.
ID: 35158
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35159 - Posted: 3 May 2018, 17:05:30 UTC - in response to Message 35158.  

I just found that I have TWO BOINC_Data folders.
I was not careful when I upgraded BOINC several days ago.
Thank you very much for your patience!
And GOOD LUCK!
ID: 35159



©2022 CERN