Message boards : Number crunching : Boinc memory estimate and LHC Settings

csbyseti
Joined: 6 Jul 17
Posts: 22
Credit: 29,322,273
RAC: 6,916
Message 34977 - Posted: 12 Apr 2018, 16:27:21 UTC

I used ATLAS for some months until the upload problems occurred.
It was no problem to run 5 ATLAS WUs with 3 threads each on a computer with 32 GB RAM.
After I restarted LHC@home it was not possible (with the same app_config.xml) to run more than 4 WUs at the same time.
Reason: BOINC used a value of 9100 MB for the memory estimate -> not enough memory for the next active task.
When I reduce the "maximum number of CPUs" to 3 in the LHC@home settings, BOINC uses the correct 5300 MB value for the memory estimate of every WU.

But now I get only 3 WUs instead of the 8 set in "maximum number of work" in the LHC@home settings.
So the memory estimate in BOINC is fixed, but BOINC does not fetch enough WUs to run 5 at the same time.

Running a CPU at 50-80% load is wasting CPU power.
Running two different projects is no option because of the ugly BOINC scheduler.

Please fix this behavior.
ID: 34977
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1100
Credit: 6,866,700
RAC: 823
Message 34981 - Posted: 13 Apr 2018, 9:27:19 UTC - in response to Message 34977.  

I tried to reproduce this behavior, but could not replicate your setup exactly, because I already had 4 Theory tasks running.
But you're right. There is something wrong with the preference settings.
With 4 Theory tasks running and max jobs set to 8, I got only 3 new ATLAS tasks.
Changing the max for jobs: same result, no new tasks.
ID: 34981
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 727
Credit: 478,781,594
RAC: 269,237
Message 34999 - Posted: 13 Apr 2018, 19:13:59 UTC
Last modified: 13 Apr 2018, 19:15:31 UTC

With no limit for tasks I get 3 ATLAS tasks. I use the same 3-core setting to ensure that the memory estimates are correct.

Before, I could run 12x2 tasks concurrently, but now the limit is 8 jobs.

I think you would now see a 5x3 = 15 limit on a 16-core machine.

I gave up messing with the settings; ATLAS doesn't work well with other projects running at the same time. Therefore I just don't contribute much to ATLAS; if they change the settings, I will do more.

If you want to run more, you must run multiple instances of BOINC.
ID: 34999
csbyseti

Joined: 6 Jul 17
Posts: 22
Credit: 29,322,273
RAC: 6,916
Message 35024 - Posted: 16 Apr 2018, 18:20:07 UTC - in response to Message 34999.  

I'll try to show this behavior more clearly:
Goal: 5 WUs with 3 CPUs each active.
Part of app_config.xml:
<app_name>ATLAS</app_name>
<version_num>100</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>3.000000</avg_ncpus>
<max_ncpus>3.000000</max_ncpus>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<api_version>7.7.0</api_version>
<cmdline>--memory_size_mb 5300</cmdline>
<dont_throttle/>
<is_wrapper/>
<needs_network/>
BOINC uses these values for the VMs.

Starting preference settings:
maximum number of work: 7
maximum number of cpus: 4
--> only 4 active WUs, no spare WU in the pipeline
After switching "maximum number of cpus" to 5,
one more WU was downloaded and started!!

16.04.2018 19:01:31 | LHC@home | [mem_usage] 9EmLDmoJHSsnlyackoJh5iwnABFKDmABFKDmLbmWDmABFKDmrxkvgm_0: WS 5.28MB, smoothed 6200.00MB, swap 135.92MB, 0.00 page faults/sec, user CPU 30.422, kernel CPU 27963.031
16.04.2018 19:01:31 | LHC@home | [mem_usage] bs0MDmwBHSsnyYickojUe11pABFKDmABFKDmz15UDmABFKDmwYmSkm_0: WS 5.45MB, smoothed 6200.00MB, swap 135.14MB, 0.00 page faults/sec, user CPU 25.297, kernel CPU 19489.547
16.04.2018 19:01:31 | LHC@home | [mem_usage] 3TLODmenHSsnlyackoJh5iwnABFKDmABFKDm4F1WDmABFKDmhJR8bn_0: WS 5.64MB, smoothed 6200.00MB, swap 136.42MB, 0.00 page faults/sec, user CPU 23.578, kernel CPU 15982.234
16.04.2018 19:01:31 | LHC@home | [mem_usage] 3hkNDmq0ISsnyYickojUe11pABFKDmABFKDmZkwVDmABFKDm0ln2Jo_0: WS 5.64MB, smoothed 6200.00MB, swap 136.80MB, 0.00 page faults/sec, user CPU 21.266, kernel CPU 11155.875
16.04.2018 19:01:31 | LHC@home | [mem_usage] 8pmMDmJVDSsnlyackoJh5iwnABFKDmABFKDmAW5UDmABFKDmv1aqLn_1: WS 9.58MB, smoothed 7100.00MB, swap 124.29MB, 0.00 page faults/sec, user CPU 1.734, kernel CPU 3.703
16.04.2018 19:01:31 | | [mem_usage] BOINC totals: WS 44.24MB, smoothed 31930.41MB, swap 1744.37MB, 0.00 page faults/sec
16.04.2018 19:01:31 | | [mem_usage] All others: WS 648.27MB, swap 7009.15MB, user 28834.250s, kernel 30899.547s
16.04.2018 19:01:31 | | [mem_usage] non-BOINC CPU usage: 0.53%

BOINC uses 6200 MB for the older WUs (max cpus setting of 4) and 7100 MB for the last WU (max cpus setting of 5) (don't forget: the VM actually uses 5300 MB).

And now one of the running WUs completes; the upload starts and, after the upload, the next download starts.

16.04.2018 19:24:40 | LHC@home | Finished download of RXBMDmCFKSsnlyackoJh5iwnABFKDmABFKDmMn2XDmABFKDmHwh6Ao_EVNT.13620778._000602.pool.root.1
16.04.2018 19:24:43 | | [mem_usage] enforce: available RAM 32714.59MB swap 40714.59MB
16.04.2018 19:24:43 | LHC@home | [cpu_sched_debug] enforce: task RXBMDmCFKSsnlyackoJh5iwnABFKDmABFKDmMn2XDmABFKDmHwh6Ao_0 can't run, too big 7100.00MB > 6850.85MB
16.04.2018 19:24:46 | LHC@home | [mem_usage] bs0MDmwBHSsnyYickojUe11pABFKDmABFKDmz15UDmABFKDmwYmSkm_0: WS 11.97MB, smoothed 6200.00MB, swap 135.25MB, 0.00 page faults/sec, user CPU 27.922, kernel CPU 23663.031
16.04.2018 19:24:46 | LHC@home | [mem_usage] 3TLODmenHSsnlyackoJh5iwnABFKDmABFKDm4F1WDmABFKDmhJR8bn_0: WS 11.13MB, smoothed 6200.00MB, swap 136.42MB, 0.00 page faults/sec, user CPU 26.766, kernel CPU 20132.906
16.04.2018 19:24:46 | LHC@home | [mem_usage] 3hkNDmq0ISsnyYickojUe11pABFKDmABFKDmZkwVDmABFKDm0ln2Jo_0: WS 10.73MB, smoothed 6200.00MB, swap 136.18MB, 0.00 page faults/sec, user CPU 24.219, kernel CPU 15326.422
16.04.2018 19:24:46 | LHC@home | [mem_usage] 8pmMDmJVDSsnlyackoJh5iwnABFKDmABFKDmAW5UDmABFKDmv1aqLn_1: WS 16.69MB, smoothed 7100.00MB, swap 136.03MB, 0.00 page faults/sec, user CPU 17.516, kernel CPU 2899.625
16.04.2018 19:24:46 | | [mem_usage] BOINC totals: WS 230.51MB, smoothed 25871.87MB, swap 1636.14MB, 0.00 page faults/sec
16.04.2018 19:24:46 | | [mem_usage] All others: WS 1776.46MB, swap 7308.82MB, user 28964.828s, kernel 31025.063s
16.04.2018 19:24:46 | | [mem_usage] non-BOINC CPU usage: 1.27%

The new WU also shows the 7100 MB value, and this exceeds the 32000 MB of memory.

This shows that the number of downloaded WUs depends only on the value of "maximum number of cpus" and not on "maximum number of work".
But "maximum number of cpus" also increases the amount of memory that BOINC uses in its memory calculation.

With this preference behaviour it's not possible to run more than 4 ATLAS WUs at the same time.

Sorry for the long text; I hope the problem is clearer now.
ID: 35024
rbpeake
Joined: 17 Sep 04
Posts: 79
Credit: 25,478,198
RAC: 30
Message 35025 - Posted: 16 Apr 2018, 19:43:56 UTC - in response to Message 35024.  

They should offer options beyond 8 for both the maximum number of CPUs and the maximum number of work units.

I can run 7 instances of 3-core ATLAS tasks on 21 cores, but I cannot run more than 8 2-core tasks on the same machine.
Regards,
Bob P.
ID: 35025
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35026 - Posted: 17 Apr 2018, 6:39:12 UTC - in response to Message 35024.  

Hello csbyseti

ATLAS is currently a bit reluctant to hand out more than 1 task per request.
Sometimes a couple of requests are necessary to fill the local buffers.
The large input files may play a role, but I'm not sure about that.

On the other hand, there are some measures you can take to make sure it's not your system causing the problems.


The snippet from your app_config.xml looks a bit malformed.
It contains a couple of elements that seem to be copied from client_state.xml but should not appear in app_config.xml.
The correct template can be found in the BOINC documentation:
http://boinc.berkeley.edu/wiki/client_configuration#Application_configuration

I would suggest creating a fresh app_config.xml that strictly follows the documented form.
You may focus on the following lines:
<avg_ncpus>3.000000</avg_ncpus>
<cmdline>--nthreads 3 --memory_size_mb 5300</cmdline>

<avg_ncpus>, --nthreads and the project's web preferences should be set to the same value.
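For reference, a minimal app_config.xml in the documented form might look like this (a sketch, not an official template - the app name and plan class are taken from the snippet quoted earlier in this thread, and the values assume the 3-core / 5300 MB setup discussed here):

```xml
<app_config>
    <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>vbox64_mt_mcore_atlas</plan_class>
        <avg_ncpus>3.0</avg_ncpus>
        <cmdline>--nthreads 3 --memory_size_mb 5300</cmdline>
    </app_version>
</app_config>
```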


You may also check your VirtualBox environment, as some (older) logs show unclean shutdowns:
2018-04-11 18:01:57 (12620): VM did not power off when requested.
2018-04-11 18:01:57 (12620): VM was successfully terminated.


Finally you may run a fresh set of VMs after a client restart or a reboot.
ID: 35026
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35149 - Posted: 3 May 2018, 8:41:04 UTC - in response to Message 34977.  

Dear colleagues!
Please explain some settings in the LHC@home preferences:
1. "Max # of jobs for this project" - does this mean a limit on the total number of downloaded tasks (running + waiting to run + ready to start)?
2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?
ID: 35149
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35150 - Posted: 3 May 2018, 9:09:05 UTC - in response to Message 35149.  
Last modified: 3 May 2018, 9:10:28 UTC

1. "Max # of jobs for this project" - does this mean a limit on the total number of downloaded tasks (running + waiting to run + ready to start)?

Yes (+ finished but not yet reported)


2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?

No.
It limits the number of CPU cores used by a multicore app.
Thus it also affects the RAM requirements calculated by the project server.
At the moment only ATLAS provides a multicore app.

Some users have reported unexpected behavior when they change those settings.
If you notice it too, please report it on the message board.
ID: 35150
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 1006
Credit: 47,311,807
RAC: 3,514
Message 35151 - Posted: 3 May 2018, 9:38:09 UTC - in response to Message 35150.  

We should have multicore CMS and Theory here in the near future. I alone have run thousands of them; the Theory version in particular runs fine. CMS has a problem now and then, but I think we may have that up and running again too.
Volunteer Mad Scientist For Life
ID: 35151
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35152 - Posted: 3 May 2018, 9:47:57 UTC - in response to Message 35150.  
Last modified: 3 May 2018, 9:49:41 UTC

computezrmle!
Thank you very much for such a fast reply!

2. "Max # of CPUs for this project" - does this mean a limit on the number of tasks running at the same time?

No.
It limits the number of CPU cores used by a multicore app.


Dear Project team!

As you can see, the current wording may provoke misunderstandings.
Can you correct it (for example, to "Max # of CPUs for multicore applications")?
Or just add an explanation of what this setting means?

Also, some of the LHC vbox applications are a significant burden for the "typical/standard" volunteer's PC used at home or at work.
Is it possible to add one more setting that would allow limiting the maximum number of LHC tasks running at the same time?
Of course, this is not necessary at all for SixTrack...
But it is already necessary for LHCb, which eats 2 GB of RAM per task...
ID: 35152
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35153 - Posted: 3 May 2018, 10:09:44 UTC - in response to Message 35152.  

... Is it possible to add one more setting that would allow limiting the maximum number of LHC tasks running at the same time? ...

To limit the number of concurrently running instances of a distinct app you may use a local app_config.xml file.
See the BOINC documentation for a general overview:
http://boinc.berkeley.edu/wiki/client_configuration#Application_configuration

A simple example could look like this:

<app_config>
    <app>
      <name>Theory</name>
      <max_concurrent>2</max_concurrent>
    </app>
    <app>
      <name>LHCb</name>
      <max_concurrent>1</max_concurrent>
    </app>
    <project_max_concurrent>2</project_max_concurrent>
</app_config>
ID: 35153
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35154 - Posted: 3 May 2018, 11:39:01 UTC - in response to Message 35153.  

computezrmle
Thank You very much again!
Yes, of course - XML is always a solution.
<demagogy>
But for advanced users only - a single incorrect character in the XML can take you anywhere from "this just doesn't work" to "this ruined everything". Some time ago I had many hours of pain before I finally got optimized apps for SETI and Einstein working...
Also, what percentage of volunteers can be called "advanced"?
</demagogy>
In THIS situation:
1. YOUR code is not working.
2. MY code is not working either:
<app_config>
    <app>
      <name>Theory Simulation</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb Simulation</name>
      <max_concurrent>1</max_concurrent>
    </app>
	<report_results_immediately/>
	</app_config>

Maybe <name> is still incorrect?
Any ideas?
ID: 35154
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35155 - Posted: 3 May 2018, 12:41:28 UTC - in response to Message 35154.  

Yes, of course - XML is always a solution.

No, not always ;-)


<demagogy> But for advanced users only ... </demagogy>

<demagogy>
A volunteer who
- has contributed to the project for more than 10 years
- has collected more than 6 million credit points
- has posted roughly 100 comments
- has changed his local setup to run non-standard apps for different projects
- ...

is NOT a novice.

At least some knowledge can be expected regarding:
- how to correctly write/copy a very simple XML file
- where to place that file
- what to do afterwards, e.g. "reload config files", "restart BOINC client" or "reboot the computer"
</demagogy>


<switch_back_to_normal_mode/>

In THIS situation:
1. YOUR code is not working.
2. MY code is not working either:
<app_config>
    <app>
      <name>Theory Simulation</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb Simulation</name>
      <max_concurrent>1</max_concurrent>
    </app>
	<report_results_immediately/>
	</app_config>

Maybe, <name> is still incorrect?
Any ideas?

wrong: <name>Theory Simulation</name>
wrong: <name>LHCb Simulation</name>

correct: <name>Theory</name>
correct: <name>LHCb</name>
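Applying those corrected names to the file posted above would give (a sketch combining the two posts):

```xml
<app_config>
    <app>
      <name>Theory</name>
      <max_concurrent>3</max_concurrent>
    </app>
    <app>
      <name>LHCb</name>
      <max_concurrent>1</max_concurrent>
    </app>
    <report_results_immediately/>
</app_config>
```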

Hint 1: examine your client_state.xml.
Hint 2: only a novice would save changes to client_state.xml while the client is running.


It doesn't work?
Did you "reload config files" (should be enough in this case) or "restart BOINC client" or "reboot the computer"?
Did you examine your BOINC client log?
ID: 35155
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35157 - Posted: 3 May 2018, 15:54:21 UTC - in response to Message 35155.  
Last modified: 3 May 2018, 15:56:39 UTC

...is NOT a novice.

All these years my hosts processed SixTrack only - I hope this explains everything. ;-)
Hint1: examine your client_state.xml

Currently I have no idea which element is missing or incorrect there (between <project> and </project>).
Hint2: Only a novice may save client_state.xml while the client is running.

Excellent hint. :oD
It doesn't work?

Not yet, unfortunately.
Did you examine your BOINC client log?

??? cc_config.xml not found - using defaults ???
The file really is missing.
ID: 35157
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1982
Credit: 142,803,277
RAC: 100,337
Message 35158 - Posted: 3 May 2018, 16:36:31 UTC - in response to Message 35157.  

"Hint1" was given in relation to the correct application name as explained in the BOINC documentation.


The app_config.xml has to be stored in the folder:
...\base_of_your_BOINC_client\projects\lhcathome.cern.ch_lhcathome\

If you do a "reload config files", your client log must show a line similar to this:
Do 03 Mai 2018 18:23:06 CEST | LHC@home | Found app_config.xml

Options like <max_concurrent> become active immediately after the reload.
This can be checked (counted) in the BOINC manager's task list.
ID: 35158
metalius
Joined: 3 Oct 06
Posts: 101
Credit: 8,849,068
RAC: 994
Message 35159 - Posted: 3 May 2018, 17:05:30 UTC - in response to Message 35158.  

I just found that I have TWO BOINC_Data folders.
I was not careful when I upgraded BOINC several days ago.
Thank you very much for your patience!
And GOOD LUCK!
ID: 35159



©2022 CERN