Message boards : ATLAS application : Wrong memory usage for multicore

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29747 - Posted: 2 Apr 2017, 1:16:49 UTC

I was trying to get multicore working and it was almost there: the CPU load was good, but I hit the working set problems from before.

I was looking at the other projects and ATLAS.

For CMS in the file CMS_2016_03_22.xml there is <memory_size_mb>2048
Looking at the memory in BOINC, the task uses 2.33GB, while the logs show the VM using 2048MB, so I assume 0.33GB is added as some kind of buffer.

These numbers match up with the <rsc_memory_bound> in the init_data.xml

Taking ATLAS with appconfig of 5000MB and 2 cores, and looking at the memory in BOINC, the task is using 18.16GB!! So where does this come from?

Breaking it down it seems that the appconfig doesn't override the setting properly.

For ATLAS the <rsc_memory_bound> in the init_data.xml is 18.16GB too.

So it would seem that the memory could be calculated as follows:

5000MB from the ATLAS_2017_01_09.xml (not sure why this is 5000MB?)
2x5000MB from the app_config.xml
3400MB from the normal single core task setting (for ATLAS it's unclear how the normal settings for the VM are applied?)
Plus a little overage as before (it's a different amount from before, so I'm not sure how it's calculated?)
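
A quick arithmetic check of this guess can be sketched as follows. It assumes that the "GB" shown by BOINC means 1024 MB, which is an assumption and not something stated in this thread; the variable names are only for illustration.

```python
# Hypothetical breakdown from the list above, all values in MB.
vm_xml_mb = 5000           # from ATLAS_2017_01_09.xml
app_config_mb = 2 * 5000   # from app_config.xml, 2 cores
single_core_mb = 3400      # normal single core task setting

guess_mb = vm_xml_mb + app_config_mb + single_core_mb

# Assumption: BOINC reports GB as 1024 MB.
reported_mb = 18.16 * 1024

# Whatever is left over would be the unexplained "overage".
overage_mb = reported_mb - guess_mb

print(f"guess:    {guess_mb} MB")
print(f"reported: {reported_mb:.0f} MB")
print(f"overage:  {overage_mb:.0f} MB")
```

Under the same assumption the CMS overage would be 2.33 * 1024 - 2048, roughly 338 MB, so the two leftovers do not match, which is why this breakdown is only a guess.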

Anyway maybe someone expert in BOINC can work it out??

I'll go back to single-core tasks, as I get "waiting for memory" because BOINC thinks the WUs are using 18GB when they are really using 4.xxGB.
ID: 29747
peterfilla
Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29750 - Posted: 2 Apr 2017, 6:42:50 UTC

I had the same problem. The point (somebody told me this) is that the value of Max # CPU on the LHC page is used to calculate the amount of memory. I set the Max # CPU to a lower value (6 instead of 24) and use a cc_config for the real number.
ID: 29750
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29756 - Posted: 2 Apr 2017, 8:06:31 UTC - in response to Message 29750.  

I had the same problem. The point (somebody told me this) is that the value of Max # CPU on the LHC page is used to calculate the amount of memory. I set the Max # CPU to a lower value (6 instead of 24) and use a cc_config for the real number.

Yes, this is at the moment the only working trick !


Supporting BOINC, a great concept !
ID: 29756
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,951,161
RAC: 137,233
Message 29759 - Posted: 2 Apr 2017, 8:24:00 UTC - in response to Message 29747.  

... For CMS in the file CMS_2016_03_22.xml there is <memory_size_mb>2048 ...

This value is used as default if none of the values described next apply:
1. The server sends a <cmdline>--memory_size_mb nnnn</cmdline>
2. The user defines a <cmdline>--memory_size_mb nnnn</cmdline> in app_config.xml

A setting with a higher number overrides a setting with a lower number.
This value directly controls the VM's RAM setting.
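
The precedence described above can be sketched like this. It is only a minimal illustration, not code from BOINC or the VirtualBox wrapper; the function and parameter names are made up for the example.

```python
from typing import Optional


def effective_memory_size_mb(
    job_xml_default: int,
    server_cmdline: Optional[int] = None,
    app_config_cmdline: Optional[int] = None,
) -> int:
    """Return the VM RAM size in MB, following the precedence in the post.

    All names here are hypothetical. The default comes from the job XML
    (<memory_size_mb>), a --memory_size_mb sent by the server overrides it,
    and a --memory_size_mb set in app_config.xml overrides both.
    """
    value = job_xml_default
    if server_cmdline is not None:
        value = server_cmdline
    if app_config_cmdline is not None:
        value = app_config_cmdline
    return value


# Only the default from the job XML applies.
print(effective_memory_size_mb(2048))
# The server sends its own value.
print(effective_memory_size_mb(2048, server_cmdline=3400))
# The user's app_config.xml wins over both.
print(effective_memory_size_mb(2048, server_cmdline=3400, app_config_cmdline=5000))
```

Note that this only covers the RAM given to the VM. It says nothing about <rsc_memory_bound>, which is handled separately.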

... Looking at the memory in BOINC, the task uses 2.33GB ...

What you see here is the result of the value in <rsc_memory_bound>.
This value is sent by the server. A user cannot change it.


How does it work?
Suppose you have a host with 16 GB RAM and you allow BOINC to use 60% of it.
That makes 9.6 GB.
Now your BOINC client will run up to 4 WUs (2.33 x 4 = 9.32) and, besides that, (third-party) tasks with a combined <rsc_memory_bound> of less than 0.28 GB.

If your BOINC client has 3 CERN WUs (2.33 GB each) and a (third-party) task with <rsc_memory_bound>0.3</rsc_memory_bound> running, or at least in memory, there would be only 2.31 GB left. Although an additional VM (2.0 GB) would fit, the BOINC client would not start it (9.62 > 9.6).
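
The two cases above can be reproduced with a small sketch. This is a simplified model of the check as described in this post, not the actual BOINC client code; the function name and parameters are hypothetical.

```python
def would_start(running_bounds_gb, candidate_bound_gb, ram_gb=16.0, boinc_share=0.60):
    """Simplified model: a task starts only if the sum of all
    <rsc_memory_bound> values stays within BOINC's RAM budget."""
    budget_gb = ram_gb * boinc_share
    needed_gb = sum(running_bounds_gb) + candidate_bound_gb
    # Small tolerance so floating point noise does not flip the result.
    return needed_gb <= budget_gb + 1e-9


# Three CERN WUs in memory, a fourth one is a candidate: 9.32 <= 9.6
print(would_start([2.33, 2.33, 2.33], 2.33))

# Same, but a third-party task with a 0.3 GB bound is also in memory: 9.62 > 9.6
print(would_start([2.33, 2.33, 2.33, 0.3], 2.33))
```

The point of the example is that the decision uses the bound sent by the server (2.33 GB), not the RAM the VM really takes (2.0 GB).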
ID: 29759
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29761 - Posted: 2 Apr 2017, 8:47:11 UTC
Last modified: 2 Apr 2017, 8:59:28 UTC

I set the max number of CPUs to no limit, which gave 0 in the config files, so I would imagine this either has no impact or messes things up totally (anything * 0 = 0).

I think the best option then would be to set # of CPU to 1 on the config page as this shouldn't mess with anything.

To me there are still some inconsistencies in how the memory is calculated, given that the point of multicore is to use less RAM.
ID: 29761
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29763 - Posted: 2 Apr 2017, 9:02:21 UTC

I set the app_config to use 4600MB, with the web settings at unlimited for both, and forced the number of cores to 20 (as per my CPU).

I now get a <rsc_memory_bound> of 15.04 GB.
ID: 29763
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29764 - Posted: 2 Apr 2017, 9:13:13 UTC - in response to Message 29761.  

I think the best option then would be to set # of CPU to 1 on the config page as this shouldn't mess with anything.

For RSC-Bound-Memory this should be the best option.

Or set it to the lowest / biggest number of cores you set up via app_config


Supporting BOINC, a great concept !
ID: 29764
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29765 - Posted: 2 Apr 2017, 9:15:59 UTC - in response to Message 29764.  
Last modified: 2 Apr 2017, 9:25:38 UTC

With a setting of 1, the RSC bound is 3.32GB; here BOINC could schedule more work than there is RAM for, as it assumes the wrong amount of RAM.

With setting of 2 it's 4.10GB, still less than set by app_config.

With setting of 3 it's 4.88GB, so this is slightly over the true amount set in the app_config but not excessive
ID: 29765
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 29766 - Posted: 2 Apr 2017, 9:28:31 UTC

I think RSC-Bound is calculated by the latest published formula of David:

2.6 GB + 0.8 GB * NumberOfCores
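
As a rough check, the formula can be compared with the numbers reported in this thread. The sketch below assumes the formula is applied in MB (2600 MB + 800 MB per core) and that BOINC shows GB as 1024 MB; both are assumptions, and the function name is made up.

```python
def rsc_memory_bound_mb(cores: int) -> int:
    """Assumed reading of the formula: 2600 MB + 800 MB per core."""
    return 2600 + 800 * cores


for cores in (1, 2, 3, 16, 20):
    mb = rsc_memory_bound_mb(cores)
    # Assumption: the GB value shown by BOINC is MB / 1024.
    print(f"{cores:2d} cores: {mb} MB = {mb / 1024:.2f} GB")
```

For 1, 2 and 3 cores this gives 3.32, 4.10 and 4.88 GB, the same values reported for those web settings. Under this reading, 18.16 GB would correspond to 20 cores and 15.04 GB to 16 cores, though that is a guess and not confirmed here.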


Supporting BOINC, a great concept !
ID: 29766
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,951,161
RAC: 137,233
Message 29767 - Posted: 2 Apr 2017, 9:35:21 UTC - in response to Message 29765.  

The 2.33 GB you wrote about here are for CMS?
CMS is a 1-core app, so I guess the <rsc_memory_bound> is set to a fixed value on the server.

For ATLAS, by contrast, <rsc_memory_bound> should be calculated from the core setting.
I'm not sure if this calculation is error-free.

Another point:
If you play around with app_config.xml, some parameters may not be reset if you delete them from the file. Unfortunately, a hint about this in the BOINC documentation seems to have disappeared from the current page.

Try at least a BOINC restart or a project reset.
ID: 29767
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29768 - Posted: 2 Apr 2017, 9:48:10 UTC

Looks like it, although from all the testing people did it was 4400MB for dual-core, so it seems like it could do with a tweak.
ID: 29768
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,763,678
RAC: 232,370
Message 29769 - Posted: 2 Apr 2017, 9:53:02 UTC - in response to Message 29767.  

Yes, I took the CMS as baseline for investigation.

The unlimited setting on the web seems to mess all the calculations up; I assume div/0 problems.

I normally shutdown BOINC completely as this seems to let it read the appconfig properly.

With 3 cores set on the web, the RSC bound calculation comes out OK for the RAM usage of multicore.

What's really happening in the VM is fine too as this is set in the appconfig.
ID: 29769


©2024 CERN