Message boards : Number crunching : "Waiting for memory" - although enough RAM available
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36287 - Posted: 7 Aug 2018, 5:19:50 UTC

I guess that's again one of these strange BOINC things:

My system has 32GB RAM available, memory usage is set to 95%.

While four 2-core ATLAS tasks were being processed (with 4800MB allocated per task via app_config.xml), I downloaded one LHCb task, which started without problems.
So, at this point, 4 ATLAS tasks and 1 LHCb task were running concurrently, with total RAM usage of ~23GB (out of 32GB available).
After one ATLAS task finished and was uploaded, another ATLAS task started, and at that moment the LHCb task switched to "waiting for memory" status.

I then tried the following: in app_config.xml I set the number of concurrently running ATLAS jobs (2-core) to 5 and downloaded a 5th ATLAS task. However, this task did not start; it stays in "waiting to run" status, although enough memory is available to run five 2-core ATLAS tasks.

Can anyone explain to me what the problem is, and how I can solve it? (I am afraid it can't be solved, since it may have to do with some strange BOINC settings in the background, right?)
ID: 36287
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 617
Credit: 385,312,294
RAC: 135,319
Message 36288 - Posted: 7 Aug 2018, 5:29:29 UTC - in response to Message 36287.  

The memory set in app_config.xml is not known to BOINC. BOINC uses the working set defined by the project to calculate whether it has enough free RAM to start another task.

You can try to tune the number of cores on the web to match the working set in the app_config, or you can choose 1 core on the web and force more cores with app_config. With the second option you have to be careful that BOINC doesn't start too many tasks and overload the RAM on your computer.


Since the project uses 4800MB by default, you could stop using the app_config and go with the web settings, although 2-core tasks will use 5700MB under the new rules.
ID: 36288
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36292 - Posted: 7 Aug 2018, 6:38:58 UTC - in response to Message 36288.  

You can try to tune the number of cores on the web to match the working set in the appconfig or you can choose 1 core on the web and force more cores with appconfig.
This does not work though, because when I choose 1-core for ATLAS tasks, I can download only 1 ATLAS task (this problem has been mentioned here a few times before, but it still exists).
ID: 36292
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36299 - Posted: 7 Aug 2018, 9:50:48 UTC - in response to Message 36292.  

After the LHCb task and one of the ATLAS tasks were uploaded, 2 ATLAS tasks were downloaded, but both immediately switched to "waiting to run" status. They have not started and are still waiting.
So, from then on only three 2-core ATLAS tasks have been running, with about 50% CPU and about 55% RAM usage.

WHY IS THAT?

I'd like to run 5 ATLAS tasks, or at least 4 (as earlier this morning). Could anyone give me advice as to how to get that accomplished?
ID: 36299
Harri Liljeroos
Joined: 28 Sep 04
Posts: 484
Credit: 25,910,910
RAC: 14,597
Message 36300 - Posted: 7 Aug 2018, 9:53:47 UTC

You can check the working set size of a task by right-clicking the task in BOINC Manager and selecting Properties.
ID: 36300
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36303 - Posted: 7 Aug 2018, 10:38:03 UTC - in response to Message 36300.  

Hm, here it works a little differently: after clicking on a given task, there is a tab called "Properties" on the left-hand side of the BOINC Manager.

At any rate, the values for the 3 tasks in progress are:

"required RAM": 127,83MB - 128,75MB - 127,72MB (whatever this means)
"size of working package": 8,2GB - 7,32GB - 7,32GB (I guess that is what you mean by working set size).

So the latter would explain why BOINC does not start a fourth task. On the other hand, as explained above, MemInfo shows that about 45% of my 32GB RAM is free. That's what I don't understand.
In other words: with 32GB RAM, only three 2-core tasks can be run concurrently?
ID: 36303
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 418
Credit: 102,290,183
RAC: 88,268
Message 36304 - Posted: 7 Aug 2018, 12:19:58 UTC

It is very easy to understand: BOINC has to reserve memory of size Working_Set_size, and it doesn't know that the amount of RAM actually needed is much lower.
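
A rough illustration of that gap, using only the numbers reported earlier in this thread (working set sizes of 8.2, 7.32 and 7.32 GB, and 4800 MB per task in app_config.xml). This is plain arithmetic on those figures. The function name is made up, and the real BOINC client applies further checks, so it is not a model of the scheduler.

```python
# Hypothetical helper: compares what BOINC counts for the running tasks
# (the working set size shown per task) with the memory the user
# configured per task in app_config.xml. Numbers are taken from this thread.

def reserved_vs_configured(working_sets_gb, configured_mb_per_task):
    """Return (GB counted by BOINC, GB according to app_config.xml)."""
    counted_gb = sum(working_sets_gb)
    configured_gb = len(working_sets_gb) * configured_mb_per_task / 1024
    return round(counted_gb, 2), round(configured_gb, 2)

counted, configured = reserved_vs_configured([8.2, 7.32, 7.32], 4800)
print(f"counted by BOINC: {counted} GB")
print(f"per app_config:   {configured} GB")
```

With these inputs the working sets add up to roughly 22.8 GB, while three tasks at 4800 MB each would add up to roughly 14 GB.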


Supporting BOINC, a great concept !
ID: 36304
Harri Liljeroos
Joined: 28 Sep 04
Posts: 484
Credit: 25,910,910
RAC: 14,597
Message 36305 - Posted: 7 Aug 2018, 12:27:22 UTC - in response to Message 36303.  

Hmm, I actually use BoincTasks instead of BOINC Manager. My instructions described how it works there; I just assumed it would be similar. Anyway, you found the information I was referring to.

I don't think that BOINC actually knows how much memory a VirtualBox (VB) task/application uses; it relies on the information in the task's init file. So your final conclusion seems to be true. You could try running two 3-core tasks instead.
ID: 36305
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36307 - Posted: 7 Aug 2018, 13:33:30 UTC - in response to Message 36304.  

... And it doesn't know, that the real needed amount of RAM is much lower
DUMB BOINC :-(((
ID: 36307
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 617
Credit: 385,312,294
RAC: 135,319
Message 36312 - Posted: 7 Aug 2018, 17:16:45 UTC
Last modified: 7 Aug 2018, 17:22:12 UTC

In my opinion it's the responsibility of the project team to set it correctly, not BOINC's, since it's configurable on a per-job basis. I don't think BOINC knows the real memory use for any WU, not just virtual ones. Maybe the "required RAM" value? As you said, this isn't valid for VMs.

The limits on WUs are also a project configuration topic, since Theory allows more than one WU if the 1-core setting is selected, as you would expect.

You could run three 6-core tasks with 3 in the queue.
Core/WU   WS (MB)   BOINC use (GB)   Running   Queued
1            3900              3.8         1        0
2            4800              4.7         1        0
3            5700              5.6         3        0
4            6600              6.4         4        0
5            7500              7.3         4        1
6            8400              8.2         3        3
7            9300              9.1         3        4
8           10200             10.0         3        5
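
The WS column in the table above happens to follow a straight line, 3000 MB plus 900 MB per core, and the GB column is the same value divided by 1024. The sketch below only reproduces the table from that observation. Whether the project actually computes the working set this way is an assumption, and the function names are made up.

```python
def working_set_mb(cores):
    # Linear fit to the table above (an observation, not a confirmed project rule).
    return 3000 + 900 * cores

def working_set_gb(cores):
    # The table's GB column matches MB / 1024, rounded to one decimal.
    return round(working_set_mb(cores) / 1024, 1)

for cores in range(1, 9):
    print(cores, working_set_mb(cores), working_set_gb(cores))
```

The 8.2 GB and 7.32 GB working sets reported earlier in the thread match the 6-core and 5-core rows.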
ID: 36312
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36315 - Posted: 7 Aug 2018, 18:44:24 UTC

What does "working set" mean? I'm not sure about Windows, but in the Linux world "working set" is not equivalent to "the RAM the app itself requires". Working set = RAM the app itself requires + RAM for shared objects (.dll's in Windows). Should BOINC base resource allocation estimates only on the RAM the app itself requires? That seems like a recipe for disaster.

At the same time, I can see how basing it on the working set (as defined above) can lead to situations where utilities indicate plenty of unused RAM, yet BOINC refuses to run more tasks. I think that happens because the shared objects are in fact shared (because they are re-entrant), which means only 1 copy needs to be in RAM even though several processes are using the shared object. Still, basing the estimate on the working set seems to be the safer way to estimate resource requirements, as it over-estimates rather than under-estimates.
ID: 36315
Erich56

Send message
Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36316 - Posted: 7 Aug 2018, 19:09:45 UTC - in response to Message 36312.  

In my opinion it's the responsibility of the project team to set it correctly, not BOINC.
yes, I agree !
ID: 36316
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1608
Credit: 94,642,971
RAC: 98,650
Message 36317 - Posted: 7 Aug 2018, 19:21:10 UTC - in response to Message 36315.  

You may want to read the BOINC documentation to get an idea of how it is defined in this context:
https://boinc.berkeley.edu/trac/wiki/MemoryManagement

In addition you may examine how the different values in scheduler_request*, scheduler_reply*, client_state.xml, app_config.xml ... influence each other and the behaviour of your client.
ID: 36317
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1608
Credit: 94,642,971
RAC: 98,650
Message 36318 - Posted: 7 Aug 2018, 19:31:17 UTC - in response to Message 36316.  

In my opinion it's the responsibility of the project team to set it correctly, not BOINC.
yes, I agree !

Yes, definitely.
It comes as "rsc_memory_bound" inside the sched_reply* file.
Also see: https://boinc.berkeley.edu/trac/wiki/MemoryManagement
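
A minimal sketch of how one might read that value, assuming the reply file contains <workunit> blocks with <name> and <rsc_memory_bound> elements and that the bound is given in bytes. The sample text and the workunit name are made up for illustration; check your own sched_reply* file for the real layout.

```python
import re

# Made-up excerpt standing in for the contents of a sched_reply* file.
SAMPLE_REPLY = """
<scheduler_reply>
<workunit>
    <name>hypothetical_wu_1</name>
    <rsc_memory_bound>5033164800.000000</rsc_memory_bound>
</workunit>
</scheduler_reply>
"""

def memory_bounds_mb(reply_text):
    """Map workunit name -> rsc_memory_bound in MB (assumes the value is in bytes)."""
    bounds = {}
    for block in re.findall(r"<workunit>(.*?)</workunit>", reply_text, re.DOTALL):
        name = re.search(r"<name>(.*?)</name>", block)
        bound = re.search(r"<rsc_memory_bound>(.*?)</rsc_memory_bound>", block)
        if name and bound:
            bounds[name.group(1).strip()] = float(bound.group(1)) / (1024 * 1024)
    return bounds

print(memory_bounds_mb(SAMPLE_REPLY))
```

To try it on a real file, read the file's text and pass it to memory_bounds_mb in place of SAMPLE_REPLY.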
ID: 36318
Harri Liljeroos
Joined: 28 Sep 04
Posts: 484
Credit: 25,910,910
RAC: 14,597
Message 36319 - Posted: 7 Aug 2018, 20:08:57 UTC - in response to Message 36312.  

I don't think BOINC knows for any WU not just Virtual ones.

For SETI GPU tasks I see the working set size changing during the runtime of the task, so some applications do report their memory use back to BOINC. In BoincTasks you can view this value all the time if you want to.
ID: 36319



©2021 CERN