Message boards : ATLAS application : -1073740791 (0xC0000409) STATUS_STACK_BUFFER_OVERRUN

peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29116 - Posted: 8 Mar 2017, 20:18:21 UTC

I am trying to run more than one 4-core Theory task and get this error.
The memory_size in app_config is set to 5400.
ID: 29116
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,876,678
RAC: 849
Message 29119 - Posted: 9 Mar 2017, 8:34:37 UTC - in response to Message 29116.  

Strange. In stderr.txt 8400MB is reported: 2017-03-08 18:09:23 (8408): Setting Memory Size for VM. (8400MB)

Set the memory size for Theory to 1280MB for a 4-core VM. That's enough.
ID: 29119
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29120 - Posted: 9 Mar 2017, 8:44:51 UTC

I am sure that is not the reason; I started my tests with 1280MB. But I have two problems: this error, and the fact that I cannot run more than 2 ATLAS tasks at the same time. So I set the memory to 8400MB in app_config for both projects.

First I ran ATLAS -> problem: only 2 tasks.
Then I followed your advice: got this error on all tasks.

Now I have uninstalled VBox (14), installed the latest (16), set up LHC anew and am running new tests.
ID: 29120
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 431
Credit: 117,525,067
RAC: 0
Message 29123 - Posted: 9 Mar 2017, 11:03:56 UTC

Hm, you are mixing up the Theory app and the ATLAS app.

Crystal suggested 1280 MB for a 4-core Theory WU, but if you apply this number to the ATLAS app it will not be able to run even 1 task.


Supporting BOINC, a great concept !
ID: 29123
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29124 - Posted: 9 Mar 2017, 11:21:13 UTC

There is plenty of space . . .

<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64</plan_class>
    <avg_ncpus>4.000000</avg_ncpus>
    <cmdline>--memory_size_mb 1500</cmdline>
  </app_version>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>4.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 8400</cmdline>
  </app_version>
</app_config>

For the current test I reduced the Theory memory size; three Theory tasks are now running - but still only 2 4-core ATLAS tasks, with 1 waiting.
ID: 29124
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,862,693
RAC: 96,991
Message 29125 - Posted: 9 Mar 2017, 13:08:17 UTC

What RAM values (in %) did you set in "Options -> Computing preferences..."?
Do all of your currently running and paused jobs fit into the configured RAM?
ID: 29125
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29127 - Posted: 9 Mar 2017, 13:30:48 UTC

I have a 12-core/24-thread CPU with 64GB RAM.
In the options, RAM usage was set to 90% max; the project can run 5 tasks max.
The current test uses 83% of CPU and 26% of RAM with 2 ATLAS and 3 Theory tasks; 3 tasks are waiting (1 ATLAS and 2 Theory).
ID: 29127
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,862,693
RAC: 96,991
Message 29130 - Posted: 9 Mar 2017, 14:04:00 UTC

I guess you have no job from any other project running besides ATLAS or Theory, do you?

Try to locate (carefully!) all occurrences of the <rsc_memory_bound>123456789.123456</rsc_memory_bound> tag in the file client_state.xml and sum up the values for jobs that are running/paused/waiting for memory.
Does the result exceed your configured RAM limit (64 GB x 0.9 = 57.6 GB)?
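To sum those values up without editing anything, a small script can scan the file. This is only a sketch - the location of client_state.xml depends on your BOINC data directory, so adjust the path before using it:

```python
import re

def total_memory_bound(state_xml: str) -> float:
    """Sum all <rsc_memory_bound> values (in bytes) found in the
    text of a client_state.xml file."""
    values = re.findall(r"<rsc_memory_bound>([0-9.]+)</rsc_memory_bound>",
                        state_xml)
    return sum(float(v) for v in values)

# Usage (adjust the path to your BOINC data directory):
#   with open("client_state.xml") as f:
#       print(f"{total_memory_bound(f.read()) / 1024**3:.1f} GiB reserved")
```

Note this sums every workunit in the file, not only the running/waiting ones, so treat the result as an upper bound.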
ID: 29130
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29132 - Posted: 9 Mar 2017, 14:31:41 UTC

LHC is the only project.

rsc_memory_bound for ATLAS is 26633830400.00 for each task.
Theory: 500000000.00.

This is in bytes?! Then 2 ATLAS tasks have over 53 GB reserved?!

Am I right???
ID: 29132
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29133 - Posted: 9 Mar 2017, 14:37:43 UTC

Question: Can I change these values in the client_state.xml??? (carefully!)
ID: 29133
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,876,678
RAC: 849
Message 29136 - Posted: 9 Mar 2017, 16:15:53 UTC - in response to Message 29132.  

LHC is the only project.

rsc_memory_bound for ATLAS is 26633830400.00 for each task.
Theory: 500000000.00.

This is in bytes?! Then 2 ATLAS tasks have over 53 GB reserved?!

Am I right???

Yes, that's right. Each task has 24.8 GB reserved. That's why the other tasks are waiting for memory.

Question: Can I change these values in the client_state.xml??? (carefully!)

Yes, that's possible, but first you have to shut down the BOINC client.

Edit client_state.xml with a plain text editor and change <rsc_memory_bound>26633830400.000000</rsc_memory_bound> into <rsc_memory_bound>5662310400.000000</rsc_memory_bound> for 5400MB of memory for a 4-core ATLAS task.

Do this for all workunits (including those not yet started), save the file and start the BOINC client.
ID: 29136
peterfilla

Send message
Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29138 - Posted: 9 Mar 2017, 16:50:13 UTC

What is the difference between "Setting Memory Size for VM" (8400MB)

2017-03-09 09:03:40 (5764): vboxwrapper (7.7.26196): starting
2017-03-09 09:03:40 (5764): Feature: Checkpoint interval offset (300 seconds)
2017-03-09 09:03:40 (5764): Detected: VirtualBox COM Interface (Version: 5.1.16)
2017-03-09 09:03:40 (5764): Detected: Minimum checkpoint interval (900.000000 seconds)
2017-03-09 09:03:40 (5764): Successfully copied 'init_data.xml' to the shared directory.
2017-03-09 09:03:40 (5764): Create VM. (boinc_0521eef5705f3a8d, slot#0)
2017-03-09 09:03:40 (5764): Setting Memory Size for VM. (8400MB)
2017-03-09 09:03:40 (5764): Setting CPU Count for VM. (4)
2017-03-09 09:03:40 (5764): Setting Chipset Options for VM.

and "rsc_memory_bound" in client_state.xml ?

Is the number of CPUs set on the LHC page used to calculate these values? And is this number of CPUs the physical or the logical count? Perhaps I should set the physical number . . .

And: I will try changing them . . .
ID: 29138
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,862,693
RAC: 96,991
Message 29139 - Posted: 9 Mar 2017, 16:53:29 UTC

... rsc_memory_bound for ATLAS is 26633830400.00 for each task ...
... This is in bytes?! Then 2 ATLAS tasks have over 53 GB reserved?! ...


Right. If this is not a typo: 24.8 GB per task.


... Question: Can I change these values in the client_state.xml??? (carefully!) ...


Not recommended, but possible as CP wrote (be VERY CAREFUL!!!)


The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit.
It is the responsibility of the project developers to fill it with a reasonable value.



My proposal:

The value for <rsc_memory_bound> should be derived from the <avg_ncpus> tag.
That way the user can set the number of CPUs to be used in app_config.xml and the project can calculate the necessary amount of RAM for n CPUs.
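Such a server-side rule could look like the sketch below. The base_mb and per_core_mb values are made-up illustrative numbers, not the project's actual constants:

```python
def rsc_memory_bound_from_ncpus(avg_ncpus: int,
                                base_mb: int = 3000,
                                per_core_mb: int = 900) -> float:
    """Hypothetical scheduler-side calculation: a fixed base plus a
    per-core share, converted to the byte value for <rsc_memory_bound>.
    base_mb and per_core_mb are illustrative assumptions only."""
    return float(base_mb + per_core_mb * avg_ncpus) * 1024 * 1024

# A 4-core task would then get (3000 + 4*900) MB = 6600 MB:
print(rsc_memory_bound_from_ncpus(4) / 1024**2)   # 6600.0
```

With a rule like this, lowering avg_ncpus in app_config.xml would automatically lower the reserved RAM per task.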
ID: 29139
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,876,678
RAC: 849
Message 29141 - Posted: 9 Mar 2017, 17:13:41 UTC - in response to Message 29139.  

The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit.
It is the responsibility of the project developers to fill it with a reasonable value.

I suppose the project has tied rsc_memory_bound to the given plan_class vbox64_mt_mcore_atlas and does not obey the setting from app_config.xml.
ID: 29141
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,862,693
RAC: 96,991
Message 29142 - Posted: 9 Mar 2017, 17:23:43 UTC - in response to Message 29138.  

... 2017-03-09 09:03:40 (5764): Setting Memory Size for VM. (8400MB)


Your virtual machine is configured with 8400MB RAM.

Your workunit consists of more than just the VM, e.g. the vboxwrapper.
With <rsc_memory_bound> the BOINC client gets information about the RAM usage of the complete WU and can calculate whether other projects can run in parallel.


... Is the number of CPUs set on the LHC page used to calculate these values? ...

It should be, but IMHO it is not implemented as it should be.

... is this number of CPUs the physical or the logical count? perhaps I should set the physical number . . .

Logical.
You shouldn't care about this too much for the moment.
ID: 29142
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1992
Credit: 143,862,693
RAC: 96,991
Message 29143 - Posted: 9 Mar 2017, 17:30:28 UTC - in response to Message 29141.  

The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit.
It is the responsibility of the project developers to fill it with a reasonable value.

I suppose the project has tied rsc_memory_bound to the given plan_class vbox64_mt_mcore_atlas and does not obey the setting from app_config.xml.

And that is exactly the pitfall, as it only works if you run one (multicore) job from one project.
Here we need more flexibility.
ID: 29143
peterfilla

Send message
Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29144 - Posted: 9 Mar 2017, 17:46:39 UTC

!!! Thanks for all the information - now I can continue with my testing !!!
ID: 29144
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 431
Credit: 117,525,067
RAC: 0
Message 29156 - Posted: 10 Mar 2017, 9:26:45 UTC

May I make a guess?

Your setting for LHC@Home is something like "use all cores" or "use max cores". That is a high figure with 12/24 cores.

Limit it to 4 cores and this value will decrease with newly downloaded WUs.


Supporting BOINC, a great concept !
ID: 29156
peterfilla

Send message
Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 29169 - Posted: 10 Mar 2017, 13:14:40 UTC

The above error is "solved" (?): I detached from LHC and reattached - and the new tasks finished OK (I do not know the reason why).

My second problem (no more than 2 ATLAS tasks) should be "solved" too: I set the number of CPUs on the server to a lower value, so the calculation of rsc_memory_bound gives a lower value (this test is running today).

At the moment I am running my CPU at its limit with 4-core tasks for testing - but later I will run at 60-75% load (4 cores is a good number, I heard).
ID: 29169



©2022 CERN