Message boards :
ATLAS application :
-1073740791 (0xC0000409) STATUS_STACK_BUFFER_OVERRUN
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
I try to run more than one 4-core Theory task and get this error. The memory_size in app_config is set to 5400.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374
Strange. In stderr.txt 8400 MB is reported:
2017-03-08 18:09:23 (8408): Setting Memory Size for VM. (8400MB)
Set the memory size for Theory to 1280 MB for a 4-core VM. That's enough.
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
I am sure that is not the reason; I started my tests with 1280 MB. But I have two problems: this error, and the fact that I cannot run more than 2 ATLAS tasks at the same time. So I set the memory to 8400 MB in app_config for both projects. First I ran ATLAS -> problem: only 2 tasks. Then I followed your advice: got this error on all tasks. Now I have uninstalled VirtualBox (14), installed the latest (16), set up LHC anew, and am running new tests.
Joined: 2 Sep 04 Posts: 453 Credit: 193,569,815 RAC: 9,173
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
There is plenty of space . . .

<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64</plan_class>
    <avg_ncpus>4.000000</avg_ncpus>
    <cmdline>--memory_size_mb 1500</cmdline>
  </app_version>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>4.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 8400</cmdline>
  </app_version>
</app_config>

For the current test I reduced the Theory memory size; three Theory tasks are now running - but only two 4-core ATLAS tasks, with one waiting.
Joined: 15 Jun 08 Posts: 2413 Credit: 226,529,990 RAC: 131,649
What RAM limits (in %) did you set in "Options -> Computing preferences..."? Do all of your currently running and paused jobs fit into the configured RAM?
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
I have a 12-core/24-thread CPU with 64 GB RAM. In the options, RAM usage was set to 90% max; the project can run 5 tasks max. The current test uses 83% of CPU and 26% of RAM with 2 ATLAS and 3 Theory tasks; 3 tasks are waiting (1 ATLAS and 2 Theory).
Joined: 15 Jun 08 Posts: 2413 Credit: 226,529,990 RAC: 131,649
I guess you have no other job from any other project running beside ATLAS or Theory, do you?
Try to locate (carefully!) all occurrences of the tag
<rsc_memory_bound>123456789.123456</rsc_memory_bound>
in the file client_state.xml and sum up the values for jobs that are running/paused/waiting for memory. Does the result exceed your configured RAM limit (64 GB x 0.9 = 57.6 GB)?
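The summing described above can be scripted. This is a hedged sketch only: the regex matches the tag exactly as quoted in this thread, and the sample values are the ones that come up later in the discussion - you would feed in the text of your own client_state.xml instead.

```python
import re

def total_memory_bound(xml_text):
    """Sum all <rsc_memory_bound> values (in bytes) found in client_state.xml text."""
    values = re.findall(r"<rsc_memory_bound>([\d.]+)</rsc_memory_bound>", xml_text)
    return sum(float(v) for v in values)

# Sample input: two 4-core ATLAS tasks plus one Theory task
# (values taken from this thread, not measured here).
sample = (
    "<rsc_memory_bound>26633830400.000000</rsc_memory_bound>"
    "<rsc_memory_bound>26633830400.000000</rsc_memory_bound>"
    "<rsc_memory_bound>500000000.000000</rsc_memory_bound>"
)
print(total_memory_bound(sample) / 1e9)  # 53.7676608 decimal GB, just under 57.6
```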
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
LHC is the only project. rsc_memory_bound for ATLAS is 26633830400.00 for each task; for Theory it is 500000000.00. This is in bytes?! Then 2 ATLAS tasks have over 53 GB reserved?! Am I right?
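For readers checking the arithmetic: yes, the tag is in bytes, and the two figures quoted in this thread ("over 53 GB" and "24.8 GB per task") differ only because one uses decimal GB and the other binary GiB. A quick check:

```python
per_task = 26633830400  # bytes, the ATLAS rsc_memory_bound quoted above

print(per_task / 1e9)      # 26.6338304 decimal GB per task
print(per_task / 2**30)    # ~24.8 binary GiB per task (the "24.8 GB" in the reply)
print(2 * per_task / 1e9)  # ~53.3, two ATLAS tasks: "over 53 GB" reserved
```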
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
Question: can I change these values in client_state.xml? (carefully!)
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374
LHC is the only project.
Yes, that's right. Each task has 24.8 GB reserved. That's why the other tasks are waiting for memory.
Question: can I change these values in client_state.xml? (carefully!)
Yes, that's possible, but first you have to shut down the BOINC client. Edit client_state.xml with a plain text editor and change
<rsc_memory_bound>26633830400.000000</rsc_memory_bound>
into
<rsc_memory_bound>5662310400.000000</rsc_memory_bound>
for 5400 MB of memory for a 4-core ATLAS task. Do this for all (including not yet started) workunits, save the file, and restart the BOINC client.
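The replacement value above is simply megabytes converted to bytes, assuming the usual 1 MB = 1024 x 1024 bytes convention:

```python
def mb_to_bytes(mb):
    # 1 MB = 1024 * 1024 bytes
    return mb * 1024 * 1024

print(mb_to_bytes(5400))  # 5662310400, the value suggested above
```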
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
What is the difference between "Setting Memory Size for VM" (8400MB)

2017-03-09 09:03:40 (5764): vboxwrapper (7.7.26196): starting
2017-03-09 09:03:40 (5764): Feature: Checkpoint interval offset (300 seconds)
2017-03-09 09:03:40 (5764): Detected: VirtualBox COM Interface (Version: 5.1.16)
2017-03-09 09:03:40 (5764): Detected: Minimum checkpoint interval (900.000000 seconds)
2017-03-09 09:03:40 (5764): Successfully copied 'init_data.xml' to the shared directory.
2017-03-09 09:03:40 (5764): Create VM. (boinc_0521eef5705f3a8d, slot#0)
2017-03-09 09:03:40 (5764): Setting Memory Size for VM. (8400MB)
2017-03-09 09:03:40 (5764): Setting CPU Count for VM. (4)
2017-03-09 09:03:40 (5764): Setting Chipset Options for VM.

and "rsc_memory_bound" in client_state.xml?
Is the number of CPUs set on the LHC page used to calculate these values? And is this number the physical or the logical CPU count? Perhaps I should set the physical number . . . And: I will try changing it . . .
Joined: 15 Jun 08 Posts: 2413 Credit: 226,529,990 RAC: 131,649
... rsc_memory_bound for ATLAS is 26633830400.00 for each task ...
Right. If this is not a typo: 24.8 GB per task.
... Question: can I change these values in client_state.xml? (carefully!) ...
Not recommended, but possible, as CP wrote (be VERY CAREFUL!!!).
The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit. It is the responsibility of the project developers to fill it with a reasonable value. My proposal: the value for <rsc_memory_bound> should be derived from the <avg_ncpus> tag. Thus the user can set the number of CPUs to be used inside an app_config.xml, and the project can calculate the necessary amount of RAM for n CPUs.
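The proposal could look something like this on the server side. This is a hedged sketch only: the base and per-core figures are invented for illustration and are not the project's actual sizing formula.

```python
def rsc_memory_bound(avg_ncpus, base_mb=3000, per_core_mb=1000):
    """Illustrative RAM bound (in bytes) that scales with the requested core count.

    base_mb and per_core_mb are made-up assumptions, not project values.
    """
    return (base_mb + per_core_mb * avg_ncpus) * 1024 * 1024

print(rsc_memory_bound(4))  # 7340032000 bytes for a 4-core task
print(rsc_memory_bound(1))  # a 1-core task would get a proportionally smaller bound
```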
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374
The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit.
I suppose the project has tied rsc_memory_bound to the given plan_class vbox64_mt_mcore_atlas and does not obey the setting from app_config.xml.
Joined: 15 Jun 08 Posts: 2413 Credit: 226,529,990 RAC: 131,649
... 2017-03-09 09:03:40 (5764): Setting Memory Size for VM. (8400MB) ...
Your virtual machine is configured with 8400 MB RAM. Your workunit consists of more than the VM, e.g. the vboxwrapper. With <rsc_memory_bound> the BOINC client gets information about the RAM usage of the complete WU and can calculate whether other projects can run in parallel.
... Is the number of CPUs set on the LHC page used to calculate these values? ...
It should be, but IMHO this is not implemented as it should be.
... is this number of CPUs the physical or the logical CPUs? perhaps I should set the physical number . . . ...
Logical. You shouldn't worry about this too much for the moment.
Joined: 15 Jun 08 Posts: 2413 Credit: 226,529,990 RAC: 131,649
The <rsc_memory_bound> tag is included in the server reply when you receive a new workunit.
And that's exactly the pitfall, as it only works if you run one (multicore) job from one project. Here we need more flexibility.
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
Thanks for all the information - now I can continue with specific testing!
Joined: 2 Sep 04 Posts: 453 Credit: 193,569,815 RAC: 9,173
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
The above error is "solved" (?): I detached from LHC and reattached, and the new tasks finished OK (I do not know why). My second problem (no more than 2 ATLAS tasks) will be "solved" too: I set the number of CPUs on the server to a lower value, so the calculation of rsc_memory_bound gives a lower value (this test is running today). At the moment I am running my CPU at its limit with 4-core tasks for testing, but later I will run at 60-75% load (4 cores is a good number, I heard).
©2024 CERN