issues with app config/running multiple tasks
Author | Message |
---|---|
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
Evening all. For the last few days BOINC has only allowed me to run a couple of ATLAS tasks at once, rather than the configured maximum of 6 (normally 4 in practice, due to RAM). I have everything set to use 100% (CPU and RAM) within BOINC, and I checked the settings on the LHC side too: everything is at max and jobs are set to no limit. For CPUs I have tried everything from no limit to 24; it is now at 24 and it is only allowing one task. The host has 24 cores and 32 GB RAM. My app config: <?xml version="1.0"?> -<app_config> -<app> <name>ATLAS</name> <max_concurrent>6</max_concurrent> </app> -<app_version> <app_name>ATLAS</app_name> <avg_ncpus>2.000000</avg_ncpus> <plan_class>vbox64_mt_mcore_atlas</plan_class> <cmdline>--memory_size_mb 4800</cmdline> </app_version> </app_config> |
Joined: 14 Jan 10 Posts: 1173 Credit: 7,350,818 RAC: 12,122 |
Set in your preferences the # of CPUs also to 2 when you have 2 in your app_config.xml. |
Joined: 15 Jun 08 Posts: 2184 Credit: 186,189,167 RAC: 186,770 |
Your hosts are hidden, so expert users can't check your logs. You may change your preferences and make your hosts visible. Your app_config.xml looks strange. Is it due to the copy/paste, or are there really lines like: <?xml version="1.0"?> Your setting <avg_ncpus>2.000000</avg_ncpus> overrules the website preference "Max # of CPUs = 24", except for the server's working set size calculation, which is now 9000 MB per WU. Reduce the website preference to no more than the value that you use in your app_config.xml. A 24-core host would be able to run 3 8-core WUs (3 x 9000 MB = 27000 MB). If you configure 4-core WUs, 5800 MB would be required per WU. With 32 GB of RAM that allows 5 WUs (5 x 5800 MB = 29000 MB), which would use 20 CPUs. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
"Set in your preferences the # of CPUs also to 2 when you have 2 in your app_config.xml." Now done :) thanks. "Your hosts are hidden." I will make my computers visible now :) It may be due to the copy and paste... hmmm, this is how it looks when opened via Edit: <app_config> <app> <name>ATLAS</name> <max_concurrent>6</max_concurrent> </app> <app_version> <app_name>ATLAS</app_name> <avg_ncpus>2.000000</avg_ncpus> <plan_class>vbox64_mt_mcore_atlas</plan_class> <cmdline>--memory_size_mb 4800</cmdline> </app_version> </app_config> What do you advise I do? I was told to use 2 cores per work unit, with the config settings above. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
Deleted the app data file and still no change... closed and reopened BOINC, etc. |
Joined: 14 Jan 10 Posts: 1173 Credit: 7,350,818 RAC: 12,122 |
"Deleted the app data file and still no change... closed and reopened BOINC, etc." I suppose you still have tasks in your queue that you received before your changes. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
"Deleted the app data file and still no change... closed and reopened BOINC, etc." I did delete them; however, I left the ones that were already running to finish. Back at work this morning, there is still only one running and the others say "waiting for memory". |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
So, I just finished 1 task and deleted 4... I removed the app config file and just downloaded 2 WUs. Both have now been running for 1 minute, whereas before they would run for a few seconds and then stop... Without jumping to conclusions, could it be an app data file error? |
Joined: 15 Jun 08 Posts: 2184 Credit: 186,189,167 RAC: 186,770 |
Have you ever worked through Yeti's checklist? Fine. Besides that, you may restart the project with conservative settings. 1. Let your local WU cache run empty. 2. Reset the project in BOINC. 3. Update your VirtualBox software to the most recent version. 4. Reboot your host. 5. Set "Max # jobs = 1" and "Max # CPUs = 1" on the LHC website. 6. Create the following app_config.xml: <app_config> <app> <name>ATLAS</name> <max_concurrent>1</max_concurrent> <fraction_done_exact/> </app> <app_version> <app_name>ATLAS</app_name> <plan_class>vbox64_mt_mcore_atlas</plan_class> <avg_ncpus>1.0</avg_ncpus> <cmdline>--memory_size_mb 5000</cmdline> </app_version> <project_max_concurrent>1</project_max_concurrent> </app_config> 7. Request a new WU from the project. 8. Reload your configuration (this must be done after you have received the first WU and before that WU starts). 9. Check the result before you change your settings and request a new WU. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
"Have you ever worked through Yeti's checklist?" I did go through his checklist last night; it was his checklist that made me check the computing preferences on the LHC site :) I have set BOINC to not fetch more tasks. I will complete these 2 tasks, follow your list and then post back. Thanks :) |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
LHC@home: Notice from BOINC: Your app_config.xml file refers to an unknown application 'ATLAS'. Known applications: None 27/04/2017 3:59:09 PM. I had this come up, but it's gone now... I will run this task through, pause everything and post again. |
Joined: 18 Dec 15 Posts: 1571 Credit: 67,390,810 RAC: 171,195 |
"Your app_config.xml file refers to an unknown application 'ATLAS'. ..." Well, BOINC shows this notice only once, when you go to "Options" - "read config files". If you repeat this and the notice shows up again, then something is going wrong. |
Joined: 15 Jun 08 Posts: 2184 Credit: 186,189,167 RAC: 186,770 |
This happens after every project reset until ATLAS (in this case) is known to your host through the first server response. Nothing to worry about if you managed to load the app_config.xml before BOINC started the WU. See number 8 of my list. You may check the stderr.txt in the slots dir of the running WU. If "Setting Memory Size for VM. (xxxxMB)" corresponds to your app_config.xml everything is fine. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
"This happens after every project reset until ATLAS (in this case) is known to your host through the first server response." All sorted now :) EDIT: found the stderr file: 2017-04-27 16:11:08 (16424): Setting Memory Size for VM. (5000MB) The WU is 49% complete. Once I have sorted out which app config file to run from now on, I will try it, change whichever settings you guys recommend on the LHC site, and see what happens :) |
Joined: 15 Jun 08 Posts: 2184 Credit: 186,189,167 RAC: 186,770 |
Some suggestions for possible next steps. 1. Check the logfile: After your WU is reported, check the result on the LHC webserver (it includes a copy of your stderr.txt). - The WU should be marked as "successful". - The logfile should include lines like: Guest Log: <metadata att_name="fsize" att_value="54070367"/> If this is successful, go to step 2. 2. Try 1 multicore WU: Leave "Max # jobs = 1", set "Max # CPUs = 2", set <avg_ncpus>2.0</avg_ncpus> and "read config files" in your client, OR leave "Max # jobs = 1", set "Max # CPUs = 4", set <avg_ncpus>4.0</avg_ncpus>, set <cmdline>--memory_size_mb 6000</cmdline> and "read config files" in your client. If this is successful, go to step 3. 3. Try several multicore WUs concurrently: Increase "Max # jobs" step by step, with either "Max # CPUs = 2" or "Max # CPUs = 4", and set your app_config.xml accordingly: <max_concurrent>x <avg_ncpus>y <cmdline>--memory_size_mb zzzz <project_max_concurrent>x Don't forget the "read config files" before the next WU download. Always check the logfiles before you go from one step to the next. At a certain point your host will start to produce errors because of: - faulty WUs -> check the message boards - other projects that also need resources - a saturated internet connection -> how fast is it? - saturated disk IO -> a lot of users don't check/believe this point - not enough RAM -> test another combination of #WUs / cores per WU / RAM per WU - not enough CPUs -> unlikely in your case :-) |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
"Some suggestions for possible next steps." Thanks very much :) It's got around an hour to go, so this will be a job for tomorrow, I would guess. I will edit the app config file with the changes you suggested and then work through the steps from there :) Thanks |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
Quick update: once this WU has finished I will start step 3 and report back, but so far, so good :) |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
OK, so far, so good! I changed <max_concurrent> and <project_max_concurrent> to 4, as with 5400 MB of RAM and 2 cores per WU that's the most I can do, and it gives me a little room too! Is there a tried and tested "thing" of x cores and x RAM? I was always told 2 cores and 4800 MB RAM... In answer to your questions, computezrmle: "- faulty WUs -> check the message boards - other projects also need resources - a saturated internet connection -> how fast is it? - a saturated disk IO -> a lot of users don't check/believe this point - not enough RAM -> test another combination of #WUs / cores per WU / RAM per WU - not enough CPUs -> unlikely in your case :-)" 1. I have been checking; there are a few around. On the 27th I, like many, had problem WUs. 2. At the moment I only have one GPU slot running, so no risks there! 3. My connection isn't great: 1.5mb down and around 0.1mb up. 4. I'm not sure exactly what that is, so I will google it. The drive isn't very old: a Samsung EVO 850, 500 GB. 5. RAM is my issue. I can go to 48 GB in total, I think... currently 32 GB is fitted. 6. Cores are not an issue currently; I do, however, need more RAM to support those cores :( |
Joined: 27 Sep 08 Posts: 752 Credit: 571,328,089 RAC: 119,130 |
On my PC with 12 cores / 24 threads, I can max out 64 GB if there are too many ATLAS tasks. I've seen memory use get very high on my 10-core / 20-thread machine too. On my other PCs with more RAM I haven't seen so many concurrent ATLAS tasks. I have set the number of concurrent tasks to 10 for the 64 GB machine to see if that is a bit better, as 12 made the maxed-out one slow. |
Joined: 23 Dec 16 Posts: 26 Credit: 776,007 RAC: 0 |
So, back to problems again... I cannot run more than 2 ATLAS tasks now, and only 3 SixTrack tasks are running... There are plenty of cores free, and SixTrack isn't bothered about RAM... |
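
The sizing arithmetic from earlier in the thread (9000 MB per 8-core WU, 5800 MB per 4-core WU, on a 24-core host with 32 GB) can be sketched in Python as below. This is only a rough illustration: the linear formula is simply a fit through those two quoted figures, not the server's documented calculation, and the function names and the `reserve_mb` parameter are hypothetical.

```python
def working_set_mb(cores_per_wu: int) -> int:
    """Estimate the per-WU working set in MB.

    ASSUMPTION: a straight line through the two figures quoted in the
    thread (4 cores -> 5800 MB, 8 cores -> 9000 MB), i.e. 800 MB per
    core plus a 2600 MB base. Other core counts are extrapolations.
    """
    return 2600 + 800 * cores_per_wu


def max_concurrent_wus(total_cores: int, total_ram_mb: int,
                       cores_per_wu: int, ram_per_wu_mb: int,
                       reserve_mb: int = 0) -> int:
    """Return how many WUs fit, limited by CPU count and by RAM.

    reserve_mb is a hypothetical allowance for the OS and other
    projects; it is not a BOINC setting.
    """
    by_cpu = total_cores // cores_per_wu
    by_ram = max(total_ram_mb - reserve_mb, 0) // ram_per_wu_mb
    return min(by_cpu, by_ram)


if __name__ == "__main__":
    ram_mb = 32 * 1024  # 32 GB host from the thread
    for cores in (4, 8):
        per_wu = working_set_mb(cores)
        n = max_concurrent_wus(24, ram_mb, cores, per_wu)
        print(f"{cores}-core WUs at {per_wu} MB each: {n} concurrent, "
              f"{n * cores} CPUs, {n * per_wu} MB")
```

With these inputs the sketch gives 3 concurrent 8-core WUs (27000 MB) and 5 concurrent 4-core WUs (20 CPUs), matching the figures quoted above. Passing the `--memory_size_mb` value from your own app_config.xml as `ram_per_wu_mb` shows whether RAM or CPU count is the limit for a given combination.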
©2023 CERN