Message boards :
ATLAS application :
Delete old settings ??
Author | Message |
---|---|
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Was running 3 * 8 threads on this machine
changed preferences on my account to run unlimited
Let machine finish work
Removed app_config
added more ram
checked Virtual still OK in bios
detached

A few days later I thought I would give this another go
attached to project
Am still getting "running 8 cpu's"
Removed then reinstalled VB 5.1 plus tools
reset project
No change.

Save me some pain please....
How do I start with a clean sheet ??
(mint 18.3, boinc 7.10.2, 12c/24t, 64GB) |
Send message Joined: 2 May 07 Posts: 2241 Credit: 173,895,024 RAC: 2,804 |
How about going back to the versions from boinc.berkeley.edu: VirtualBox 5.1.26 including the Extension Pack, with BOINC 7.8.3? |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 51,175 |
"Was running 3 * 8 threads on this machine"
Your account lists several hosts. Which one is "this"?
Guess: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10544285

"changed preferences on my account to run unlimited"
Unlimited #tasks or #cores?

"added more ram"
To the host, I guess (otherwise you would have kept your app_config.xml).

"detached"
Did you assign the host to the same venue you changed above?

"Removed then reinstalled VB 5.1 plus tools"
Do you mean the VirtualBox extensions?

From a typical log:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191143249
I guess it's a VirtualBox problem. You may:
1. Remove VirtualBox
2. Reinstall it
3. Reinstall the VBox extensions
4. Check if your BOINC user is in the vboxusers group
5. Reboot
6. Request fresh LHC work
If the problems persist, post here again.

Additional suggestions:
You now have enough RAM to run fewer than 8 cores per VM, which would be more efficient. Best would be a 2-core setup.
As you run Linux, you may try the native app. See here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4703&postid=35234 |
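The group check in step 4 can be scripted. This is only a minimal sketch: it assumes the BOINC client runs under an account called `boinc` (common with distro packages, but not confirmed anywhere in this thread), so change the name if your client runs as a different user.

```python
import grp
import pwd


def user_in_group(user: str, group_name: str) -> bool:
    """Return True if `user` belongs to `group_name` (supplementary or primary)."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist on this system
    if user in group.gr_mem:
        return True  # listed as a supplementary member
    try:
        # Fall back to the user's primary group.
        return pwd.getpwnam(user).pw_gid == group.gr_gid
    except KeyError:
        return False  # the user does not exist on this system


if __name__ == "__main__":
    # "boinc" is an assumed account name, not taken from this thread.
    boinc_user = "boinc"
    if user_in_group(boinc_user, "vboxusers"):
        print(f"{boinc_user} is in vboxusers")
    else:
        print(f"{boinc_user} is NOT in vboxusers (or the user/group is missing)")
```

If the check fails, the user has to be added to the group, and the BOINC client restarted (or the host rebooted, as in step 5) before the new membership takes effect.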
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Was running 3 * 8 threads on this machine |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Your comment to try native..... I guess that trying a never-before-used distro is not too bad, but I am interested to know what advantage running native might offer. Faster? Less memory intensive? |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Did some reading about adding user to vboxusers and have machine running again but still with 8 cpu's

It seems then that I have misunderstood the LHC@home preferences. I have now set this to:
Max # jobs: No limit
Max # CPUs: 1
so when the current wu running finishes I am hoping that I see individual cores running individual wu's even though that might take a while

machine id is 10545906 |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Max jobs 2
Max cpu's 2
Gave me 2*2cpu's

Max Jobs 8
Max CPU's
Gave me 2*2cpu's

adding app_config:

<app_config>
  <project_max_concurrent>8</project_max_concurrent>
  <app>
    <name>ATLAS</name>
    <max_concurrent>8</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>

Re-read config files gives me 2*2cpu's after next wu downloads
Project update and.... No Change

The procedure above normally results in immediate change on other projects....

So, as per post title: How to clear current settings and engage new ones |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 51,175 |
The following settings should work on your host
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10545906

Web preferences
#cores: 2
#tasks: 8 (or unlimited)

app_config.xml

<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>8</max_concurrent>
    <fraction_done_exact/>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>8</project_max_concurrent>
</app_config>

Don't forget a "reload config files".
The cmdline settings become active for the next starting VM.

This configures more RAM per VM than requested by the project server.
The reason why can be found in the following comments.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4702&postid=35261
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4702&postid=35290

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4695&postid=35268
David Cameron wrote: ... I wonder if we need to increase the memory limits for the current tasks.

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4694&postid=35293
David Cameron wrote: Indeed the current tasks are rather heavier than the previous ones - on average each event takes twice or even three times as long. Some may call them "deadly", others might appreciate the extra credit :) |
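For anyone who would rather generate the file than copy and paste it, here is a minimal sketch that writes out the same structure. The function name `build_app_config` and its parameters are made up for illustration; the element names and values are the ones from the post above. It needs Python 3.9 or newer for `ET.indent`.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_concurrent: int, ncpus: int, memory_mb: int) -> str:
    """Build an app_config.xml text for ATLAS (hypothetical helper, not a BOINC tool)."""
    root = ET.Element("app_config")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(app, "fraction_done_exact")

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "ATLAS"
    ET.SubElement(version, "plan_class").text = "vbox64_mt_mcore_atlas"
    ET.SubElement(version, "avg_ncpus").text = f"{ncpus:.1f}"
    ET.SubElement(version, "cmdline").text = (
        f"--nthreads {ncpus} --memory_size_mb {memory_mb}"
    )

    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)

    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    text = ET.tostring(root, encoding="unicode")
    # ElementTree writes "<tag />"; use the compact "<tag/>" form shown in the thread.
    text = text.replace(" />", "/>")
    return text + "\n"  # finish the file with a trailing newline


if __name__ == "__main__":
    # Values from the post above: 8 concurrent tasks, 2 cores and 4800 MB per VM.
    print(build_app_config(max_concurrent=8, ncpus=2, memory_mb=4800), end="")
```

The output still has to be saved as app_config.xml in the LHC@home project directory and picked up with "reload config files"; the script does not touch the BOINC client itself.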
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
OK so I followed that guide and changed the app_config (copy/paste) then adjust to 10 concurrent tasks
No new tasks
abort old
read config and see listed in log
allow new tasks
still get just 2 running 2 threads
No new tasks, let these run.

observations:
first 20% suggests runtime of ~35mins
get to ~30% and total memory use hits 11.7GiB and then stays steady
get to 60% in under 30 mins but time remaining is climbing
get to 80% in 50:40
get to 85% in 59:43

Things to do, but came back to find: 97.25% at 1:53:20

This is running much like the last wu so, maybe it is not resetting the app_config settings at all

The plan now is to abort and re-boot because it seems to me that that way I can be sure boinc is seeing the settings from the beginning.

Once done I shall let it run and go to bed |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
And after a reboot still only running two tasks of two |
Send message Joined: 14 Jan 10 Posts: 1417 Credit: 9,440,570 RAC: 1,020 |
OldChap wrote: This is running much like the last wu so, maybe it is not resetting the app_config settings at all

Do you have a new line after the last </app_config>? |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 51,175 |
@ OldChap

There's good news and there's bad news.

Good news

Your stderr.txt, e.g. from https://lhcathome.cern.ch/lhcathome/result.php?resultid=191632548, shows that your VMs accept the settings from your app_config.xml.

2018-05-21 02:13:21 (5758): Setting Memory Size for VM. (4800MB)
2018-05-21 02:13:21 (5758): Setting CPU Count for VM. (2)

Bad news

Regarding your VirtualBox installation

2018-05-21 02:12:59 (5758): Detected: VirtualBox VboxManage Interface (Version: 5.1.38)
2018-05-21 03:06:46 (2772): Detected: VirtualBox VboxManage Interface (Version: 5.1.34)

Your log references 2 different VirtualBox versions. This should be checked/corrected. You may stay on 5.1.x for the moment; don't upgrade to 5.2.x.

VBoxManage: error: VD: error VERR_FILE_NOT_FOUND opening image file '/usr/share/virtualbox/VBoxGuestAdditions.iso' (VERR_FILE_NOT_FOUND).

This line shows that the VM can't access the VBox Additions. They are not a must, but helpful for monitoring, so it is recommended to install them. Be aware that they should be the same version as the hypervisor (= main VirtualBox program).

I remembered a discussion in the MB that is not relevant for my own setup (as I don't run that many ATLAS concurrently) but it may affect yours:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4686&postid=35258

To make it short:
This (in my eyes) misconfiguration of the server is most likely the reason why you can't get more tasks than configured via #cores in the web preferences. If you set this to a higher value, you may get more.

As the server uses the same parameter to calculate the RAM setting, the limit on your host (https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10545906) will most likely be 6 concurrently running VMs. This is caused by the fact that the RAM setting is controlled by "memory_size_mb" and "working_set_size/working_set_size_smoothed".
The second parameter can't be influenced by the user but is used by your BOINC client to determine if an additional task can be started. |
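The same checks can be run against a locally saved stderr.txt. The following is a rough sketch, not an official tool: the regular expressions are written only from the log lines quoted above, and `summarize_stderr` is a made-up name. It pulls out the VirtualBox versions plus the memory and CPU settings, and flags the case where more than one VirtualBox version shows up.

```python
import re

# Patterns derived from the log lines quoted in this thread.
VERSION_RE = re.compile(
    r"Detected: VirtualBox VboxManage Interface \(Version: ([\d.]+)\)"
)
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def summarize_stderr(text: str) -> dict:
    """Collect the distinct VirtualBox versions and VM settings found in a log."""
    return {
        "vbox_versions": sorted(set(VERSION_RE.findall(text))),
        "memory_mb": sorted({int(m) for m in MEMORY_RE.findall(text)}),
        "cpu_counts": sorted({int(c) for c in CPU_RE.findall(text)}),
    }


# Sample input: the four log lines quoted in the post above.
SAMPLE = """\
2018-05-21 02:12:59 (5758): Detected: VirtualBox VboxManage Interface (Version: 5.1.38)
2018-05-21 02:13:21 (5758): Setting Memory Size for VM. (4800MB)
2018-05-21 02:13:21 (5758): Setting CPU Count for VM. (2)
2018-05-21 03:06:46 (2772): Detected: VirtualBox VboxManage Interface (Version: 5.1.34)
"""

summary = summarize_stderr(SAMPLE)
print("Memory per VM (MB):", summary["memory_mb"])
print("CPUs per VM:", summary["cpu_counts"])
if len(summary["vbox_versions"]) > 1:
    print("Mixed VirtualBox versions:", ", ".join(summary["vbox_versions"]))
```

To check a real task, read the file contents into a string and pass that to `summarize_stderr` instead of `SAMPLE`.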
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Excellent

Yes, I have been lazy and initially downloaded the extensions using the package manager, only to find that the version does not match.

On restart, and on starting VirtualBox, this is briefly seen and the option to upgrade is presented. Selecting "yes" downloads and installs the correct version, but there must be an error in this process which does not remove the older version.

I may have to either download/install the correct package initially using the CLI (not sure of its name) or purge the older version.

A look at my running overnight still shows those wu's going VERY long for the last few percent, but I will consider that again once everything else is working.

Thanks also for the stderr output location info. Yes, it should be on file locally but this is easier (still being lazy) |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Another night, another install. This time using Virtualbox-5.2

4 completed wu's OK but Last 3% took 4+ hours

Result file showing "Setting Memory Size for VM. (4800MB)" but in BoincTasks I see 4400MB

Changing the app_config to 6 tasks and changing the LHC preferences to 6 tasks then running update followed by a re-boot....

Still shows me running 2 tasks after I allow new work

Nothing I have done so far is allowing me to adjust that number

It is as if the app_config needs a different name to run these |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 51,175 |
Sometimes your WUs get credit just for the time your resources were used, although they don't deliver useful scientific output, even though they state "Guest Log: Successfully finished the ATLAS job!".

See these ones (examine the log for error messages):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191619381
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191618846
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191636140
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191637424

These ones are perfect (the long runtime is normal):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191669813
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191691764
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191691768
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191923454
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191914088 |
Send message Joined: 7 May 17 Posts: 16 Credit: 1,456,154 RAC: 0 |
Thanks everybody for the help

This is my HUUUUGE Thank You to PHILIPPE who decided to try to educate me in the ways of LHC via PM

This was always going to be me being an idiot and this was in the end true. I have been totally misunderstanding how the LHC Preferences work

So, with some expert help I changed the #cpus to unlimited (for now) and it downloaded 6 Wu's

I have now started a second instance of Boinc and shared resources equally

As a result I now have 10 threads of 2 cpus running on that machine, 5 on boinc and 5 on boinc2.....

......WHICH WAS WHAT I HAVE BEEN AIMING FOR ALL ALONG

THANK YOU EVERYBODY that helped me start to understand Atlas |