Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC
Author | Message |
---|---|
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,168,157 RAC: 27,866 |
Running CERN VirtualBox tasks on BOINC is not so easy. You have to work out a good balance between your projects on your machine(s). This checklist is intended to help and was first developed for Atlas. Meanwhile you can also use it for the other VM projects of LHC@Home, although memory usage and screenshots are different there.
As BOINC doesn't allow us to keep the original checklist up to date, we have to start a new thread from time to time. This version has been updated with all the new information and hints we have received since the first checklist was made. This checklist was last updated on 06.06.2017.
Because of these checklist updates, the numbering may change or may already have changed. To be sure that you point (or get pointed) to the correct detail, I suggest putting the version number of the checklist in front. So V3.P5 is Checklist 3 (this one here), Point 5. Please go through this list and check every detail, step by step; all of them are important.
If you are having trouble with Atlas WUs, it is a good idea to run Atlas only for a limited time, until you are sure everything works as it should. With multi-core WUs, after the startup sequence (Point 12 / V3.P12), CPU time should climb much faster than elapsed time. So for a 5-core WU, 01:00:00 elapsed time and 04:50:00 CPU time is okay. Note: currently one Atlas WU contains 100 jobs. From time to time the project team changes the number of jobs based on their needs, so runtime may vary, and you should check how many jobs are currently in your WU(s).
..... Select the running Atlas job in BOINC Manager
..... Choose "Show VM Console" on the left side
..... A console should open showing the following lines (with Atlas 1.44)
If your console looks like this, all seems fine and your WU should finish successfully soon.
Meanwhile you can see more details within the console. Move your mouse over the console window, click into the window and then press ALT+F2. You should then see some output from your running tasks.
Supporting BOINC, a great concept ! |
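The multi-core rule of thumb from the checklist (CPU time should climb much faster than elapsed time) can be checked with a little arithmetic. The following is only a sketch: the function names are made up for illustration, and the times are the ones shown in the BOINC Manager task list.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string (as shown in BOINC Manager) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed: str, cpu_time: str, cores: int) -> float:
    """Fraction of the allotted cores the task has kept busy so far."""
    return hms_to_seconds(cpu_time) / (hms_to_seconds(elapsed) * cores)


# Checklist example: 5-core WU, 01:00:00 elapsed, 04:50:00 CPU time.
ratio = cpu_efficiency("01:00:00", "04:50:00", cores=5)
print(f"{ratio:.1%}")  # prints 96.7%
```

A value close to 100% after the startup sequence suggests the WU is using its cores; a value near 1/cores would suggest only one core is doing work.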
Send message Joined: 19 Mar 17 Posts: 1 Credit: 56,625 RAC: 0 |
Yeti, Thank you so much! I followed the steps as directed, and was quickly able to find what wasn't working for me. |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,168,157 RAC: 27,866 |
|
Send message Joined: 23 Feb 09 Posts: 3 Credit: 2,998,777 RAC: 0 |
@Yeti, one thing to add. If your Windows machine has Docker installed, it will break VirtualBox. |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,168,157 RAC: 27,866 |
|
Send message Joined: 27 Sep 08 Posts: 847 Credit: 691,257,714 RAC: 104,978 |
I never install the extension pack on my computers and it works fine. I agree it's useful for understanding issues if there are any. |
Send message Joined: 18 Mar 17 Posts: 1 Credit: 3,897,343 RAC: 0 |
Yeti, thank you very much for the detailed instructions. |
Send message Joined: 25 Mar 17 Posts: 1 Credit: 76,617 RAC: 0 |
Awesome instructions, Yeti. Definitely got me up and running. One suggestion though: in step 6, I think it's important to specifically mention Computing Preferences and the memory and disk limitations configured there. These tripped me up on my dedicated folder with 32 GB of RAM. The 50% limits there prevented multiple WUs and caused seemingly random behavior, when BOINC was really enforcing those limits. |
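To illustrate how a memory limit in Computing Preferences can cap the number of VM tasks, here is a rough sketch of the arithmetic. The function name and the 5 GB per-task figure are placeholders invented for this example, not values from the checklist, and BOINC's real scheduling logic is more involved than a simple division.

```python
def tasks_that_fit(total_ram_gb: float, limit_pct: float, task_ram_gb: float) -> int:
    """How many tasks fit under a 'use at most X% of memory' limit."""
    usable_gb = total_ram_gb * limit_pct / 100
    return int(usable_gb // task_ram_gb)


TASK_RAM_GB = 5.0  # hypothetical per-task memory; check what your own tasks need

for limit_pct in (50, 90):
    count = tasks_that_fit(32, limit_pct, TASK_RAM_GB)
    print(f"{limit_pct}% of 32 GB -> room for {count} task(s)")
```

Under these assumed numbers, raising the limit from 50% to 90% makes room for more tasks on the same 32 GB machine.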
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,168,157 RAC: 27,866 |
"One suggestion though: in step 6, I think it's important to specifically mention Computing Preferences and the memory and disk limitations configured there. These tripped me up on my dedicated folder with 32 GB of RAM. The 50% limits there prevented multiple WUs and caused seemingly random behavior, when BOINC was really enforcing those limits." Unfortunately I can't edit checklist V3, so I have to wait until it is time for V3.5 (or V4, who knows?). But your comments will make their way into the next version. Thanks, Yeti Supporting BOINC, a great concept ! |
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,168,157 RAC: 27,866 |
|
Send message Joined: 22 Jun 16 Posts: 4 Credit: 986,111 RAC: 0 |
I'm trying to run Atlas on a ProLiant DL580 G7 with quad E7-4870s. HT is enabled and I'm running 10 concurrent tasks. Everything seems fine. I've verified everything with the checklist. The problem is that tasks are taking 12+ hours to complete. Is this normal? Or is there a scaling issue I'm not aware of? Any advice would be appreciated. Thanks! PS: I am only running 128 GB of RAM, but RAM usage never exceeds 65%. |
Send message Joined: 18 Dec 15 Posts: 1811 Credit: 118,345,618 RAC: 25,724 |
"The problem is that tasks are taking 12+ hours to complete. Is this normal?" With a 2.4 GHz processor (I guess you did NOT overclock?), this seems to be the "normal" crunching time for the current ATLAS tasks. |
Send message Joined: 22 Jun 16 Posts: 4 Credit: 986,111 RAC: 0 |
One thing I didn't mention is that CPU usage never goes over 55%. I'm running Linux Mint 18.1 Cinnamon (standard desktop). On my 2P Windows 7 machines CPU usage is normally at 90% plus. I've also got RAM in eight cartridges. I'm wondering if dropping back to "single channel" with four cartridges might help. I can also add an additional 64 GB of RAM. |
Send message Joined: 18 Dec 15 Posts: 1811 Credit: 118,345,618 RAC: 25,724 |
"One thing I didn't mention is that CPU usage never goes over 55%." This is logical if, with 20 CPU threads available, you run (only) 10 tasks. You could easily run more than 10 tasks. |
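The reasoning in the reply above can be written as a one-line estimate. This sketch assumes single-core tasks and takes the 20 CPU threads from the reply at face value; the function name is made up for illustration.

```python
def expected_cpu_load(running_tasks: int, cores_per_task: int, cpu_threads: int) -> float:
    """Rough share of the CPU threads kept busy by the running tasks."""
    return min(1.0, running_tasks * cores_per_task / cpu_threads)


# 10 single-core tasks on 20 threads, as described in the reply.
print(f"{expected_cpu_load(10, 1, 20):.0%}")
```

This gives roughly half the machine, which is in the same range as the 55% reported, so the low CPU usage would follow from the task count rather than from a fault.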
Send message Joined: 22 Jun 16 Posts: 4 Credit: 986,111 RAC: 0 |
I'm assuming I need to run an app_config to run more than 10. Any idea what that might be? |
Send message Joined: 18 Dec 15 Posts: 1811 Credit: 118,345,618 RAC: 25,724 |
"I'm assuming I need to run an app_config to run more than 10. Any idea what that might be?" No app_config needed. You can set the number of tasks, up to 24, on your settings page (on the project homepage). Further, if you'd like to save some RAM, you could switch to multi-core tasks, which, in total, need less RAM than single-core tasks. The crunching time would also decrease, of course (so a 2-core task would need about 6 hours, a 3-core task about 4 hours, ...). |
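The runtimes quoted above (about 6 hours on 2 cores, about 4 hours on 3 cores) correspond to dividing the 12-hour single-core time by the core count. A minimal sketch of that estimate, assuming ideal linear scaling, which real tasks will probably not quite reach:

```python
def ideal_runtime_hours(single_core_hours: float, cores: int) -> float:
    """Runtime under ideal linear scaling: single-core time divided by cores."""
    return single_core_hours / cores


# 12 hours is the single-core crunching time reported in this thread.
for cores in (1, 2, 3, 4):
    print(f"{cores}-core task: about {ideal_runtime_hours(12, cores):.1f} h")
```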
Send message Joined: 22 Jun 16 Posts: 4 Credit: 986,111 RAC: 0 |
"I'm assuming I need to run an app_config to run more than 10. Any idea what that might be?" I'm afraid changing the site settings will mess up my 2P rigs. It took me forever to get them to run correctly as it is. Which tasks are multi-core? It's apparent I need to learn a whole lot more about this. |
Send message Joined: 14 Jan 10 Posts: 1417 Credit: 9,441,051 RAC: 798 |
"Which tasks are multi-core? It's apparent I need to learn a whole lot more about this." All vbox tasks (CMS, LHCb, Theory and ATLAS) can run multi-core, but only ATLAS will use all the cores for one single job, which shortens the task. The other three sub-projects will load one job for every core, so when you set Max # of CPUs to 4, the created VM will run 4 jobs within the VM. Towards the end of the task the jobs will end one after another, not at the same time, and the VM will have idle cores until the last job has finished. With ATLAS the single job will use all defined cores and the task will run faster. Be aware that the credits for multi-core are much lower than for single-core due to BOINC's credit mechanism. |
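The idle-core effect described above (one job per core, jobs finishing at different times) can be estimated with a small sketch. The job durations below are invented for illustration, and the model assumes all jobs start together and that the VM runs until the longest job has finished.

```python
def vm_core_efficiency(job_hours: list[float]) -> float:
    """Busy core-hours divided by allotted core-hours for a one-job-per-core VM."""
    vm_runtime = max(job_hours)              # VM ends with the longest job
    allotted = len(job_hours) * vm_runtime   # every core is reserved that long
    return sum(job_hours) / allotted


# Hypothetical 4-core VM whose jobs end one after another.
print(f"{vm_core_efficiency([10, 11, 12, 14]):.1%}")
```

With a single job spanning all cores, as described for ATLAS, this particular source of idle time does not arise.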
Send message Joined: 18 Dec 15 Posts: 1811 Credit: 118,345,618 RAC: 25,724 |
In other words - presently, it makes sense only to run ATLAS on multi-core. |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,452 RAC: 1,957 |
"In other words - presently, it makes sense only to run ATLAS on multi-core." We can run CMS multi-core in -dev, but in my experience you lose efficiency with more than two jobs. This is partly because the jobs end at different times, as mentioned above, and partly because they have a staggered start, in pairs, so as not to overload the system (disk, network). When I was looking at it, the staging was at twenty-minute intervals; I don't know if this has been changed lately. |
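The cost of the staggered start can be sketched in the same way. The pair size and the twenty-minute interval come from the post above; the job durations and the function name are hypothetical, and the model ignores everything except the delayed starts.

```python
def staggered_efficiency(job_hours: list[float],
                         stagger_minutes: float = 20.0,
                         group_size: int = 2) -> float:
    """Core efficiency when jobs start in groups at fixed intervals."""
    # Job i starts with its group: group 0 at 0 min, group 1 at 20 min, ...
    starts = [(i // group_size) * stagger_minutes / 60
              for i in range(len(job_hours))]
    vm_runtime = max(start + hours for start, hours in zip(starts, job_hours))
    return sum(job_hours) / (len(job_hours) * vm_runtime)


# Hypothetical equal-length jobs of 6 hours each.
for n_jobs in (2, 4, 8):
    print(f"{n_jobs} jobs: {staggered_efficiency([6.0] * n_jobs):.1%}")
```

Under these assumptions two jobs lose nothing to staggering, while each additional pair adds another twenty minutes during which earlier cores sit idle at the end.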
©2024 CERN