ATLAS 8CPU Simulation Run Time
Author | Message |
---|---|
Joined: 22 Jan 18 Posts: 2 Credit: 9,132,013 RAC: 0 |
Hello, I am running a couple of servers with dual E5-2680v2 processors, which have a total of 40 logical cores with hyperthreading, and it's taking the 8CPU ATLAS simulations about 2 days to complete. They seem to complete successfully from what I can tell, but the first 20 hours or so cover about 90% of the job, and the remaining 10% or so takes another day. CPU usage always seems to be pretty low (under 20% of total CPU with 5 ATLAS 8CPU jobs running) and I don't see any other bottleneck on the server. I guess my questions are: 1) Is this normal and just how the project runs? 2) Is there something I can look at to determine a bottleneck? 3) Is there any type of tweaking or over-provisioning that would let other projects use the CPU, since its usage is low? I really like contributing to this project and appreciate any input or suggestions! |
Joined: 15 Jun 08 Posts: 2386 Credit: 222,966,224 RAC: 136,643 |
You may allow other users to read your computers' error logs. Activate this under Project -> Account -> Preferences for this project (LHC@home preferences) -> "Should LHC@home show your computers on its web site?" |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"I guess my questions are: 1) Is this normal and just how the project runs? 2) Is there something I can look at to determine a bottleneck? 3) Is there any type of tweaking or over-provisioning that would let other projects use the CPU, since its usage is low?" I would let that server run 8 ATLAS VMs with 'only' 4 CPUs each. You can change that in your project preferences: set "Max # of jobs for this project" to 8 and "Max # of CPUs for this project" to 4. You would still have 8 cores left for other projects. |
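The arithmetic behind that suggestion can be checked with a minimal sketch. `plan_cores` is a hypothetical helper written for illustration only; it is not part of BOINC or LHC@home, and the numbers are the ones quoted in this thread.

```python
def plan_cores(logical_cores, vm_count, cpus_per_vm):
    """Return (cores used by the VMs, cores left for other projects)."""
    used = vm_count * cpus_per_vm
    if used > logical_cores:
        # More virtual CPUs than logical cores would oversubscribe the host.
        raise ValueError("VMs would oversubscribe the host")
    return used, logical_cores - used


# Values from the thread: 40 logical cores, 8 VMs with 4 CPUs each.
used, free = plan_cores(40, 8, 4)
print(f"ATLAS uses {used} cores, {free} left for other projects")
```

With the original setup (5 jobs with 8 CPUs each) the same helper shows that all 40 logical cores were already allocated, which leaves nothing for other projects even though actual CPU usage was low.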
Joined: 22 Jan 18 Posts: 2 Credit: 9,132,013 RAC: 0 |
Thank you! I have updated my preferences to show the computers; any input would be appreciated! Also, I have changed my preferences to 4 CPUs and the servers have already queued up a couple of 4CPU ATLAS jobs. We will see how the 4CPU jobs run over the next few days! Thanks again for the suggestions! |
Joined: 18 Dec 15 Posts: 1686 Credit: 100,409,041 RAC: 102,439 |
"... and the servers have already queued up a couple of 4CPU ATLAS jobs." You were lucky :-) I got just 2 tasks, and since then it says "no tasks available for ATLAS simulation" :-( |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Thank you! I have updated my preferences to show the computers; any input would be appreciated!" Thanks for un-hiding your machines. Now I can see that you have an old version (5.1.26) of VirtualBox installed on your 40-core servers. Fewer problems are to be expected with the newest version of VirtualBox -> https://www.virtualbox.org/wiki/Downloads If you also install the VirtualBox 5.2.8 Oracle VM VirtualBox Extension Pack, you can watch the VM console via BOINC Manager and see how the events are processed. |
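To tell whether an installed version is older than a given one, the version strings have to be compared number by number rather than as text. The following is a minimal sketch under the assumption that the version is a plain dotted string such as the ones quoted in this thread; `parse_version` and `is_older` are hypothetical helper names, not part of VirtualBox or BOINC.

```python
def parse_version(text):
    """Turn a dotted string like '5.1.26' into (5, 1, 26)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, reference):
    """True if `installed` is a lower version than `reference`."""
    return parse_version(installed) < parse_version(reference)


# Versions mentioned in the thread.
print(is_older("5.1.26", "5.2.8"))
```

Comparing tuples of integers avoids the trap of plain string comparison, where "5.2.10" would sort before "5.2.8".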
Joined: 2 Sep 04 Posts: 453 Credit: 193,369,412 RAC: 10,065 |
"Now I can see that you have an old version (5.1.26) of VirtualBox installed on your 40-core servers." Not sure if VirtualBox 5.2.x will work with ATLAS. Until the project team tells us that VirtualBox 5.2.x works, I would still use the latest 5.1.x. Supporting BOINC, a great concept! |
Joined: 14 Jan 10 Posts: 1268 Credit: 8,421,616 RAC: 2,139 |
"Not sure if VirtualBox 5.2.x will work with ATLAS." Maybe you're right, Yeti. For me the ATLAS tasks are running well on VBox 5.2.26 and 5.2.28 -> https://lhcathome.cern.ch/lhcathome/results.php?hostid=10360630&offset=0&show_names=0&state=0&appid=14, but I now realize I'm running a newer vboxwrapper (7.9.26000) than the one the project provides, and maybe that is why I have no problems? |
Joined: 15 Jun 08 Posts: 2386 Credit: 222,966,224 RAC: 136,643 |
@mrbenjamine20 The following log is an example of a host where VM extensions are not activated. You may work through Yeti's checklist to identify and solve the problem. https://lhcathome.cern.ch/lhcathome/result.php?resultid=180597627 This error can be seen if your local BOINC client options regarding RAM and/or swap are set too low. https://lhcathome.cern.ch/lhcathome/result.php?resultid=176496064 This log: https://lhcathome.cern.ch/lhcathome/result.php?resultid=180722534 contains a line that indicates a RAM problem. 2018-03-03 19:03:25 (6228): WARNING: Communication with VM Hypervisor failed. (Possibly Out of Memory). RAM problems often force your machines to swap, which makes them slower and slower. In the end your apps run into timeouts and crash. You can avoid such situations by running fewer apps concurrently. It's not always caused by a lack of RAM; it can also be caused by a slow storage system. |
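If you save a task log locally, a few lines of Python can pick out the warning quoted above. This is only a sketch: `find_hypervisor_warnings` is a hypothetical helper, the marker string is copied from the log line in this post, and the second line of the sample is a placeholder, not real log output.

```python
# Marker text taken from the warning line quoted in the thread.
OOM_MARKER = "Communication with VM Hypervisor failed"


def find_hypervisor_warnings(log_text):
    """Return all log lines that contain the hypervisor warning."""
    return [line.strip() for line in log_text.splitlines() if OOM_MARKER in line]


sample = (
    "2018-03-03 19:03:25 (6228): WARNING: Communication with VM Hypervisor "
    "failed. (Possibly Out of Memory).\n"
    "PLACEHOLDER LINE (not real log output)\n"
)

for hit in find_hypervisor_warnings(sample):
    print(hit)
```

A hit only tells you where to look; as noted above, the cause can be too little RAM, too many concurrent apps, or a slow storage system.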
©2024 CERN