Message boards : ATLAS application : ATLAS 8CPU Simulation Run Time

mrbenjamine20

Joined: 22 Jan 18
Posts: 2
Credit: 9,132,013
RAC: 0
Message 34530 - Posted: 3 Mar 2018, 22:51:53 UTC

Hello, I am running a couple of servers with dual E5-2680v2 processors, which have a total of 40 logical cores with hyperthreading, and it's taking the 8CPU ATLAS simulations about 2 days to complete. They seem to complete successfully from what I can tell, but the first 20 hours or so cover about 90% of the job, and then it takes another day to complete the remaining 10% or so. The CPU usage always seems to be pretty low (under 20% of total CPU with 5 ATLAS 8CPU jobs running) and I don't see any other bottleneck on the server.

I guess my questions are: 1) Is this normal and just how the project runs? 2) Is there something I can look at to determine a bottleneck? 3) Is there any type of tweaking or over-provisioning I can do so that other projects can use the CPU, since its usage is low?

I really like contributing to this project and appreciate any input or suggestions!
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,966,224
RAC: 136,643
Message 34532 - Posted: 4 Mar 2018, 7:31:26 UTC - in response to Message 34530.  

You may allow other users to read your computer's error logs.
Activate this in Project -> Account -> Preferences for this project -> LHC@home preferences -> "Should LHC@home show your computers on its web site?"
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 34533 - Posted: 4 Mar 2018, 8:23:26 UTC - in response to Message 34530.  

I guess my questions are: 1) Is this normal and just how the project runs? 2) Is there something I can look at to determine a bottleneck? 3) Is there any type of tweaking or over-provisioning I can do so that other projects can use the CPU, since its usage is low?

On that server I would run 8 ATLAS VMs with 'only' 4 CPUs each.
You can change that in your project preferences:
set Max # of jobs for this project to 8 and
Max # of CPUs for this project to 4.
You would still have 8 cores left for other projects.
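
As a rough check of the arithmetic behind this suggestion, here is a minimal Python sketch. The helper name `cores_left` is made up for illustration; the numbers come from this thread (40 logical cores, the earlier 5 jobs of 8 CPUs each, and the suggested 8 jobs of 4 CPUs each).

```python
def cores_left(logical_cores: int, jobs: int, cpus_per_job: int) -> int:
    """Return the logical cores not allocated to ATLAS VMs (hypothetical helper)."""
    allocated = jobs * cpus_per_job
    if allocated > logical_cores:
        raise ValueError("more CPUs allocated than logical cores available")
    return logical_cores - allocated


# Earlier setup described in the thread: 5 jobs x 8 CPUs on 40 logical cores.
print(cores_left(40, 5, 8))  # 0

# Suggested setup: 8 jobs x 4 CPUs on 40 logical cores.
print(cores_left(40, 8, 4))  # 8
```

Note that this only counts allocated cores; it says nothing about how busy those cores actually are.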
mrbenjamine20

Joined: 22 Jan 18
Posts: 2
Credit: 9,132,013
RAC: 0
Message 34537 - Posted: 4 Mar 2018, 15:17:19 UTC - in response to Message 34532.  
Last modified: 4 Mar 2018, 15:20:37 UTC

Thank you! I have updated my preferences to show the computers; any input would be appreciated!

Also, I have changed my preferences to 4 CPUs and the servers have already queued up a couple of 4CPU ATLAS jobs. We will see how the 4CPU jobs run over the next few days!

Thanks again for the suggestions!
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,409,041
RAC: 102,439
Message 34538 - Posted: 4 Mar 2018, 15:24:44 UTC - in response to Message 34537.  

... and the servers have already queued up a couple of 4CPU ATLAS jobs.
You were lucky :-) I got just 2 tasks, and since then it says "no tasks available for ATLAS simulation" :-(
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 34543 - Posted: 5 Mar 2018, 9:31:45 UTC - in response to Message 34537.  

Thank you! I have updated my preferences to show the computers; any input would be appreciated!

Also, I have changed my preferences to 4 CPUs and the servers have already queued up a couple of 4CPU ATLAS jobs. We will see how the 4CPU jobs run over the next few days!

Thanks again for the suggestions!

Thanks for un-hiding your machines.
Now I can see that you have an old version of VirtualBox (5.1.26) installed on your 40-core servers.
You can expect fewer problems with the newest version of VirtualBox -> https://www.virtualbox.org/wiki/Downloads
If you also install the VirtualBox 5.2.8 Oracle VM VirtualBox Extension Pack, you will be able to watch the VM console via BOINC Manager and see how the events are processed.
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 34544 - Posted: 5 Mar 2018, 9:41:02 UTC - in response to Message 34543.  

Now I can see that you have an old version of VirtualBox (5.1.26) installed on your 40-core servers.
You can expect fewer problems with the newest version of VirtualBox -> https://www.virtualbox.org/wiki/Downloads
If you also install the VirtualBox 5.2.8 Oracle VM VirtualBox Extension Pack, you will be able to watch the VM console via BOINC Manager and see how the events are processed.

Not sure whether VirtualBox 5.2.x will work with ATLAS.

Until the project team tells us that VirtualBox 5.2.x works, I would stay on the latest 5.1.x.


Supporting BOINC, a great concept !
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 34545 - Posted: 5 Mar 2018, 9:50:05 UTC - in response to Message 34544.  

Not sure whether VirtualBox 5.2.x will work with ATLAS.

Until the project team tells us that VirtualBox 5.2.x works, I would stay on the latest 5.1.x.

Maybe you're right, Yeti.
For me the ATLAS tasks are running well on VBox 5.2.26 and 5.2 28 -> https://lhcathome.cern.ch/lhcathome/results.php?hostid=10360630&offset=0&show_names=0&state=0&appid=14,
but I now realize I'm running a newer vboxwrapper (7.9.26000) than the one the project provides, and maybe that is why I have no problems?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,966,224
RAC: 136,643
Message 34547 - Posted: 5 Mar 2018, 12:02:51 UTC

@mrbenjamine20

The following log is an example of a host where VM extensions are not activated.
You may work through Yeti's checklist to identify/solve the problem.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=180597627


This error can be seen if your local BOINC client options regarding RAM and/or swap are set too low.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=176496064


This log:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=180722534
contains a line that indicates a RAM problem.
2018-03-03 19:03:25 (6228): WARNING: Communication with VM Hypervisor failed. (Possibly Out of Memory).
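
To spot this warning in saved task output, a small Python sketch like the following could help. It assumes the log has been copied into a local string or text file; the function name `find_hypervisor_warnings` and the second sample line are illustrative only, and just the warning text itself is taken from the log above.

```python
WARNING_TEXT = "Communication with VM Hypervisor failed"


def find_hypervisor_warnings(log_text: str) -> list[str]:
    """Return every line of log_text that contains the hypervisor warning."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if WARNING_TEXT in line
    ]


# First line is quoted from the thread; the second is a made-up filler line.
sample_log = (
    "2018-03-03 19:03:25 (6228): WARNING: Communication with VM Hypervisor "
    "failed. (Possibly Out of Memory).\n"
    "some unrelated line (hypothetical)\n"
)

for hit in find_hypervisor_warnings(sample_log):
    print(hit)
```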


RAM problems often force your machines to swap, which makes them slower and slower.
In the end your apps run into timeouts and crash.


You can avoid such situations by running fewer apps concurrently.
It's not always caused by the lack of RAM. It can also be caused by a slow storage system.
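
One rough way to decide how many apps to run concurrently is to divide the RAM you are willing to give BOINC by the RAM each VM needs. The sketch below is only an illustration: the function name and all numbers (a 64 GB host, 8 GB kept back for the OS, 6600 MB per VM) are made-up placeholders, not project requirements, so substitute the values you see on your own machines.

```python
def max_concurrent_vms(total_ram_mb: int, reserved_mb: int, per_vm_mb: int) -> int:
    """Estimate how many VMs fit in RAM without swapping (hypothetical helper)."""
    if per_vm_mb <= 0:
        raise ValueError("per_vm_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_vm_mb


# Placeholder numbers: 64 GB host, 8 GB reserved, 6600 MB per VM.
print(max_concurrent_vms(65536, 8192, 6600))  # 8
```

This ignores storage speed, which, as noted above, can be the limiting factor even when RAM is sufficient.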