Message boards : ATLAS application : LHC sends 256 thread ATLAS native, not producing load
p3d-cluster

Joined: 29 Oct 07
Posts: 2
Credit: 22,309,221
RAC: 0
Message 43727 - Posted: 27 Nov 2020, 20:15:21 UTC
Last modified: 27 Nov 2020, 20:42:57 UTC

Hello there,

host 10660617 (dual Epyc 7702, 256 threads) is on a profile that allows native WUs but is limited to 8 threads per WU ("Max # of CPUs" for this project). All other nodes using this profile work as expected and get 8-thread WUs, regardless of whether they crunch native ATLAS or via VirtualBox.

However, as can be seen in the WU linked below, native WUs are assigned with a target of 256 threads on this host:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=289437454

Even if it worked, it wouldn't be efficient, because the entire startup time is wasted; with that many threads the startup could take even longer than the actual computing time.
I cancelled the above WU after 1.5 h since it produced load on just one thread, while the rest of the system did nothing.

1. Why does LHC send these WUs with a 256-thread target when other hosts with similar specs receive 8-thread WUs just fine?
2. Can the native ATLAS application work with 256 threads at all, or does it just fall back to 1 thread?

Any ideas?
ID: 43727
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43728 - Posted: 27 Nov 2020, 22:43:51 UTC - in response to Message 43727.  

For sure yes.

To reduce overhead it would be best to run the default of 12 cores per task, but to increase efficiency you can set it to just a few cores per task. I run 4 cores per task, which has been a good balance on most of the systems I use.
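
If you want to set the core count per host rather than through the website preferences, an app_config.xml along these lines should do it. This is only a minimal sketch: the app name "ATLAS" and plan class "native_mt" are assumptions on my part, so check them against what your client reports in its event log:

<app_config>
    <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>native_mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
</app_config>

Drop it into the lhcathome project directory and have the client re-read its config files (or restart it).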

Looking at your systems and the OS you put on them, they do great with native. Ubuntu 18.04 has been friendlier to set up and stable so far, but you already have a host on 20.04 that is doing great and looks solid.
For 20.04 LTS I built Singularity myself, since the bundled container did not work for ATLAS; that could be an option for you as well.

The main thing I see, and what you should do given the amount of power you run, is to cut down latency by adding a host running Squid to handle that amount of data. There is a great guide with a great config provided by computezrmle at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5473 - I suggest starting with that and then working through any remaining issues on the hosts afterwards.
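
To give an idea of the shape such a setup takes, here is only a minimal sketch of a caching squid.conf, not computezrmle's config - port, subnet and cache size are placeholders to adapt, and the linked thread is the authoritative source:

# /etc/squid/squid.conf - minimal LAN caching proxy (placeholder values)
http_port 3128
acl localnet src 192.168.0.0/16
http_access allow localnet
http_access deny all
maximum_object_size 1024 MB
cache_dir ufs /var/spool/squid 20000 16 256

The BOINC clients' HTTP proxy setting and CVMFS_HTTP_PROXY then point at that host, so most of the CVMFS and Frontier traffic is served from the LAN instead of going to CERN every time.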

There are other useful measures as well, such as cutting down geo-IP distance and adding more RAM and fast storage to the hosts doing the work. I suggest following computezrmle's and other users' suggestions and reading the guides and the FAQ for which ports to open.
ID: 43728
p3d-cluster

Joined: 29 Oct 07
Posts: 2
Credit: 22,309,221
RAC: 0
Message 43733 - Posted: 28 Nov 2020, 15:10:26 UTC

Hi Gunde,

Thanks for your reply!
I found an app_config.xml in the project directory of that host; it seems to be a leftover from a previous attempt at LHC. Removing it set the host back to normal operation.
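In case someone else hits the same symptom, a quick way to spot such leftovers (assuming the default Linux data directory /var/lib/boinc-client - adjust the path to your install) is:

find /var/lib/boinc-client/projects -name app_config.xml

Any file it finds overrides the website preferences for that project until it is removed and the client re-reads its config files.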
computezrmle reached out to me and provided some further insight. I'll be setting up Squid instances this weekend.

I was able to get Ubuntu 20.04 hosts working without compiling Singularity by just using the CVMFS packages for Ubuntu 20.04 from the CernVM download site.
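For anyone following along, the setup boils down to the standard CernVM-FS packages plus a small default.local. A sketch of the steps, with the repository list and proxy as placeholders to adapt (the download URL may change, so check the CernVM-FS documentation):

wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb
sudo apt update && sudo apt install -y cvmfs
sudo tee /etc/cvmfs/default.local >/dev/null <<'EOF'
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_HTTP_PROXY="http://your-squid-host:3128;DIRECT"
EOF
sudo cvmfs_config setup
cvmfs_config probe atlas.cern.ch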
The Epycs have 2 GB of memory per thread and are SSD-based.
We'll keep them like this; if they run into hardware limits, we can mix in less demanding projects as well.

Have a nice weekend!
ID: 43733
