Message boards : ATLAS application : This is not SLC6, need to run with Singularity
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35902 - Posted: 14 Jul 2018, 23:33:16 UTC

I'm running ATLAS native tasks on 2 Ubuntu machines and I noticed "This is not SLC6, need to run with Singularity...." in the stderr output files. So I assume if I were using SLC6 (Scientific Linux 6) singularity would not be needed. Would SLC6 increase efficiency and get more ATLAS work done? I don't really need to use Ubuntu, SLC6 would likely suit my needs.
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35909 - Posted: 15 Jul 2018, 10:04:11 UTC - in response to Message 35902.  
Last modified: 15 Jul 2018, 10:45:04 UTC

... "This is not SLC6, need to run with Singularity...." in the stderr output files. So I assume if I were using SLC6 (Scientific Linux 6) singularity would not be needed.
SLC6 stands for Scientific Linux CERN 6. You can download it here: https://linux.web.cern.ch/linux/scientific6/ (End of support is before December 2020, so you might consider using CC7 (CERN CentOS 7) instead, but I don't know whether the native ATLAS app runs without Singularity on CC7.)
Yes, if you use SLC6 or CentOS 6 you can run the native ATLAS app without Singularity; see this task for example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=200098871
The output says:
OS:Scientific Linux release 6.9 (Carbon)

This is SLC or CentOS release 6, run the atlas job without Singularity
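As a rough illustration of the kind of check behind those stderr messages, here is a minimal Python sketch. It is an assumption, not the actual ATLAS wrapper code: the function name `needs_singularity` and the regular expression are hypothetical, and only the Scientific Linux release string is taken from the output above (the Ubuntu string is just an example).

```python
import re

def needs_singularity(release: str) -> bool:
    """Return False only for Scientific Linux / CentOS release 6 (hypothetical rule)."""
    match = re.search(r"(Scientific Linux|CentOS).*release\s+(\d+)", release)
    if match is None:
        return True  # unknown distribution: assume the container is needed
    return match.group(2) != "6"

for release in ("Scientific Linux release 6.9 (Carbon)", "Ubuntu 18.04 LTS"):
    mode = "with Singularity" if needs_singularity(release) else "without Singularity"
    print(f"{release}: run {mode}")
```

Anything that is not recognised as release 6 of SLC or CentOS falls back to Singularity, which matches the behaviour reported for the Ubuntu hosts in this thread.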


Would SLC6 increase efficiency and get more ATLAS work done? I don't really need to use Ubuntu, SLC6 would likely suit my needs.
I don't think there would be a big difference in efficiency between Singularity and non-Singularity runs. Theoretically it should be somewhat more efficient without Singularity, but I don't have actual numbers on that.

The following setup will probably increase efficiency much more than running without Singularity, and it is much less work to set up (although it is not the exact answer to your question :-) ):
One of your computers is configured to use 4 cores for ATLAS native. Generally speaking, the fewer CPU cores per ATLAS task, the higher the efficiency of that task, if you define efficiency as (CPU time)/(task runtime * number of cores). I.e. it is more efficient to run 4 1-core tasks concurrently than 1 4-core task. So to increase efficiency, the first thing I would do is reduce the number of cores per ATLAS task from 4 to 1, IF you have enough RAM for that. (Running 2 2-core tasks would also be more efficient than your current setup and is probably the better choice considering your 8 GB RAM limitation.)
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35924 - Posted: 15 Jul 2018, 22:02:42 UTC - in response to Message 35909.  

So to increase efficiency, the first thing I would do is reduce the number of cores per ATLAS task from 4 to 1, IF you have enough RAM for that. (Running 2 2-core tasks would also be more efficient than your current setup and is probably the better choice considering your 8 GB RAM limitation.)


I tried that on both hosts that I have setup to run ATLAS native but then the server refused to send new tasks claiming VBox is not installed. It refused both hosts.

VBox is installed on the 4-core host, but I have the "do not use VBox" option turned on (set to 1) in cc_config.xml. I purposely did not install it on the 2-core host because somebody wrote elsewhere that the only way to guarantee that VBox tasks will not be sent is to not install VBox.
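For reference, a small Python sketch of how one could confirm that the option is really set. It assumes the option is spelled `<dont_use_vbox>` inside `<options>`, which is how it appears in the BOINC client configuration documentation to my knowledge; the sample file content and the helper name `vbox_disabled` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a cc_config.xml with the "do not use VBox" option set to 1.
SAMPLE_CC_CONFIG = """<cc_config>
  <options>
    <dont_use_vbox>1</dont_use_vbox>
  </options>
</cc_config>"""

def vbox_disabled(cc_config_xml: str) -> bool:
    """Return True if the (assumed) dont_use_vbox option is present and set to 1."""
    root = ET.fromstring(cc_config_xml)
    value = root.findtext("options/dont_use_vbox", default="0")
    return value.strip() == "1"

print(vbox_disabled(SAMPLE_CC_CONFIG))
```

On a real host you would read the file from the BOINC data directory instead of using the sample string.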

I reversed changes to the website prefs and now I get tasks again. Very strange that. I recall other posts about that problem. Perhaps the fix is already given in that thread.
AuxRx

Joined: 16 Sep 17
Posts: 100
Credit: 1,618,469
RAC: 0
Message 35936 - Posted: 16 Jul 2018, 9:54:30 UTC - in response to Message 35909.  

One of your computers is configured to use 4 cores for ATLAS native. Generally speaking, the fewer CPU cores per ATLAS task, the higher the efficiency of that task, if you define efficiency as (CPU time)/(task runtime * number of cores). I.e. it is more efficient to run 4 1-core tasks concurrently than 1 4-core task. So to increase efficiency, the first thing I would do is reduce the number of cores per ATLAS task from 4 to 1, IF you have enough RAM for that. (Running 2 2-core tasks would also be more efficient than your current setup and is probably the better choice considering your 8 GB RAM limitation.)


That's news to me. How could the efficiency of the computation improve with fewer cores? Are you referring to eliminating idle time connected with starting and stopping tasks? What is your metric?

I am familiar with the diminishing returns of Hyper Threading (or SMT in general) and issues with the credit system, but those are different arguments.
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35938 - Posted: 16 Jul 2018, 12:57:27 UTC - in response to Message 35936.  

@bronco:
Have you changed the number of cores in the LHC@Home settings on the homepage? I think they introduced a limit on the maximum number of concurrent tasks per host, which depends on the "Max # CPUs" setting, in order to get a more accurate count of the cores actually used for LHC@home. I.e. if you have set "Max # CPUs = 1", the server will only allow you to have one task in progress. So if you already had one or more tasks with the "in progress" status, you won't be able to download any more.

@AuxRx:
As said, if you define efficiency as efficiency = (CPU time)/(task runtime * number of cores), then the lower the number of cores per ATLAS task, the higher the efficiency of that task will be. Here is an example (both tasks were crunched on the exact same machine and are from the same task ID):

- 1-core task: CPU time = 39,091.66s, Run Time = 39,206.37s, number of cores = 1
==> efficiency = 99.7%

- 4-core task: CPU time = 47,125.95s, Run Time = 12,424.23s, number of cores = 4
==> efficiency = 94.8%
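The two percentages above can be reproduced with a few lines of Python. This is only a sketch of the definition given in this post; the function name `cpu_efficiency` is chosen here for illustration.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float, n_cores: int) -> float:
    """efficiency = (CPU time) / (task runtime * number of cores)"""
    return cpu_time_s / (run_time_s * n_cores)

# Figures taken from the two tasks quoted above.
one_core = cpu_efficiency(39091.66, 39206.37, 1)
four_core = cpu_efficiency(47125.95, 12424.23, 4)

print(f"1-core task: {one_core:.1%}")   # 99.7%
print(f"4-core task: {four_core:.1%}")  # 94.8%
```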

There are two reasons. First, as you said, the actual computation starts sooner for a 1-core task than for a 4-core task. Second, there is the design of the multicore app: as far as I know, the current tasks calculate 200 events. If you use a 4-core task, these 200 events are split up into 50 events per core. Since every event takes a different time to calculate, one core will finish its 50 events sooner than another. So all cores have to wait until the last one has finished its 50 events. This "waiting time" is bad for efficiency.
I think computezrmle described it in more detail in another post, but I can't find it right now.
maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,332,141
RAC: 197,196
Message 35939 - Posted: 16 Jul 2018, 13:10:13 UTC

bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35941 - Posted: 16 Jul 2018, 15:06:45 UTC - in response to Message 35938.  

@bronco:
Have you changed the number of cores in the LHC@Home settings on the homepage?

Yes, I did. I changed it from 4 to 2 when I still had a few 4-core tasks in my cache. Then I forced an update and noticed that in addition to getting a new 2-core task the 4-core tasks in my cache changed status in BOINC manager from 4-core to 2-core and instead of just 1 running task I suddenly had 2 running tasks. Those completed and then it downloaded a couple of VBox tasks and later a couple of 2-core tasks. I aborted the VBox tasks, a few more 2-cores ran to completion and then the next time it requested tasks the complaints about VBox not being installed occurred.

I think they introduced a limit on the maximum number of concurrent tasks per host, which depends on the "Max # CPUs" setting, in order to get a more accurate count of the cores actually used for LHC@home. I.e. if you have set "Max # CPUs = 1", the server will only allow you to have one task in progress. So if you already had one or more tasks with the "in progress" status, you won't be able to download any more.

That makes sense. I suspect it had nothing at all to do with VBox being installed or not installed. I suspect the VBox message is a one-size-fits-all response the server spits out whenever it doesn't want to send tasks, regardless of its real reason for not wanting to send them. Some day the devs will refine it and make it spit out a more appropriate and informative response. Until then users will learn to live with the confusion and learn to like it :)

I fixed it by letting my cache drain overnight. Then this morning I set "Max # CPUs=2" and "Max # tasks=4" and now I'm getting 2-core tasks. At the moment it's crunching 2 X 2-core ATLAS natives (4 athena.py showing in top, each using 23.2% of physical RAM). In addition to that I have Firefox running and 3 BOINC manager instances and VLC is playing a DVD. Not bad for just 8 GB RAM and a relatively slow CPU. I'll try this configuration for a couple weeks and watch for failed tasks as well as HITS files in the stderr output. If this 2 X 2-core configuration works well for 2 weeks I might try 4 X 1-core tasks but I doubt that will work reliably along with all the other stuff I like to do.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35942 - Posted: 16 Jul 2018, 15:38:02 UTC - in response to Message 35938.  

Since every event takes a different time to calculate, one core will finish its 50 events sooner than another. So all cores have to wait until the last one has finished its 50 events. This "waiting time" is bad for efficiency.
I think computezrmle described it in more detail in another post, but I can't find it right now.

I recall that post and it was indeed quite detailed. Though it was informative and certainly added to my understanding of the situation, the above quote ties it all together for me. I've had similar concerns for the multi-core Theory tasks where it appears that if the algorithm for determining if there is enough time remaining for another job says "not enough time" and there is a looping Sherpa job or maybe a Pythia that is taking an abnormally long time to complete, then you have cores sitting idle until the task bumps up against the 18 hour limit.

Meh, I'm not convinced these multi-core apps are a good idea.
maeax

Joined: 2 May 07
Posts: 1556
Credit: 57,332,141
RAC: 197,196
Message 35944 - Posted: 17 Jul 2018, 5:35:43 UTC - in response to Message 35939.  
Last modified: 17 Jul 2018, 5:38:13 UTC

The ATLAS task log shows CPU usage statistics.
For example, from a Scientific Linux 6.9 (SL69) host with 4 CPUs:
runtimeenvironments=APPS/HEP/ATLAS-SITE;
Processors=1
WallTime=15544.04s
KernelTime=4830.47s
UserTime=53774.48s
CPUUsage=377%
377% is a good performance!
Edit: with openhtc.io
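The 377% figure appears to be (KernelTime + UserTime) / WallTime. That relation is an assumption on my part, but it matches the numbers quoted above. A short Python sketch (the helper name `cpu_usage_percent` is hypothetical) that recomputes it from the log lines:

```python
import re

# The three timing lines copied from the log excerpt above.
LOG = """WallTime=15544.04s
KernelTime=4830.47s
UserTime=53774.48s"""

def cpu_usage_percent(log_text: str) -> float:
    """Assumed relation: CPUUsage = (KernelTime + UserTime) / WallTime."""
    fields = {key: float(val) for key, val in re.findall(r"(\w+Time)=([\d.]+)s", log_text)}
    return 100.0 * (fields["KernelTime"] + fields["UserTime"]) / fields["WallTime"]

usage = cpu_usage_percent(LOG)
print(f"CPUUsage = {usage:.0f}%")
print(f"Per-core efficiency on 4 cores = {usage / 4:.1f}%")
```

Dividing by the 4 cores gives a per-core figure that can be compared directly with the efficiency definition used earlier in this thread.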
tullio

Joined: 19 Feb 08
Posts: 703
Credit: 4,244,064
RAC: 325
Message 35945 - Posted: 17 Jul 2018, 8:18:06 UTC
Last modified: 17 Jul 2018, 8:19:28 UTC

On my main Linux box I am running a one core Atlas task. The Opteron 1210 CPU has only two cores. All two core Atlas tasks have failed, so I restricted Atlas to run on one core. But GPUGRID CPU tasks run on two cores on the same CPU. RAM is 8 GB, OS is SuSE Linux Leap 42.3.
Tullio
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 35946 - Posted: 17 Jul 2018, 10:39:19 UTC - in response to Message 35942.  
Last modified: 17 Jul 2018, 10:43:19 UTC

Since every event takes a different time to calculate, one core will finish its 50 events sooner than another. So all cores have to wait until the last one has finished its 50 events. This "waiting time" is bad for efficiency.
I think computezrmle described it in more detail in another post, but I can't find it right now.
Maybe I have to correct myself a little (I'm sure a more advanced user can give better info): I think the 200 events are not actually split into fixed blocks of 50 events per core. If one core (or rather the events calculated on it) is faster than the others, it may calculate more than 50 events, and another core correspondingly fewer. But in principle it should still be true that all cores have to wait until the last core has finished its last event.
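To make the "waiting for the last event" effect concrete, here is a toy Python simulation. It is a sketch under stated assumptions, not a model of the real ATLAS scheduler: cores pull events from one shared queue, the event durations are made-up random numbers, and start-up overhead is ignored. The function name `simulate_shared_queue` is hypothetical.

```python
import heapq
import random

def simulate_shared_queue(event_times, n_cores):
    """Each core takes the next event as soon as it is free.

    Returns (wall_time, efficiency), with efficiency defined as
    total busy time / (wall_time * n_cores).
    """
    free_at = [0.0] * n_cores  # min-heap: time at which each core becomes idle
    heapq.heapify(free_at)
    for duration in event_times:
        start = heapq.heappop(free_at)  # earliest available core
        heapq.heappush(free_at, start + duration)
    wall_time = max(free_at)  # the task ends when the last core finishes
    return wall_time, sum(event_times) / (wall_time * n_cores)

rng = random.Random(0)  # fixed seed so the run is repeatable
events = [rng.uniform(60.0, 600.0) for _ in range(200)]  # invented durations in seconds

for cores in (1, 2, 4):
    wall, eff = simulate_shared_queue(events, cores)
    print(f"{cores} core(s): wall time {wall:9.1f} s, efficiency {eff:.1%}")
```

With one core there is never an idle core, so the efficiency is 100% by construction. With several cores the others sit idle while the last event finishes, so the efficiency cannot exceed 100% and is usually a little below it.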

I recall that post and it was indeed quite detailed. Though it was informative and certainly added to my understanding of the situation, the above quote ties it all together for me. I've had similar concerns for the multi-core Theory tasks where it appears that if the algorithm for determining if there is enough time remaining for another job says "not enough time" and there is a looping Sherpa job or maybe a Pythia that is taking an abnormally long time to complete, then you have cores sitting idle until the task bumps up against the 18 hour limit.

Meh, I'm not convinced these multi-core apps are a good idea.
Yes, if you have enough RAM and want the best efficiency, you should run only 1-core tasks (true for all LHC apps as far as I know). But on PCs with little RAM, multicore apps can help to get more CPU cores crunching for LHC@Home: the efficiency is lower, but the absolute amount of work done is still higher.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35947 - Posted: 17 Jul 2018, 16:25:38 UTC - in response to Message 35945.  

The Opteron 1210 CPU has only two cores. All two core Atlas tasks have failed, so I restricted Atlas to run on one core. But GPUGRID CPU tasks run on two cores on the same CPU. RAM is 8 GB, OS is SuSE Linux Leap 42.3.
Tullio


Hmmm. My Inspiron machine with 2 cores, 8GB RAM and Linux used to run 2-core ATLAS tasks in VBox no problem. Yours should too. And with 8GB RAM I think you should be able to run two 1-core tasks. I think probably you have something configured wrong.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35948 - Posted: 17 Jul 2018, 16:35:35 UTC - in response to Message 35946.  

Meh, I'm not convinced these multi-core apps are a good idea.
Yes, if you have enough RAM and want the best efficiency, you should run only 1-core tasks (true for all LHC apps as far as I know). But on PCs with little RAM, multicore apps can help to get more CPU cores crunching for LHC@Home: the efficiency is lower, but the absolute amount of work done is still higher.

It's strictly up to the individual but maybe cores that cannot be used at close to 100% efficiency should be used for other projects where they run at close to 100% efficiency. Again, it's a personal decision, there is no right or wrong choice on that matter.
Jim1348

Joined: 15 Nov 14
Posts: 590
Credit: 21,907,819
RAC: 3,379
Message 35949 - Posted: 17 Jul 2018, 16:49:11 UTC - in response to Message 35948.  
Last modified: 17 Jul 2018, 16:49:24 UTC

There have been discussions on efficiency before, and there is some trade-off between computational efficiency and other factors, such as memory use and transfer efficiency. I like to run two work units at a time on native ATLAS (i7-4770).

Along with the caching technique described by computezrmle https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4758, I am now getting a CPU efficiency of 198% for two cores (or 99% per core), as shown in BoincTasks.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 366
Credit: 13,252,062
RAC: 6,899
Message 36160 - Posted: 1 Aug 2018, 10:52:55 UTC - in response to Message 35949.  

Hi,

For some more details on performance measurements you may like to check out Wenjing's talk at the CHEP 2018 conference a few weeks ago (links to slides are at the bottom).

In summary the difference between running in singularity and true native mode is negligible, especially compared to the difference between native and virtualbox modes.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1992
Credit: 143,694,912
RAC: 93,663
Message 36164 - Posted: 1 Aug 2018, 12:07:59 UTC - in response to Message 36160.  

Hi David,

On slide 4 of the presentation the authors mention BOINC "Ranked as the 6th site in terms of good CPU time".
Does this "good CPU time" include CPU time spent on the many error-65 WUs we had during the last few weeks?
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 366
Credit: 13,252,062
RAC: 6,899
Message 36165 - Posted: 1 Aug 2018, 12:33:36 UTC - in response to Message 36164.  

Hi David,

On slide 4 of the presentation the authors mention BOINC "Ranked as the 6th site in terms of good CPU time".
Does this "good CPU time" include CPU time spent on the many error-65 WUs we had during the last few weeks?


No, "good" here means good for physics, i.e. a valid HITS file was produced.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1992
Credit: 143,694,912
RAC: 93,663
Message 36167 - Posted: 1 Aug 2018, 13:09:56 UTC - in response to Message 36165.  

Thank you.
+1
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1103
Credit: 6,874,445
RAC: 685
Message 36181 - Posted: 1 Aug 2018, 19:58:57 UTC - in response to Message 36160.  

Hi,

For some more details on performance measurements you may like to check out Wenjing's talk at the CHEP 2018 conference a few weeks ago (links to slides are at the bottom).

In summary the difference between running in singularity and true native mode is negligible, especially compared to the difference between native and virtualbox modes.

In that presentation the performance of a 4-core VM is mentioned as 72%.

That differs a lot from my experience with 4-core ATLAS VMs on Windows.

My last 7 tasks used 494,654 real CPU-seconds in 129,610 elapsed seconds. That's IMO a performance of 494,654 / (4 * 129,610) = 95.41%.
