Message boards : ATLAS application : BOINC downloads only 2 ATLAS tasks

Previous · 1 · 2 · 3 · Next

Author · Message
Erich56

Send message
Joined: 18 Dec 15
Posts: 1126
Credit: 21,723,410
RAC: 30,658
Message 35213 - Posted: 9 May 2018, 16:45:36 UTC - in response to Message 35212.  

Jim runs ATLAS (native) on linux.
Erich runs ATLAS (vbox...) on windows.
The latter usually needs more RAM as the VM has to set up an internal CVMFS cache.
oh, okay, I now understand.

Jim, I guess the app_config.xml you were suggesting is hence for native Linux ...
ID: 35213 · Report as offensive     Reply Quote
Jim1348

Send message
Joined: 15 Nov 14
Posts: 325
Credit: 10,749,249
RAC: 18,068
Message 35214 - Posted: 9 May 2018, 18:54:15 UTC - in response to Message 35213.  

Jim, I guess the app_config.xml you were suggesting is hence for native Linux ...


Replace this:
<app_version>
<app_name>ATLAS</app_name>
<plan_class>native_mt</plan_class>
<avg_ncpus>2</avg_ncpus>
</app_version>

With this:
<app_version>
<app_name>ATLAS</app_name>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<avg_ncpus>2</avg_ncpus>
</app_version>

And add whatever else is appropriate. It has worked for me in the past.
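For context, a complete minimal app_config.xml built around that block might look like this (a sketch; pick the avg_ncpus value and any extra settings to suit your own host):

```xml
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>
```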
ID: 35214
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 561
Credit: 349,396,503
RAC: 419,910
Message 35218 - Posted: 10 May 2018, 9:18:36 UTC - in response to Message 35209.  

There were a couple of mails but no follow-up, it seems; I will try again.
ID: 35218
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 282
Credit: 8,896,968
RAC: 7,737
Message 35258 - Posted: 14 May 2018, 11:03:52 UTC

Hi,

In the server config we have:

<max_jobs_in_progress>
  <app>
    <app_name>ATLAS</app_name>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc/>
    </cpu_limit>
  </app>
</max_jobs_in_progress>

which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

The reason for this setting is to try to provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example, a 4-core machine can download 4 tasks and has one task running using 4 cores. So reporting that this host is running 4 cores for ATLAS@Home is reasonably accurate. If the user configured BOINC to cache 20 jobs, we would report that the host is running 20 cores for ATLAS@Home, which would not be accurate.

The workaround is to use app_config.xml to control the number of cores for each task, as has already been explained in this thread. This is not an ideal solution but it's a consequence of trying to make ATLAS@Home look as much as possible like a normal ATLAS Grid site.
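The quoted limit can be sketched as follows (hypothetical helper, not actual BOINC server code):

```python
# With <jobs>1</jobs> and <per_proc/> in max_jobs_in_progress, a host
# may hold at most one ATLAS task per CPU, so tasks held stays close
# to cores actually in use and the monitoring figure stays honest.
def max_atlas_tasks(ncpus, jobs_per_cpu=1):
    return ncpus * jobs_per_cpu

# Example from the post: a 4-core machine can download at most 4 tasks,
# so reporting "4 cores busy" is roughly right; a 20-task cache would
# have been reported as 20 busy cores, which is why caching is capped.
```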
ID: 35258
Jim1348

Send message
Joined: 15 Nov 14
Posts: 325
Credit: 10,749,249
RAC: 18,068
Message 35259 - Posted: 14 May 2018, 13:05:44 UTC - in response to Message 35258.  
Last modified: 14 May 2018, 13:26:16 UTC

which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

The reason for this setting is to try to provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example, a 4-core machine can download 4 tasks and has one task running using 4 cores. So reporting that this host is running 4 cores for ATLAS@Home is reasonably accurate. If the user configured BOINC to cache 20 jobs, we would report that the host is running 20 cores for ATLAS@Home, which would not be accurate.

Could you not use the "Max # jobs" to limit the number of downloads to the number of cores that you have on your machine? As I recall, that is how it used to be. Or perhaps a separate setting for ATLAS could be used to limit the number of cores for ATLAS only, while the other projects could receive more jobs, if that is the concern.

Then, you could use the "Max # CPUs" setting to determine how many cores are used per work unit. That would spare the user the need for the app_config.

Finally, even if the count is not exact at any given moment, it would only be a few days before the work units completed anyway. It would just add a little latency at the beginning, but the total running would still be accurate.
For a physics experiment that will run for decades, it would appear that the accountants have gained the upper hand prematurely.
ID: 35259
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 282
Credit: 8,896,968
RAC: 7,737
Message 35267 - Posted: 15 May 2018, 14:19:26 UTC - in response to Message 35259.  

which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

The reason for this setting is to try to provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example, a 4-core machine can download 4 tasks and has one task running using 4 cores. So reporting that this host is running 4 cores for ATLAS@Home is reasonably accurate. If the user configured BOINC to cache 20 jobs, we would report that the host is running 20 cores for ATLAS@Home, which would not be accurate.

Could you not use the "Max # jobs" to limit the number of downloads to the number of cores that you have on your machine? As I recall, that is how it used to be. Or perhaps a separate setting for ATLAS could be used to limit the number of cores for ATLAS only, while the other projects could receive more jobs, if that is the concern.

Then, you could use the "Max # CPUs" setting to determine how many cores are used per work unit. That would spare the user the need for the app_config.

Finally, even if the count is not exact at any given moment, it would only be a few days before the work units completed anyway. It would just add a little latency at the beginning, but the total running would still be accurate.
For a physics experiment that will run for decades, it would appear that the accountants have gained the upper hand prematurely.


Well, it's a very expensive experiment so we need a lot of accountants :)

I agree that what you describe is the way that it should work, but I'm not enough of a BOINC expert to understand how the various combinations of options work. But I will keep looking...
ID: 35267
Jim1348

Send message
Joined: 15 Nov 14
Posts: 325
Credit: 10,749,249
RAC: 18,068
Message 35270 - Posted: 15 May 2018, 14:54:21 UTC - in response to Message 35267.  

Thank you, that is all we can ask for. (My daughter and her husband are accountants. They do great work. But they shouldn't run physics experiments.)
ID: 35270
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 561
Credit: 349,396,503
RAC: 419,910
Message 35283 - Posted: 16 May 2018, 15:26:53 UTC - in response to Message 35258.  

Hi David,

Is the max number of CPUs used to calculate the RAM estimate?

Thanks

Toby
ID: 35283
Crystal Pellet
Volunteer moderator
Volunteer tester

Send message
Joined: 14 Jan 10
Posts: 741
Credit: 6,027,804
RAC: 838
Message 35300 - Posted: 18 May 2018, 18:03:44 UTC - in response to Message 35283.  

Is the max number of CPUs used to calculate the RAM estimate?

Yes, that's what BOINC will reserve as RAM usage and, when no app_config.xml is used, the amount of RAM for the VM to be created.

#1 reserves 3500MB
#2 reserves 4400MB
#3 reserves 5300MB
#4 reserves 6200MB
#5 reserves 7100MB
#6 reserves 8000MB
#7 reserves 8900MB
#8 reserves 9800MB
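The reservations listed above follow a simple linear pattern, 3500 MB base plus 900 MB per additional core; a sketch of the formula (inferred from the table, not taken from project source):

```python
# RAM reserved by the vbox ATLAS app as a function of the Max # CPUs
# setting, per the table above: 3500 MB for one core, +900 MB per
# additional core.
def atlas_vbox_ram_mb(ncpus):
    return 3500 + 900 * (ncpus - 1)
```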
ID: 35300
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 561
Credit: 349,396,503
RAC: 419,910
Message 35326 - Posted: 21 May 2018, 3:52:42 UTC - in response to Message 35300.  

Thanks CP, still no good options to run many ATLAS tasks.
ID: 35326
Simplex0

Send message
Joined: 26 Aug 05
Posts: 68
Credit: 545,660
RAC: 0
Message 35499 - Posted: 13 Jun 2018, 4:49:45 UTC

BOINC only runs 2 ATLAS (8-core) work units on my 16-core Threadripper and uses only 50% of the CPU and RAM.
I will stop downloading ATLAS now to see if things go back to normal with only SixTrack running.
ID: 35499
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39020 - Posted: 2 Jun 2019, 17:12:05 UTC
Last modified: 2 Jun 2019, 17:22:17 UTC

I've been mystified as to why I was running and downloading only one native ATLAS WU per Linux computer. I've read this whole thread a few times and tried what I learned here, but it does not work.
If I set Max#CPUs to 8 (Max#Jobs no limit) and use app_config to limit execution to 2 threads, then BOINC still counts each WU as 8 CPUs, not 2. Watching System Monitor, I see my CPU utilization is really low. It does download up to 8 WUs, but they cannot run since BOINC miscounts them.
If I set Max#CPUs to 2 and use the same app_config (which I doubt I need), BOINC appears to count the WU as 2 CPUs. The estimated remaining time is shorter than half of what I usually saw with a single CPU, so I assume that's the "efficiency" mentioned below. However, BOINC only downloads 2 ATLAS WUs per computer.
I'm capable of running many more ATLAS WUs but CERN has tied my hands.

May I suggest the CERN accountants go to Lac LĂ©man and enjoy a long walk on a short pier.

As a consequence of setting Max#CPUs to 2, Native Theory only downloads 2C WUs. If memory serves, the Yeti Checklist said that ATLAS was the only project that actually used multiple CPUs on the same WU, while others packaged multiple WUs in the same job. So when the first finishes, a CPU sits idle until the second finishes, i.e. inefficient. So with this bug it seems one would run either ATLAS or Native Theory, but not both.
ID: 39020
computezrmle
Avatar

Send message
Joined: 15 Jun 08
Posts: 1137
Credit: 55,879,680
RAC: 96,476
Message 39021 - Posted: 2 Jun 2019, 18:01:15 UTC - in response to Message 39020.  

... and use app_config to limit ...

Do you use the correct filename "app_config.xml" (instead of "app_config") and did you reload your configuration?
Did you place it in the right folder?

See:
https://boinc.berkeley.edu/wiki/client_configuration


You can run ATLAS beside Theory but they need their own section in app_config.xml.
To check this it would be good if you could post that file here.
ID: 39021
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39027 - Posted: 3 Jun 2019, 0:00:09 UTC

File name: /var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome/app_config.xml
<app_config>
<app>
    <name>sixtrack</name>
</app>
<app>
    <name>ATLAS</name>
    <max_concurrent>6</max_concurrent>
    <app_version>
       <app_name>ATLAS</app_name>
       <plan_class>native_mt</plan_class>
       <avg_ncpus>2.0</avg_ncpus>
       <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline>
    </app_version>
</app>
<app>
    <name>cms</name>
    <max_concurrent>1</max_concurrent>
</app>
<app>
    <name>TheoryN</name>
    <app_version>
       <app_name>TheoryN</app_name>
       <plan_class>native_theory</plan_class>
       <avg_ncpus>2.0</avg_ncpus>
       <cmdline>--nthreads 2</cmdline>
    </app_version>
</app>
<app>
    <name>Theory</name>
    <max_concurrent>1</max_concurrent>
</app>
<app>
    <name>LHCb</name>
    <max_concurrent>1</max_concurrent>
</app>
<app>
    <name>ALICE</name>
    <max_concurrent>1</max_concurrent>
</app>
<report_results_immediately/>
</app_config>

Setting MAX#CPUs = 2 means this app_config is not needed.
ID: 39027
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39028 - Posted: 3 Jun 2019, 0:03:40 UTC
Last modified: 3 Jun 2019, 0:06:02 UTC

It seems it would be easy for CERN to fix this problem by allowing clients to specify Max#CPUs and Max#Jobs separately per project, as they do at WCG, and by dropping the reverse logic so that Max#CPUs and Max#Jobs actually control what their names say.
Presently Max#Jobs does who knows what.
ID: 39028
computezrmle
Avatar

Send message
Joined: 15 Jun 08
Posts: 1137
Credit: 55,879,680
RAC: 96,476
Message 39029 - Posted: 3 Jun 2019, 7:21:38 UTC

As Toby Broom already mentioned in another thread, your computers show an imbalance between RAM and #cores.
None of them has more than 32 GB.

On the other hand, ATLAS doesn't use the preference setting as it was originally introduced.
This has been discussed in a long thread for more than a year and need not be repeated here.

Both facts combined lead to a situation where your BOINC client will not run (or even download) the #tasks that you expect.
In particular, ATLAS will limit #tasks either according to the value of #cores or (if max #cores = unlimited) based on the limited RAM.


A combination of the following measures may help a bit but will not solve your situation completely:
1. Set max #tasks to unlimited
2. Set max #cores to unlimited
3. Use the right app_config.xml (see below)
4. Increase the task buffer of your BOINC client

The next measures would be much more efficient:
5. Buy additional RAM
6. Buy much more RAM
7. Run additional BOINC clients on each computer




Regarding your app_config.xml

There are sections that are obsolete, e.g. sixtrack or ALICE.
Other sections are malformed, like
<app>
    . . .
    <app_version>
    . . .
    </app_version>
</app>


"cms" references a non existing app.
It must be "CMS".
BTW: Are you aware that CMS also need lots of RAM (>2.5 GB per task)?


A working minimum app_config.xml for ATLAS native and Theory native should look like this (the XML comments are optional and may be removed):
<app_config>
    <app_version>
       <app_name>ATLAS</app_name>
       <plan_class>native_mt</plan_class>
       <avg_ncpus>2.0</avg_ncpus>    <!-- set the #cores here according to your needs; <cmdline> is obsolete -->
    </app_version>
    <app_version>
       <app_name>TheoryN</app_name>
       <plan_class>native_theory</plan_class>
       <avg_ncpus>1.0</avg_ncpus>    <!-- must be set to "1" as there is no multicore app -->
    </app_version>
    <report_results_immediately/>
</app_config>
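Since a single typo keeps the whole file from loading, it can help to parse app_config.xml locally before telling the client to re-read it. A minimal generic check (not a BOINC tool, just standard-library XML parsing):

```python
import xml.etree.ElementTree as ET

# Parse an app_config.xml and list the (app_name, plan_class) pairs it
# declares. Raises on malformed XML, e.g. stray '#' comments, which
# are not valid XML (use <!-- ... --> instead).
def list_app_versions(xml_text):
    root = ET.fromstring(xml_text)
    assert root.tag == "app_config", "root element must be <app_config>"
    return [(v.findtext("app_name"), v.findtext("plan_class"))
            for v in root.iter("app_version")]
```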
ID: 39029
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39040 - Posted: 4 Jun 2019, 15:06:54 UTC - in response to Message 39029.  

What is the RAM requirement for ATLAS??? I've never seen 32 GB not be enough. (BTW, I do not run Oracle's CatBox.)

Yes, I'm well aware that ATLAS uses MAX#CPUs in a way that makes it the most user-unfriendly BOINC project I've ever run.

Setting MAX#CPUs to No Limit means I now get 12C WUs. Watching the System Monitor for a 22c/44t E5-2699 running three 12C ATLAS WUs I see the CPU utilization is very very low. Seems like a huge waste of resources. Only 11 of 32 GB RAM are being used.

The best app_config for ATLAS is none at all.

ATLAS is the most interesting BOINC project at LHC. I wish it was possible to put all of my CPUs to work on it.
ID: 39040
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39041 - Posted: 4 Jun 2019, 15:10:08 UTC - in response to Message 35300.  
Last modified: 4 Jun 2019, 15:10:27 UTC

Yes, that's what BOINC will reserve as RAM usage and, when no app_config.xml is used, the amount of RAM for the VM to be created.
#1 reserves 3500MB
#2 reserves 4400MB
#3 reserves 5300MB
#4 reserves 6200MB
#5 reserves 7100MB
#6 reserves 8000MB
#7 reserves 8900MB
#8 reserves 9800MB
Is there a way that I can see what BOINC is reserving for RAM for native ATLAS??? These values are much higher than what I see on System Monitor.
ID: 39041
Aurum

Send message
Joined: 12 Jun 18
Posts: 50
Credit: 15,103,671
RAC: 101,311
Message 39043 - Posted: 4 Jun 2019, 16:58:34 UTC - in response to Message 39040.  

Watching the System Monitor for a 22c/44t E5-2699 running three 12C ATLAS WUs I see the CPU utilization is very very low. Seems like a huge waste of resources. Only 11 of 32 GB RAM are being used.
Hmm, slowly the CPU utilization rises. Rig-06 is now running 100% CPU with a 12C & three 4Cs filled out with sixtracks. Turned off Native Theory.
RAM: total: 31.31 GiB used: 19.91 GiB (63.6%)
Array-1: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
Array-2: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10585461
ID: 39043
Henry Nebrensky

Send message
Joined: 13 Jul 05
Posts: 69
Credit: 8,629,598
RAC: 21,622
Message 39052 - Posted: 5 Jun 2019, 16:30:37 UTC - in response to Message 35259.  

Could you not use the "Max # jobs" to limit the number of downloads to the number of cores that you have on your machine?

I don't understand what you mean: "Max # jobs" is set by the volunteer, but the "you" in your suggestion refers to the project... some of us may be happy with having more jobs queued locally ("Max # jobs" unlimited); it's the project that wants to limit it to the number of cores for (bogus, IMHO) accounting reasons.

I'd have thought the answer is that we need a "Max #cores/task" setting added to BOINC, solving it once, properly, and in the right place.
ID: 39052


©2019 CERN