Message boards : ATLAS application : BOINC downloads only 2 ATLAS tasks
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,165 RAC: 22,029 |
> Jim runs ATLAS (native) on Linux.

Oh, okay, now I understand. Jim, I guess the app_config.xml you were suggesting is therefore for native Linux ... |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
> Jim, I guess the app_config.xml which you were suggesting hence is for native linux ...

Replace this:

```xml
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>2</avg_ncpus>
</app_version>
```

With this:

```xml
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>
</app_version>
```

And add whatever else is appropriate. It has worked for me in the past. |
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
There were a couple of mails but no follow-up, it seems. I will try again. |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Hi,

In the server config we have:

```xml
<max_jobs_in_progress>
    <app>
        <app_name>ATLAS</app_name>
        <cpu_limit>
            <jobs>1</jobs>
            <per_proc/>
        </cpu_limit>
```

which means that you can only download as many tasks as there are CPUs on the host. So setting "Max # CPUs" actually sets how many tasks you can download.

The reason for this setting is to try to provide an accurate picture in ATLAS monitoring and accounting of how many tasks are actually running at a given moment. For example, a 4-core machine can download 4 tasks and has one task running using 4 cores. So reporting that this host is running 4 cores for ATLAS@Home is reasonably accurate. If the user configured BOINC to cache 20 jobs, we would report that the host is running 20 cores for ATLAS@Home, which would not be accurate.

The workaround is to use app_config.xml to control the number of cores for each task, as has already been explained in this thread. This is not an ideal solution, but it's a consequence of trying to make ATLAS@Home look as much as possible like a normal ATLAS Grid site. |
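[Editor's note] The per-processor limit described in the server config above can be sketched as follows. This is a minimal illustration of the stated behaviour, not actual BOINC server code; the function name and structure are invented for clarity:

```python
def max_tasks_in_progress(jobs: int, ncpus: int, per_proc: bool) -> int:
    """Illustrates the <max_jobs_in_progress> logic described above.

    With <per_proc/> set, the <jobs> value is multiplied by the number
    of CPUs on the host; without it, <jobs> is an absolute cap.
    """
    return jobs * ncpus if per_proc else jobs

# ATLAS uses <jobs>1</jobs> with <per_proc/>, so a 4-core host may
# have at most 4 tasks in progress, a 16-core host at most 16.
print(max_tasks_in_progress(1, 4, per_proc=True))   # 4
print(max_tasks_in_progress(1, 16, per_proc=True))  # 16
```

This is why lowering "Max # CPUs" (which caps the CPU count the scheduler sees) also lowers the number of downloadable tasks.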
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
> which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

Could you not use the "Max # jobs" setting to limit the number of downloads to the number of cores on the machine? As I recall, that is how it used to work. Or perhaps a separate setting for ATLAS could be used to limit the number of cores for ATLAS only, while the other projects could receive more jobs, if that is the concern. Then, you could use the "Max # CPUs" setting to determine how many cores are used per work unit. That would spare the user the need for the app_config.

Finally, even if the count is not exact at any given moment, it would only be a few days before the work units completed anyway. It would just add a little latency at the beginning, but the total running would still be accurate. For a physics experiment that will run for decades, it would appear that the accountants have gained the upper hand prematurely. |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
> which means that you can only download as many tasks as CPUs on the host. So setting Max # CPUs actually sets how many tasks you can download.

Well, it's a very expensive experiment, so we need a lot of accountants :) I agree that what you describe is the way it should work, but I'm not enough of a BOINC expert to understand how the various combinations of options interact. But I will keep looking... |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
Thank you, that is all we can ask for. (My daughter and her husband are accountants. They do great work. But they shouldn't run physics experiments.) |
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
Hi David, Is the max number of CPUs used to calculate the ram estimate? Thanks Toby |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
> Is the max number of CPUs used to calculate the ram estimate?

Yes, that's what BOINC will reserve for RAM usage and, when app_config.xml is not used, the RAM amount for the VM to be created:

- 1 CPU reserves 3500 MB
- 2 CPUs reserve 4400 MB
- 3 CPUs reserve 5300 MB
- 4 CPUs reserve 6200 MB
- 5 CPUs reserve 7100 MB
- 6 CPUs reserve 8000 MB
- 7 CPUs reserve 8900 MB
- 8 CPUs reserve 9800 MB |
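[Editor's note] The values listed above follow a simple linear pattern: 3500 MB base plus 900 MB per additional CPU. The formula below is inferred from the listed values, not taken from project documentation:

```python
def atlas_vm_ram_mb(ncpus: int) -> int:
    """VM RAM reservation inferred from the table above:
    3500 MB for the first CPU plus 900 MB for each additional one."""
    return 3500 + 900 * (ncpus - 1)

# Reproduces the listed values for 1..8 CPUs.
for n in range(1, 9):
    print(f"{n} CPU(s): {atlas_vm_ram_mb(n)} MB")
```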
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
Thanks CP, still no good options to run many ATLAS tasks. |
Send message Joined: 26 Aug 05 Posts: 68 Credit: 545,660 RAC: 0 |
BOINC only runs 2 ATLAS (8-core) work units on my 16-core Threadripper and only uses 50% of the CPU and RAM. I will stop downloading ATLAS now to see if things go back to normal with only SixTrack running. |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
I've been mystified as to why I was running and downloading only one native ATLAS WU per Linux computer. I've read this whole thread a few times and tried what I learned here, but it does not work.

If I set Max#CPUs to 8 (Max#Jobs no limit) and use app_config to limit execution to 2 threads, then BOINC still counts each WU as 8 CPUs and not 2 CPUs. Watching System Monitor, my CPU utilization is really low. It does download up to 8 WUs, but they cannot run since BOINC miscounts them.

If I set Max#CPUs to 2 and use the same app_config (which I doubt I need), BOINC appears to count each WU as 2 CPUs. The estimated remaining time is shorter than half of what I usually saw with a single CPU, so I assume that's the "efficiency" mentioned below. However, BOINC only downloads 2 ATLAS WUs per computer. I'm capable of running many more ATLAS WUs, but CERN has tied my hands. May I suggest the CERN accountants go to Lac Léman and enjoy a long walk on a short pier.

As a consequence of setting Max#CPUs to 2, native Theory also only downloads 2-core WUs. If memory serves, the Yeti Checklist said that ATLAS was the only project that actually used multiple CPUs on the same WU, while others packaged multiple WUs in the same job. So when the first finishes, a CPU sits idle until the second finishes, i.e. inefficient. So with this bug it seems one would run either ATLAS or native Theory, but not both. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
> ... and use app_config to limit ...

Do you use the correct filename "app_config.xml" (instead of "app_config"), and did you reload your configuration? Did you place it in the right folder? See: https://boinc.berkeley.edu/wiki/client_configuration

You can run ATLAS beside Theory, but they need their own sections in app_config.xml. To check this, it would be good if you could post that file here. |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
File name: /var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome/app_config.xml

```xml
<app_config>
    <app>
        <name>sixtrack</name>
    </app>
    <app>
        <name>ATLAS</name>
        <max_concurrent>6</max_concurrent>
        <app_version>
            <app_name>ATLAS</app_name>
            <plan_class>native_mt</plan_class>
            <avg_ncpus>2.0</avg_ncpus>
            <cmdline>--nthreads 2 --memory_size_mb 3600</cmdline>
        </app_version>
    </app>
    <app>
        <name>cms</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>TheoryN</name>
        <app_version>
            <app_name>TheoryN</app_name>
            <plan_class>native_theory</plan_class>
            <avg_ncpus>2.0</avg_ncpus>
            <cmdline>--nthreads 2</cmdline>
        </app_version>
    </app>
    <app>
        <name>Theory</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>LHCb</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>ALICE</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <report_results_immediately/>
</app_config>
```

Setting Max#CPUs = 2 means this app_config is not needed. |
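[Editor's note] One quick way to catch structural mistakes in a file like the one posted above is to run it through an XML parser before reloading the BOINC configuration. A minimal sketch using Python's standard library; note that a well-formedness check catches only syntax errors (unclosed or misspelled tags), not BOINC-specific schema mistakes such as nesting `<app_version>` inside `<app>`:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML. Catches syntax errors
    but NOT BOINC-specific mistakes like misplaced <app_version>."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Usage: paste your app_config.xml content and check it before
# telling BOINC to re-read the config files.
sample = "<app_config><app><name>ATLAS</name></app></app_config>"
print(is_well_formed(sample))                          # True
print(is_well_formed("<app_config><app></app_config>"))  # False: <app> never closed
```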
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
It seems it would be easy for CERN to fix this problem by allowing clients to specify Max#CPUs and Max#Jobs separately per project, as they do at WCG, and by changing the reverse logic so that Max#CPUs actually specifies the maximum CPUs and Max#Jobs the maximum jobs. Presently, Max#Jobs does who knows what. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
As Toby Broom already mentioned in another thread, your computers show an imbalance between RAM and #cores: none of them has more than 32 GB. On the other side, ATLAS doesn't use the preference setting as it was once introduced. This has been discussed in a long thread for more than a year and need not be repeated here.

Both facts combined lead to a situation where your BOINC client will not run (or even download) the #tasks that you expect. In particular, ATLAS will either limit #tasks according to the value of #cores or (if max #cores = unlimited) it will limit #tasks based on the limited RAM.

A combination of the following measures may help a bit but will not solve your situation completely:
1. Set max #tasks to unlimited
2. Set max #cores to unlimited
3. Use the right app_config.xml (see below)
4. Increase the task buffer of your BOINC client

The next measures would be much more efficient:
5. Buy additional RAM
6. Buy much more RAM
7. Run additional BOINC clients on each computer

Regarding your app_config.xml:

There are sections that are obsolete, e.g. sixtrack or ALICE. Other sections are malformed, like:

```xml
<app>
    . . .
    <app_version>
    . . .
    </app_version>
</app>
```

"cms" references a non-existing app; it must be "CMS". BTW: Are you aware that CMS also needs lots of RAM (>2.5 GB per task)?

A working minimum app_config.xml for ATLAS native and Theory native should look like this (remove the comments):

```xml
<app_config>
    <app_version>
        <app_name>ATLAS</app_name>
        <plan_class>native_mt</plan_class>
        <avg_ncpus>2.0</avg_ncpus>  <!-- set #cores here according to your needs; <cmdline> is obsolete -->
    </app_version>
    <app_version>
        <app_name>TheoryN</app_name>
        <plan_class>native_theory</plan_class>
        <avg_ncpus>1.0</avg_ncpus>  <!-- must be set to "1" as there is no multicore app -->
    </app_version>
    <report_results_immediately/>
</app_config>
``` |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
What is the RAM requirement for ATLAS??? I've never seen 32 GB not be enough. (BTW, I do not run Oracle's CatBox.) Yes, I'm well aware that ATLAS uses Max#CPUs in a way that makes it the most user-unfriendly BOINC project I've ever run.

Setting Max#CPUs to No Limit means I now get 12-core WUs. Watching the System Monitor for a 22c/44t E5-2699 running three 12-core ATLAS WUs, I see the CPU utilization is very, very low. Seems like a huge waste of resources. Only 11 of 32 GB RAM are being used. The best app_config for ATLAS is none at all.

ATLAS is the most interesting BOINC project at LHC. I wish it were possible to put all of my CPUs to work on it. |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
> Yes, that's what BOINC will reserve for RAM-usage and when the app_config.xml is not used the RAM-amount for the VM to be created.

Is there a way that I can see what BOINC is reserving for RAM for native ATLAS??? These values are much higher than what I see on System Monitor. |
Send message Joined: 12 Jun 18 Posts: 126 Credit: 53,906,164 RAC: 0 |
> Watching the System Monitor for a 22c/44t E5-2699 running three 12C ATLAS WUs I see the CPU utilization is very very low. Seems like a huge waste of resources. Only 11 of 32 GB RAM are being used.

Hmm, slowly the CPU utilization rises. Rig-06 is now running at 100% CPU with one 12-core and three 4-core ATLAS WUs, filled out with SixTracks. Turned off native Theory.

RAM: total: 31.31 GiB used: 19.91 GiB (63.6%)
Array-1: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_A1 size: 16 GiB speed: 2400 MT/s
Array-2: capacity: 256 GiB slots: 4 EC: None
Device-1: DIMM_C1 size: 16 GiB speed: 2400 MT/s

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10585461 |
Send message Joined: 13 Jul 05 Posts: 169 Credit: 15,000,737 RAC: 2 |
> Could you not use the "Max # jobs" to limit the number of downloads to the number of cores that you have on your machine?

I don't understand what you mean: "Max # jobs" is set by the volunteer, but the "you" in your suggestion refers to the project... Some of us may be happy with having more jobs queued locally ("Max # jobs" unlimited); it's the project that wants to limit it to the number of cores for (bogus, IMHO) accounting reasons.

I'd have thought the answer is that we need a "Max #cores/task" option added to BOINC, and solve it once, properly, and in the right place. |
©2024 CERN