Message boards : Theory Application : Move TheoryN Back Into Theory
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

Linux host: We reduced the number of tasks downloadable as some bad hosts were pulling 50 or so jobs at a time and failing them all. The current limit is one task per ncpu. I don't know why you are not getting two tasks if you have two cores.
Joined: 29 Sep 04 · Posts: 281 · Credit: 11,866,264 · RAC: 0

Maybe it's the Max # CPUs = 1? But if I set that to 2, I would most likely get a 2-core job rather than 2 x 1-core jobs, which is what works best on that host.
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

> Maybe it's the Max # CPUs = 1?

I don't think it is that. I am trying to debug it now with my machine (ncpu = 4). On the server we have the following configuration:

<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>

From my machine the scheduler debugging reports:

[quota] Limits for Theory:
[quota]   CPU: base 1 scaled 7 njobs 0
Joined: 14 Jan 10 · Posts: 1419 · Credit: 9,470,934 · RAC: 2,905

> On the server we have the following configuration:

I don't know BOINC's server configuration, but as a layman I would think that config should maybe look like:

<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <jobs>2</jobs>
    <cpu_limit>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>

i.e. with jobs outside the cpu_limit. And # jobs should come from the preference Max # jobs; right now it looks like it's coming from the preference Max # CPUs.
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

The documentation is here. I think this is a case where we have to resort to reading the code to find out exactly what is happening.
Joined: 14 Jan 10 · Posts: 1419 · Credit: 9,470,934 · RAC: 2,905

> The documentation is *here*. I think this is a case where we have to resort to reading the code to find out exactly what is happening.

("here" bolded by me.) I can't say that the template mentioned in ProjectOptions - Job limits (advanced) is unambiguous. It looks like <per_proc/> is only used as a self-closing tag (when present) and not with a number in between like yours, <per_proc>1</per_proc>. I found another config_aux.xml example:

<?xml version="1.0" ?>
<config>
  <max_jobs_in_progress>
    <app>
      <app_name>xansons_gpu</app_name>
      <gpu_limit>
        <jobs>50</jobs>
        <per_proc/>
      </gpu_limit>
    </app>
    <app>
      <app_name>xansons_cpu</app_name>
      <cpu_limit>
        <jobs>25</jobs>
      </cpu_limit>
    </app>
  </max_jobs_in_progress>
</config>
Joined: 2 May 07 · Posts: 2243 · Credit: 173,902,375 · RAC: 1,652

Now have the first native Theory (300.02). Prefs: Tasks = 4 and CPUs = 1, use Test-Application = no, Theory native = no. Got only one task, instead of the two I got before with the old version (1.01).

Edit: in the project stats these new native (300.02) tasks are shown under Theory tasks.
Joined: 29 Sep 04 · Posts: 281 · Credit: 11,866,264 · RAC: 0

Minty has been getting vbox jobs all day, but still only one at a time. I even tried allowing "native if available" and the native app itself, but it still refuses to download any more: "No tasks available for ...". But allowing Sixtrack gets one task to use the idle core, plus 2 spares, so the host is fully occupied once more, although perhaps not as expected.
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

The parse bool function suggests both forms will result in true (the two forms are shown at the end of this post). I tried the self-closing tag first before making it more explicit. This assignment also suggests it is working, judging from the log output I was getting:

[quota] Limits for Theory:
[quota]   CPU: base 1 scaled 7 njobs 0

The base and scaled values seem correct for my host with 4 ncpus. Need to find out why njobs isn't as expected. It could be something trivial such as total_limit not being defined and so defaulting to 1.
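For reference, these are the two forms in question; both should be read as true by the bool parsing:

<per_proc/>             <!-- self-closing: present, parsed as true -->
<per_proc>1</per_proc>  <!-- explicit value: also parsed as true -->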
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

> Need to find out why njobs isn't as expected. It could be something trivial such as total_limit not being defined and so defaulting to 1.

I think njobs is the number of tasks being returned, so probably not what we are looking for. total_limit is now set to 10. Let's see how far we get and if we can understand what is going on.
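If I read the job-limits template correctly, the app section now looks something like this (the placement of <total_limit> is my reading of the schema, so treat it as an assumption rather than verified syntax):

<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <total_limit>
      <jobs>10</jobs>
    </total_limit>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>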
Joined: 2 May 07 · Posts: 2243 · Credit: 173,902,375 · RAC: 1,652

Found this thread; is it this pref?

<max_wus_in_progress> N </max_wus_in_progress>

https://boinc.berkeley.edu/forum_thread.php?id=12588
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

> Found this thread; is it this pref?

This is in the config.xml and is for the whole project. It is currently set to 50.
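For reference, it sits in the <config> section of the project's config.xml, roughly like this (a sketch only; other options omitted):

<config>
  ...
  <max_wus_in_progress>50</max_wus_in_progress>
  ...
</config>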
Joined: 14 Jan 10 · Posts: 1419 · Credit: 9,470,934 · RAC: 2,905

> I think njobs is the number of tasks being returned, so probably not what we are looking for. total_limit is now set to 10. Let's see how far we get and if we can understand what is going on.

The Max # CPUs is limiting the Max # of jobs. Max # CPUs must now be equal to or higher than Max # jobs, or else you don't get the number of tasks you want or your buffer can hold. E.g. with Max # jobs = 3 and Max # CPUs = 2, I only get 2 tasks; when I set Max # CPUs to 3 or higher, I get 3 tasks. The Max # CPUs should have no influence on the number of tasks.

Btw: for Theory I would remove the Max # of CPUs and run only single-core Theory tasks. At the moment a higher number of CPUs is only useful for the ATLAS application.
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0

> Btw: for Theory I would remove the Max # of CPUs and run only single-core Theory tasks.

That explains why BOINCTasks is showing only 50% CPU usage for the native Theory. I was wondering about that.
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

> The Max # CPUs is limiting the Max # of jobs.

I set Max jobs = 2 and Max CPUs = 2 and ended up with two jobs, each using 2 CPUs. Not sure this is what we want. The plan class currently contains:

<min_ncpus>1</min_ncpus>
<max_threads>2</max_threads>

As far as I understand, the Theory app does use two threads, but is there any advantage in giving it two CPUs?
Joined: 15 Jun 08 · Posts: 2534 · Credit: 254,014,157 · RAC: 46,168

The following pstree output shows a boinc client running 3 Theory native tasks:

boinc─┬─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
      │                 │                      │                   ├─runRivet.sh───sleep
      │                 │                      │                   ├─rungen.sh───pythia8.exe
      │                 │                      │                   └─sleep
      │                 │                      └─7*[{runc}]
      │                 └─{wrapper_2019_03}
      ├─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
      │                 │                      │                   ├─runRivet.sh───sleep
      │                 │                      │                   ├─rungen.sh───pythia8.exe
      │                 │                      │                   └─sleep
      │                 │                      └─10*[{runc}]
      │                 └─{wrapper_2019_03}
      ├─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh───xargs───complete.sh
      │                 │                      └─7*[{runc}]
      │                 └─{wrapper_2019_03}
      └─{boinc}

Going deeper into the tree shows that there are far more than 2 processes per task ("threads" is not the correct term in this context):

cranky-0.0.29(6085)───runc(7754)─┬─job(7795)───runRivet.sh(7978)─┬─rivetvm.exe(17190)
                                 │                               ├─runRivet.sh(17191)───sleep(2926)
                                 │                               ├─rungen.sh(17189)───pythia8.exe(18501)
                                 │                               └─sleep(7472)
                                 ├─{runc}(7776)
                                 ├─{runc}(7777)
                                 ├─{runc}(7778)
                                 ├─{runc}(7779)
                                 ├─{runc}(7783)
                                 ├─{runc}(7784)
                                 └─{runc}(7786)

On a standard Linux system the kernel scheduler takes care of all processes (not only boinc/cranky) and assigns each a fair share of CPU resources. Suppose this example runs on a multi-core CPU that is far from fully loaded: then the 2 main processes, runRivet.sh(17191) and pythia8.exe(18501), will run on different cores, with the result that the counted CPU time rises faster than walltime. This can be controlled via cgroups, which would make it possible to e.g. limit the whole process tree to a single core, or even a fraction of a core (see the sketch at the end of this post). A vbox app implicitly does the same when a VM is set up as an n-core VM: the hypervisor tells the kernel to limit the CPU usage of all processes inside the VM to n cores.

From the BOINC client's perspective Theory native is designed to be a single-core app, and BOINC doesn't care whether the app launches just a single process or many of them. BOINC parameters like #cores are used to tell the BOINC client how much work can be fetched and how many tasks can run concurrently. They can sometimes be used to tell the BOINC client a bit about the internal app structure, e.g. for ATLAS, where we have real threads. But there is no direct interface between such an app and the BOINC client. You may remember that sometimes we had misconfigured ATLAS tasks that set up 8 athena threads on a single-core VM.
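A rough sketch of that cgroups idea (assuming cgroups v2 with the cpu controller enabled; the cgroup name is made up, and the PID is the runc process from the tree above):

# create a cgroup and cap everything in it at one full core;
# cpu.max takes "quota period" in microseconds
mkdir /sys/fs/cgroup/theory
echo "100000 100000" > /sys/fs/cgroup/theory/cpu.max
# move the container's top-level process into the cgroup;
# processes it forks afterwards inherit the limit
echo 7754 > /sys/fs/cgroup/theory/cgroup.procs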
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

I have removed the multi-threading values from the plan class. It should now always run as a single-CPU app.
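For illustration, the relevant fragment of the plan class spec then reduces to something like this (the class name and surrounding elements are placeholders, not the actual spec):

<plan_class>
  <name>theory_native</name>  <!-- placeholder name -->
  <min_ncpus>1</min_ncpus>
  <!-- max_threads removed, so tasks always schedule as single-CPU -->
</plan_class>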
Joined: 14 Jan 10 · Posts: 1419 · Credit: 9,470,934 · RAC: 2,905

> I set Max jobs = 2 and Max CPUs = 2 and ended up with two jobs, each using 2 CPUs. Not sure this is what we want. The plan class currently contains:

This is how the limits now work when requesting Theory tasks:

- max 1 task per thread
- Max # of CPUs
- Max # of jobs

Since the tasks will run single-core, it's best to set 'No limit' for Max # of CPUs to avoid getting fewer tasks than you expect. If you want fewer tasks than the number of threads, set that lower value as Max # of jobs.
Joined: 20 Jun 14 · Posts: 380 · Credit: 238,712 · RAC: 0

> I set Max jobs = 2 and Max CPUs = 2 and ended up with two jobs, each using 2 CPUs. Not sure this is what we want. The plan class currently contains:

This will affect all VBox apps. I will investigate how to disable Max # of CPUs for single-threaded apps.
Joined: 2 May 07 · Posts: 2243 · Credit: 173,902,375 · RAC: 1,652

> We reduced the number of tasks downloadable as some bad hosts were pulling 50 or so jobs at a time and failing them all. The current limit is one task per ncpu. I don't know why you are not getting two tasks if you have two cores.

This is such a fubar host: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10283981&offset=0&show_names=0&state=0&appid=13