Message boards : Theory Application : Move TheoryN Back Into Theory.

Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40524 - Posted: 19 Nov 2019, 11:51:39 UTC - in response to Message 40523.  
Last modified: 19 Nov 2019, 11:51:55 UTC

Linux host:

Preference - Theory Simulation
VBox installed - Yes
Run test applications? - checked
Run native if available? - not checked ------------ I got Theory Sim 300.02 (vbox64_theory) as requested 8¬)

So app selection looks to be working, however, I have set
Max # jobs = 4
Max # cpus =1
I used to get up to 4 tasks, each running on 1 core, on this 2-core host (the configuration that has always worked well on it), but today I only get 1 task at a time (the event log says No tasks available), leaving the other core idle.


We reduced the number of tasks downloadable as some bad hosts were pulling 50 or so jobs at a time and failing them all. The current limit is one task per ncpu. I don't know why you are not getting two tasks if you have two cores.
ID: 40524
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 40526 - Posted: 19 Nov 2019, 13:07:08 UTC - in response to Message 40524.  

Maybe it's the Max # cpus =1?
but if I set that to 2, I would most likely get a 2-core job rather than 2 x 1-core jobs which is what works best on that host.
ID: 40526
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40528 - Posted: 19 Nov 2019, 13:25:36 UTC - in response to Message 40526.  

Maybe it's the Max # cpus =1?
but if I set that to 2, I would most likely get a 2-core job rather than 2 x 1-core jobs which is what works best on that host.

I don't think it is that. I am trying to debug it now with my machine (ncpu = 4). On the server we have the following configuration:
<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>


From my machine the scheduler debugging reports:
[quota] Limits for Theory:
[quota] CPU: base 1 scaled 7 njobs 0
ID: 40528
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40529 - Posted: 19 Nov 2019, 13:34:53 UTC - in response to Message 40528.  
Last modified: 19 Nov 2019, 13:35:56 UTC

On the server we have the following configuration:
<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>

I don't know BOINC's server configuration, but as a layman I would think that the config should maybe look like:

<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
     <jobs>2</jobs>
     <cpu_limit>
      <per_proc>1</per_proc>
     </cpu_limit>
  </app>
</max_jobs_in_progress>


That is, with jobs outside the cpu_limit. The number of jobs should come from the Max # jobs preference; at the moment it looks like it is coming from the Max # CPUs preference.
ID: 40529
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40530 - Posted: 19 Nov 2019, 13:50:37 UTC - in response to Message 40529.  


I don't know BOINC's server configuration, but as a layman I would think that the config should maybe look like:

The documentation is here. I think this is a case where we have to resort to reading the code to find out exactly what is happening.
ID: 40530
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40531 - Posted: 19 Nov 2019, 15:45:30 UTC - in response to Message 40530.  
Last modified: 19 Nov 2019, 16:00:57 UTC

The documentation is here. I think this is a case where we have to resort to reading the code to find out exactly what is happening.
("here" bolded by me.)
I can't say that the template shown in ProjectOptions - Job limits (advanced) is unambiguous.

It looks like <per_proc/> is only used as a self-closing tag (when present) and not with a number in between like your <per_proc>1</per_proc>.
I found another config_aux.xml example:

<?xml version="1.0" ?>
<config>
    <max_jobs_in_progress>
        <app>
            <app_name>xansons_gpu</app_name>
            <gpu_limit>
                <jobs>50</jobs>
                <per_proc/>
            </gpu_limit>
        </app>
        <app>
            <app_name>xansons_cpu</app_name>
            <cpu_limit>
                <jobs>25</jobs>
            </cpu_limit>
        </app>
    </max_jobs_in_progress>
</config>
ID: 40531
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,088,159
RAC: 104,046
Message 40532 - Posted: 19 Nov 2019, 16:41:26 UTC
Last modified: 19 Nov 2019, 17:37:58 UTC

I now have my first native Theory task (300.02).
Prefs: Tasks = 4 and CPUs = 1, use test applications = no, Theory native = no.
I got only one task, instead of the two I used to get with the old version (1.01).
Edit: In the project stats, these new native (300.02) tasks are shown under Theory tasks.
ID: 40532
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 40535 - Posted: 19 Nov 2019, 18:56:56 UTC

Minty has been getting vbox jobs all day but still only one at a time. I even tried allowing "native if available" and the Native app itself but it still refuses to download any more - "No tasks available for ..."
But allowing Sixtrack gets one task to use the idle core and 2 spares so the host is fully occupied once more although perhaps not as expected.
ID: 40535
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40538 - Posted: 19 Nov 2019, 20:02:03 UTC - in response to Message 40531.  


It looks like <per_proc/> is only used as a self-closing tag (when present) and not with a number in between like your <per_proc>1</per_proc>.

The parse bool function suggests both will result in true. I tried the self-closing tag first before making it more explicit. This assignment also suggests it is working, judging from the log output I was getting:
[quota] Limits for Theory:
[quota] CPU: base 1 scaled 7 njobs 0

The base and scaled values seem correct for my host with 4 ncpus. Need to find out why njobs isn't as expected. It could be something trivial such as total_limit not being defined so defaulting to 1.
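
To make the two spellings of the flag concrete, here is a minimal Python sketch of how a boolean tag might be read from the job-limit block quoted above. It is not BOINC's parser: `flag_is_set` and `read_cpu_limits` are hypothetical helper names, and treating `<per_proc>0</per_proc>` as false is an assumption on my part. It only illustrates the point that `<per_proc/>` and `<per_proc>1</per_proc>` can both end up as true.

```python
import xml.etree.ElementTree as ET

# The server-side block quoted earlier in the thread.
CONFIG = """
<max_jobs_in_progress>
  <app>
    <app_name>Theory</app_name>
    <cpu_limit>
      <jobs>1</jobs>
      <per_proc>1</per_proc>
    </cpu_limit>
  </app>
</max_jobs_in_progress>
"""


def flag_is_set(parent, tag):
    """Hypothetical boolean-tag reader (an illustration, not BOINC code)."""
    node = parent.find(tag)
    if node is None:
        return False            # tag absent
    text = (node.text or "").strip()
    if text == "":
        return True             # <per_proc/> or <per_proc></per_proc>
    return int(text) != 0       # <per_proc>1</per_proc>; 0 -> False is an assumption


def read_cpu_limits(xml_text):
    """Return {app_name: {"jobs": int or None, "per_proc": bool}}."""
    root = ET.fromstring(xml_text)
    limits = {}
    for app in root.iter("app"):
        cpu = app.find("cpu_limit")
        if cpu is None:
            continue
        jobs_text = cpu.findtext("jobs")
        limits[app.findtext("app_name")] = {
            "jobs": int(jobs_text) if jobs_text else None,
            "per_proc": flag_is_set(cpu, "per_proc"),
        }
    return limits


print(read_cpu_limits(CONFIG))
```

Swapping `<per_proc>1</per_proc>` for `<per_proc/>` in `CONFIG` gives the same result in this sketch, which matches what the parse bool function appears to do.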
ID: 40538
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40543 - Posted: 19 Nov 2019, 21:35:29 UTC - in response to Message 40538.  

Need to find out why njobs isn't as expected. It could be something trivial such as total_limit not being defined so defaulting to 1.

I think njobs is the number of tasks being returned so probably not what we are looking for. total_limit is now set to 10. Let's see how far we get and if we can understand what is going on.
ID: 40543
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,088,159
RAC: 104,046
Message 40545 - Posted: 20 Nov 2019, 1:44:35 UTC

Found this thread, is it this pref:
<max_wus_in_progress> N </max_wus_in_progress>
https://boinc.berkeley.edu/forum_thread.php?id=12588
ID: 40545
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40546 - Posted: 20 Nov 2019, 5:51:58 UTC - in response to Message 40545.  

Found this thread, is it this pref:
<max_wus_in_progress> N </max_wus_in_progress>
https://boinc.berkeley.edu/forum_thread.php?id=12588


This is in the config.xml and is for the whole project. It is currently set to 50.
ID: 40546
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40549 - Posted: 20 Nov 2019, 8:40:10 UTC - in response to Message 40543.  

I think njobs is the number of tasks being returned so probably not what we are looking for. total_limit is now set to 10. Let's see how far we get and if we can understand what is going on.

The Max # CPUs setting is limiting the Max # of jobs.
At the moment Max # CPUs must be equal to or higher than Max # jobs, otherwise you don't get the number of tasks you want or that your buffer can hold.

E.g. Max # jobs 3
Max # CPUs 2

I only get 2 tasks. When I set Max # CPUs to 3 or higher, I get 3 tasks.

The Max # CPUs setting should have no influence on the number of tasks.

Btw: For Theory I would remove the Max # of CPUs and run only single-core Theory tasks.
At the moment a higher number of CPUs is only useful for the ATLAS application.
ID: 40549
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 40554 - Posted: 20 Nov 2019, 14:24:13 UTC - in response to Message 40549.  

Btw: For Theory I would remove the Max # of CPUs and run only single-core Theory tasks.
At the moment a higher number of CPUs is only useful for the ATLAS application.

That explains why BOINCTasks is showing only 50% CPU usage for the native Theory. I was wondering about that.
ID: 40554
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40564 - Posted: 21 Nov 2019, 8:53:49 UTC - in response to Message 40549.  
Last modified: 21 Nov 2019, 8:54:17 UTC

The Max # CPUs setting is limiting the Max # of jobs.
At the moment Max # CPUs must be equal to or higher than Max # jobs, otherwise you don't get the number of tasks you want or that your buffer can hold.

E.g. Max # jobs 3
Max # CPUs 2

I only get 2 tasks. When I set Max # CPUs to 3 or higher, I get 3 tasks.

The Max # CPUs setting should have no influence on the number of tasks.

Btw: For Theory I would remove the Max # of CPUs and run only single-core Theory tasks.
At the moment a higher number of CPUs is only useful for the ATLAS application.

I set Max jobs = 2 and Max CPUs 2 and ended up with two jobs each using 2 CPUs. Not sure this is what we want. The plan class currently contains:
    <min_ncpus>1</min_ncpus>
    <max_threads>2</max_threads>

As far as I understand the Theory app does use two threads but is there any advantage of giving two CPUs?
ID: 40564
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,911,012
RAC: 138,148
Message 40566 - Posted: 21 Nov 2019, 10:04:01 UTC - in response to Message 40564.  

The following pstree command shows a boinc client running 3 Theory native tasks:
boinc─┬─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
      │                 │                      │                   ├─runRivet.sh───sleep
      │                 │                      │                   ├─rungen.sh───pythia8.exe
      │                 │                      │                   └─sleep
      │                 │                      └─7*[{runc}]
      │                 └─{wrapper_2019_03}
      ├─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh─┬─rivetvm.exe
      │                 │                      │                   ├─runRivet.sh───sleep
      │                 │                      │                   ├─rungen.sh───pythia8.exe
      │                 │                      │                   └─sleep
      │                 │                      └─10*[{runc}]
      │                 └─{wrapper_2019_03}
      ├─wrapper_2019_03─┬─cranky-0.0.29───runc─┬─job───runRivet.sh───xargs───complete.sh
      │                 │                      └─7*[{runc}]
      │                 └─{wrapper_2019_03}
      └─{boinc}


Going deeper in the tree shows that there are far more than 2 processes per task (threads is not correct in this context):
cranky-0.0.29(6085)───runc(7754)─┬─job(7795)───runRivet.sh(7978)─┬─rivetvm.exe(17190)
                                 │                               ├─runRivet.sh(17191)───sleep(2926)
                                 │                               ├─rungen.sh(17189)───pythia8.exe(18501)
                                 │                               └─sleep(7472)
                                 ├─{runc}(7776)
                                 ├─{runc}(7777)
                                 ├─{runc}(7778)
                                 ├─{runc}(7779)
                                 ├─{runc}(7783)
                                 ├─{runc}(7784)
                                 └─{runc}(7786)


On a standard Linux system the kernel scheduler takes care of all processes (not only boinc/cranky) and assigns each a fair amount of CPU resources.
Suppose this example runs on a multi-core CPU that is far from fully loaded: the 2 main processes, runRivet.sh(17191) and pythia8.exe(18501), will then run on different cores, with the result that the counted CPU time rises faster than wall time.

This can be controlled via cgroups where it would be possible to e.g. limit the whole process tree to 1 single core, or even a fraction of a core.

A vbox app implicitly does the same when a VM is set up as n core VM.
In this case the hypervisor tells the kernel to limit the CPU usage of all processes inside the VM to n cores.
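
As a rough illustration of the CPU-time versus wall-time point above, here is a toy Python model. It assumes the busy processes are always runnable and that the host is otherwise idle, which real tasks only approximate; `cpu_seconds` is a hypothetical helper, not something BOINC or the kernel provides.

```python
def cpu_seconds(wall_seconds, busy_processes, usable_cores):
    """CPU time accumulated by always-runnable processes (toy model).

    usable_cores is whatever the process tree is allowed to use, e.g. the
    number of free host cores, or a cap imposed by a cgroup or an n-core VM.
    """
    return wall_seconds * min(busy_processes, usable_cores)


# Two busy processes (e.g. runRivet.sh and pythia8.exe) for 100 s of wall time:
print(cpu_seconds(100, 2, 4))  # free cores available: CPU time outpaces wall time
print(cpu_seconds(100, 2, 1))  # tree capped to 1 core: CPU time tracks wall time
```

In this model the uncapped case counts 200 CPU seconds in 100 seconds of wall time, while the capped case counts 100.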


From the BOINC client's perspective, Theory native is designed to be a single-core app, and BOINC doesn't care whether the app launches just a single process or many of them.


BOINC parameters like #cores are used to tell the BOINC client how much work can be fetched and how many tasks can run concurrently.
They sometimes can be used to tell the BOINC client a bit about the internal app structure, e.g. ATLAS where we have real threads.
But there is no direct interface between such an app and the BOINC client.
You may remember that we sometimes had misconfigured ATLAS tasks that set up 8 athena threads on a single-core VM.
ID: 40566
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40570 - Posted: 21 Nov 2019, 10:49:57 UTC - in response to Message 40566.  
Last modified: 21 Nov 2019, 10:50:08 UTC

I have removed the multi-threading values from the plan class. It should now always run as single CPU.
ID: 40570
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40577 - Posted: 21 Nov 2019, 13:25:14 UTC - in response to Message 40564.  

I set Max jobs = 2 and Max CPUs 2 and ended up with two jobs each using 2 CPUs. Not sure this is what we want. The plan class currently contains:
    <min_ncpus>1</min_ncpus>
    <max_threads>2</max_threads>

As far as I understand the Theory app does use two threads but is there any advantage of giving two CPUs?

This is how the limits now work when requesting Theory tasks:

Max 1 task / thread
Max # of CPUs
Max # of jobs.

Since the tasks will run single-core, it's best to set 'No limit' for Max # of CPUs to avoid getting fewer tasks than you expect.
If you want fewer tasks than the number of threads, set that lower value as Max # of jobs.
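
The three limits listed above can be read as a simple minimum. The Python sketch below models only the behaviour observed in this thread, not the scheduler's actual code; `expected_theory_tasks` is a hypothetical name, `None` stands for 'No limit', and the 4-thread host in the first two calls is an assumed value.

```python
from typing import Optional


def expected_theory_tasks(host_threads: int,
                          max_cpus: Optional[int],
                          max_jobs: Optional[int]) -> int:
    """Tasks a host would hold if all three caps apply at once (observed model)."""
    caps = [host_threads]          # server side: max 1 task per thread
    if max_cpus is not None:
        caps.append(max_cpus)      # preference: Max # of CPUs
    if max_jobs is not None:
        caps.append(max_jobs)      # preference: Max # of jobs
    return min(caps)


print(expected_theory_tasks(4, 2, 3))     # Max # jobs 3, Max # CPUs 2
print(expected_theory_tasks(4, None, 3))  # Max # CPUs set to 'No limit'
print(expected_theory_tasks(2, 1, 4))     # 2-core host, Max # jobs 4, Max # CPUs 1
```

Under this model the first call gives 2 tasks and the second gives 3, matching the earlier report, and the last call gives the single task seen on the 2-core host.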
ID: 40577
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 40578 - Posted: 21 Nov 2019, 13:38:57 UTC - in response to Message 40577.  

I set Max jobs = 2 and Max CPUs 2 and ended up with two jobs each using 2 CPUs. Not sure this is what we want. The plan class currently contains:
    <min_ncpus>1</min_ncpus>
    <max_threads>2</max_threads>

As far as I understand the Theory app does use two threads but is there any advantage of giving two CPUs?

This is how the limits now work when requesting Theory tasks:

Max 1 task / thread
Max # of CPUs
Max # of jobs.

Since the tasks will run single-core, it's best to set 'No limit' for Max # of CPUs to avoid getting fewer tasks than you expect.
If you want fewer tasks than the number of threads, set that lower value as Max # of jobs.

This will affect all VBox apps. I will investigate how to disable Max # of CPUs for single-threaded apps.
ID: 40578
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,088,159
RAC: 104,046
Message 40579 - Posted: 21 Nov 2019, 23:36:23 UTC - in response to Message 40524.  

We reduced the number of tasks downloadable as some bad hosts were pulling 50 or so jobs at a time and failing them all. The current limit is one task per ncpu. I don't know why you are not getting two tasks if you have two cores.

This is one such FUBAR host:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10283981&offset=0&show_names=0&state=0&appid=13
ID: 40579


©2024 CERN