
Feedback for multicore theory wus



Message boards : Theory Application : Feedback for multicore theory wus

PHILIPPE
Send message
Joined: 24 Jul 16
Posts: 65
Credit: 128,142
RAC: 434
Message 30260 - Posted: 9 May 2017, 17:12:53 UTC

Usually, a volunteer chooses his way of crunching according to 3 main criteria:
1°) credits earned
2°) CPU efficiency --> reduce idle time between jobs
3°) RAM optimization --> adjust the behaviour of the computer (responsive or not)

Following computezrmle's advice,
I decided to watch the behaviour of multi-core for the Theory work units.
I wanted to know how it was possible to run them on this testbed:
a small host (4 CPUs, less than 3.7 GB of RAM, Windows 10),
with BOINC limits of 100% CPU max and 90% memory max.

No other projects present.

I used this linear formula:
VM RAM in MB = 370 + 260 * n, where n is the number of cores used in the VM.

1-core : 630 MB ram
2-core : 890 MB ram
3-core : 1150 MB ram
4-core : 1410 MB ram
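The linear rule can be sketched in a few lines of Python; the constants 370 and 260 are the measured values from this post, not official project figures:

```python
def vm_ram_mb(n_cores: int) -> int:
    """Empirical Theory VM sizing: 370 MB base + 260 MB per core."""
    return 370 + 260 * n_cores

# Reproduces the table above
for n in range(1, 5):
    print(f"{n}-core : {vm_ram_mb(n)} MB ram")
```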


Setting the RAM is more difficult than expected, because the lowest value of free RAM
inside the VM does not occur just after the beginning of the job,
but often when the process "cc1plus" appears for a few seconds at the beginning or at the end of the jobs.

4 cases :

4-core wu: not possible, because I need to open my browser during the tests.
Under those conditions, one of the 4 jobs inside the VM failed.
Orphaned processes appeared: either a pythia process without rivetvm, or a rivetvm without pythia.
The reliability of the 4-core is lower than that of the single core.
A service launched in the background by Windows or other apps would probably have produced the same effect
(but maybe on Linux it would work, if daemons are kept under control).
Nevertheless, RAM was sufficient while testing some jobs during a partial run of the wu.
Ram improvement / core : (630 - 1410/4)/630 = 44 %
Cpu efficiency : not calculated


3-core wu: possible; its interest is that it enables normal use of my computer (3 CPUs busy, 1 idle).
I can browse without any trouble.
Ram improvement / core : (630 - 1150/3)/630 = 39 %
Cpu efficiency :

Test 1 : Task 137020841 :
78618 / 3 / 27274 = 96 %
Test 2 : Task 137102831 :
70807 / 3 / 28141 = 84 %
Test 3 : Task 137178516 :
93493 / 3 / 33485 = 93 %


2-core wu: possible and more resilient; I used the 4 cores simultaneously (2 * 2-core wu) and could browse without any trouble.
Ram improvement / core : (630 - 890/2)/630 = 29 %
Cpu efficiency :

Test 1 : Task 137300541 :
45767 / 2 / 29559 = 77 %
Test 2 : Task 137300507 :
72010 / 2 / 47434 = 76 %
Test 3 : Task 137474119
83597 / 2 / 47268 = 88 %
Test 4 : Task 137497053
62842 / 2 / 35075 = 89 %

1-core wu: the default.
Ram improvement / core : 0%
Cpu efficiency :

Test 1 : Task 137928981 :
16810 / 1 / 24077 = 70 %
Test 2 : Task 137923944 :
41168 / 1 / 49179 = 84 %
Test 3 : Task 137915263 :
39054 / 1 / 50821 = 77 %
Test 4 : Task
37420 / 1 / 40763 = 92 %
----------------------------------------------------------------------------------------------------------------------------------------------------------------
4 * 1-core wus at the same time: I can't browse.
3 * 1-core wus at the same time: it's possible.
Why?
4 * 630 = 2520 MB of RAM used (out of 90% * 3.7 GB = 3330 MB) --> 3330 - 2520 = 810 MB of RAM left for the OS.
3 * 630 = 1890 MB of RAM used (out of 90% * 3.7 GB = 3330 MB) --> 3330 - 1890 = 1440 MB of RAM left for the OS and the browser.
In the Windows task manager, I see Firefox needs 200 MB of RAM, so I can only spend (3330 - 810 - 200 =) 2320 MB of RAM on BOINC projects,
that is to say only 2320 / 4 = 580 MB per CPU if it is shared equally.
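The budget arithmetic above can be written out as a short Python sketch (all figures are this post's own measurements, with 3.7 GB taken as 3700 MB):

```python
total_mb = 0.90 * 3700           # BOINC may use 90 % of the 3.7 GB host -> 3330 MB
vm_mb = 630                      # one 1-core Theory VM
os_mb = total_mb - 4 * vm_mb     # left for the OS with 4 VMs running -> 810 MB
browser_mb = 200                 # Firefox, per the Windows task manager
boinc_mb = total_mb - os_mb - browser_mb  # usable by BOINC while browsing -> 2320 MB
print(boinc_mb / 4)              # per-CPU share if split equally -> 580.0
```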

To conclude:
If I want to browse while running Theory wus with a maximum of jobs, it's better to use
3 * 1-core with light to medium browsing (630 * 3 = 1890 MB of RAM used),
2 * 2-core with light and slow browsing (890 * 2 = 1780 MB of RAM used),
1 * 3-core with heavy and fast browsing (1150 * 1 = 1150 MB of RAM used).
And if I avoid browsing, the best choice is 2 * 2-core.
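The three browsing-friendly options can be checked against the 3330 MB budget with a small sketch (same empirical 370 + 260*n sizing as above):

```python
# (tasks, cores per task) for each option; RAM per VM follows 370 + 260*n
options = {"3 x 1-core": (3, 1), "2 x 2-core": (2, 2), "1 x 3-core": (1, 3)}
budget_mb = 0.90 * 3700  # BOINC's 90 % share of the 3.7 GB host
for name, (tasks, cores) in options.items():
    used = tasks * (370 + 260 * cores)
    print(f"{name}: {used} MB used, {budget_mb - used:.0f} MB left for OS + browser")
```

Fewer, larger VMs leave more headroom for the browser, which matches the ordering above.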

Hope it may help someone with the same hardware configuration and the same trouble using his computer.
But each host is different, and each user too...
You have to play with the 2 parameters Max # CPUs and Max # jobs in your web preferences
and modify the app_config.xml file according to the parameters chosen.

These choices are only for small hosts running only Theory wus, which have the smallest RAM footprint after SixTrack.
With the other sub-projects, I can run only 1-core LHCb or only 1-core CMS, but no ATLAS tasks.

By the way, for multicore wus, credit is granted according to CPU time, not elapsed time * number of cores used.
So you earn a bit less credit than with 1-core (it depends on CPU efficiency).
(To evaluate CPU efficiency:
for the elapsed time, I look deep inside the logs for the time of the first event and the time of the last event written, minus the shutdown periods;
for the CPU time, I take the value in the last line of the logs where the CPU time appears.) The values reported in the task list are slightly different.
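The efficiency figures in this post all follow cpu_time / (cores * elapsed_time); a minimal sketch, using the numbers reported for Task 137020841:

```python
def cpu_efficiency(cpu_time_s: float, n_cores: int, elapsed_s: float) -> float:
    """Fraction of the allotted core-seconds actually spent computing."""
    return cpu_time_s / (n_cores * elapsed_s)

# Task 137020841 (3-core): 78618 s of CPU time over 27274 s elapsed
print(round(100 * cpu_efficiency(78618, 3, 27274)))  # -> 96
```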

computezrmle
Send message
Joined: 15 Jun 08
Posts: 347
Credit: 3,501,142
RAC: 1,838
Message 30262 - Posted: 9 May 2017, 18:33:43 UTC - in response to Message 30260.

Well done!

It can be very annoying to run long-lasting tests on a nearly saturated host.
Your numbers will help other users to optimise their setup.

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32587 - Posted: 2 Oct 2017, 15:15:10 UTC

How did you setup Theory for multicore?
Have you tried it with LHCb or CMS?

The preferences setting for max cores of 8 has no effect on any of my Theory, CMS or LHCb WUs (I remember Theory being single-core WUs while running vLHC).
ATLAS downloads as 8-core properly.

I was trying to develop an app_config.xml to adjust multi-core Theory, but it failed.
Did your method require changes within the Guest OS configuration?

(Apologies if I missed a thread covering this but I did spend 20 minutes looking for such a thread.)

PHILIPPE
Send message
Joined: 24 Jul 16
Posts: 65
Credit: 128,142
RAC: 434
Message 32588 - Posted: 2 Oct 2017, 17:18:07 UTC - in response to Message 32587.

Hi Marmot, these settings were adapted to small computers like mine.

If you have a stronger configuration , you can use higher values, of course.

It's possible to run Theory tasks with a lower RAM footprint than when using single cores. In fact it's also possible to run a single-core task with less than the 630 MB announced, but it's not a good way to run the wus properly (more jobs end up in endless loops, useless for the project).

I tried to do the same experiment with LHCb wus, but I understood it's much more complicated, and Laurence explained why: message.

The specification for LHC HEP applications is 2 GB per core. This value is used to build the internal computing infrastructure. The VMs also have 1 GB of swap configured. For the Theory application, others have done similar tests and we arrived at a sensible value for the memory. We have to be careful that the observations are true for all the jobs that LHCb may wish to run. The 2250 MB was originally requested by LHCb.


It is the HEP Software Foundation which decides the roadmap to follow, so that everyone works in the same way. Other message

Your app_config.xml will override the settings in your web preferences.

Here is an example: it allows one 4-core Theory task, or 4 SixTrack tasks, or only one single-core LHCb, CMS or ATLAS task.

I don't see your computer. It's hidden...

<app_config>

<project_max_concurrent>4</project_max_concurrent>

<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>sixtrack</name>
<max_concurrent>4</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>Theory</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>ATLAS</app_name>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000</cmdline>
<cmdline>--memory_size_mb 3400</cmdline>
</app_version>
<app_version>
<app_name>CMS</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000</cmdline>
<cmdline>--memory_size_mb 2048</cmdline>
</app_version>
<app_version>
<app_name>LHCb</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000</cmdline>
<cmdline>--memory_size_mb 2048</cmdline>
</app_version>
<app_version>
<app_name>Theory</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>4.000000</avg_ncpus>
<cmdline>--nthreads 4.000000</cmdline>
<cmdline>--memory_size_mb 1410</cmdline>
</app_version>

</app_config>


It's up to you to adapt it to your needs.
By the way, save it as an .xml file with Notepad, not as a .txt file. It's important; the result is not the same...

The app_config.xml file takes effect only when a wu is downloaded by your BOINC client, so you have to wait for the wus already downloaded to finish first.

I don't change anything in the OS configuration, except enabling virtualization in the BIOS settings.

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32595 - Posted: 3 Oct 2017, 6:43:42 UTC - in response to Message 32588.
Last modified: 3 Oct 2017, 7:21:07 UTC

Thanks for the input.

I've been editing app_config.xml files for years.

The vbox64 plan class was missing from my app_config attempt, which is why it failed.

<app_version>
<app_name>Theory</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>4.000000</avg_ncpus>
<cmdline>--nthreads 4.000000</cmdline>
<cmdline>--memory_size_mb 1410</cmdline>
</app_version>


The multi-core control is not very intuitive.
The preferences need to be updated with explicit control options.

Crystal Pellet
Volunteer moderator
Volunteer tester
Send message
Joined: 14 Jan 10
Posts: 384
Credit: 2,997,020
RAC: 1,945
Message 32598 - Posted: 3 Oct 2017, 7:26:52 UTC

@PHILIPPE:

If you want to get the most out of that machine and also be able to browse with it

- Run 4 Theory single core tasks concurrently
- Set in app_config.xml the memory size to 384MB (even 256MB worked for me, but with more swapping)
- Set the priority of VBoxHeadless.exe lower than normal (read Priorities)
- When the machine is still getting sluggish, set the % of CPU time to 90% in BOINC's computing preferences.

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32600 - Posted: 3 Oct 2017, 7:31:12 UTC
Last modified: 3 Oct 2017, 8:25:59 UTC

For base benchmarking, I'll run 8 single core WU at a time and get a data set of 20 WU per project class.
The WU selection will be made in the account preferences (refusing WU with app_config <max_concurrent>0</max_concurrent> doesn't work).


Later I'll try running 2, 4 and 8 core WU's for all plans for 20 WU data sets.

ATLAS defaulted to 8-core and already has 8 jobs in the queue, so it will have to wait till tomorrow.

Thanks for the clue as to why my app_config failed.
The laptop with only 8GB RAM will be able to run some Theory and still handle browsing for videos. (Chrome, along with safety-feature extensions, has been expanding its footprint and will now use up 3GB of RAM with 12 tabs open.)

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32601 - Posted: 3 Oct 2017, 7:35:08 UTC - in response to Message 32598.
Last modified: 3 Oct 2017, 8:07:08 UTC

@PHILIPPE:

If you want to get the most out of that machine and also be able to browse with it

- Run 4 Theory single core tasks concurrently
- Set in app_config.xml the memory size to 384MB (even 256MB worked for me, but with more swapping)
- Set the priority of VBoxHeadless.exe lower than normal (read Priorities)
- When the machine is still getting sluggish, set the % of CPU time to 90% in BOINC's computing preferences.


I've had great success setting the priority of browser software to 'above normal' with a process manager like Process Hacker, which keeps a database of priority levels for any process you adjust and returns a process to that level whenever it is detected in RAM.
This could also solve the problem of putting VBoxSVC.exe to low priority any time it starts: Process Hacker remembers the priority setting and adjusts it automatically. Outertech Cacheman is another app that remembers process priorities and auto-adjusts them.
Maybe you know of other apps which serve the same purpose. MS hasn't implemented that feature in its own task manager.

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32708 - Posted: 9 Oct 2017, 5:27:36 UTC - in response to Message 32588.
Last modified: 9 Oct 2017, 5:29:10 UTC


If you have a stronger configuration , you can use higher values, of course.

<app_config>

<project_max_concurrent>8</project_max_concurrent>

<app>
<name>ATLAS</name>
<max_concurrent>8</max_concurrent>
<fraction_done_exact/>
</app>

<app_version>
<app_name>ATLAS</app_name>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000</cmdline>
<cmdline>--memory_size_mb 3400</cmdline>
</app_version>
</app_config>


This configuration won't get more than 3 ATLAS on a 32GB RAM computer.

BOINC seems to be receiving incorrect information about the RAM used by single-core ATLAS.
The maximum number of ATLAS tasks that will start on 32GB RAM is 3, even when BOINC is told to use as much as 98% of RAM (both the in-use and not-in-use settings) and 90% of swap.


Attempting 8 WUs on CMS is also running into problems.

I'll keep you updated on my tests.

Erich56
Send message
Joined: 18 Dec 15
Posts: 383
Credit: 3,872,621
RAC: 7,608
Message 32709 - Posted: 9 Oct 2017, 6:05:10 UTC - in response to Message 32708.

Attempting 8 WU on CMS also is running into problems.

on my 32GB machine, 8 CMS WUs run very well (besides 2 GPUGRID WUs).

Crystal Pellet
Volunteer moderator
Volunteer tester
Send message
Joined: 14 Jan 10
Posts: 384
Credit: 2,997,020
RAC: 1,945
Message 32711 - Posted: 9 Oct 2017, 8:45:59 UTC - in response to Message 32708.

This configuration won't get more than 3 ATLAS on a 32GB RAM computer

For the amount of RAM needed for 1 task, BOINC is not calculating with the values in your app_config, but with the information coming from your preferences.

If you have set Max # CPUs to 8, BOINC will reserve 9800MB (2600 + 8*900) of RAM for 1 task.
That could be the reason why you can only run 3 tasks on your 32GB machine.
If you want to run single-core VMs, also set # CPUs to 1 in your preferences.
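Crystal Pellet's reservation rule (2600 MB base plus 900 MB per CPU set in the preferences; I take both constants from the message above) explains the 3-task limit; a quick check:

```python
def atlas_reserved_mb(pref_max_ncpus: int) -> int:
    # BOINC sizes each ATLAS task from the *preferences* CPU count, not app_config
    return 2600 + 900 * pref_max_ncpus

per_task = atlas_reserved_mb(8)        # 9800 MB with Max # CPUs = 8
host_mb = 32 * 1024                    # 32 GB host
print(per_task, host_mb // per_task)   # only 3 such tasks fit
```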

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32811 - Posted: 12 Oct 2017, 23:38:30 UTC - in response to Message 32709.
Last modified: 12 Oct 2017, 23:58:26 UTC

Attempting 8 WU on CMS also is running into problems.

on my 32GB machine, 8 WUs CMS run very well (besides 2 WUs GPUGRID).



It was a configuration issue on the machine.
The LSI RAID controller only supported RAID 10 or 0, so I tried out MS Storage Spaces for an OS-managed RAID 5. It benchmarked very nicely, but under full load, attempting to spin up 8 CMS VMs, the OS couldn't keep up with its parity-stripe writes.

Reconfigured to hardware LSI RAID 0, and things are working much better.
Will try the 8x CMS again later.

Also, the max CPUs for the project was set to 24.
It doesn't say max CPUs per job/WU, but max CPUs for the project.

marmot
Send message
Joined: 5 Nov 15
Posts: 74
Credit: 2,521,326
RAC: 40,668
Message 32812 - Posted: 12 Oct 2017, 23:56:59 UTC - in response to Message 32711.

This configuration won't get more than 3 ATLAS on a 32GB RAM computer

For the amount of RAM needed for 1 task, BOINC is not calculating with the values in your app_config, but with the information coming from your preferences.

If you have set Max # CPUs to 8, BOINC will reserve 9800MB (2600 + 8*900) of RAM for 1 task.
That could be the reason why you can only run 3 tasks on your 32GB machine.
If you want to run single-core VMs, also set # CPUs to 1 in your preferences.



Thanks.
The preference in the account settings reads as setting a limit on the cores used by the entire project.
It's set to 24... so the preferences are trying to set up RAM for a 24-core VM!

The setting says:
"Max # of CPUs for this project"
which reads as the total cores used across all jobs/work units being run for the project.

If that setting is meant to control CPUs per WU, then it should read:
"Max # of CPUs per job/work unit"

Someone else had mentioned that the preferences seemed confusing on that point.
