Message boards : Number crunching : Vbox error that I have never seen before

bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36119 - Posted: 30 Jul 2018, 3:29:46 UTC - in response to Message 36118.  

What I was not understanding in the other posts was that the files were not already in existence.
I could tell you didn't understand that and that's why I told you in this thread that the file does not exist and that you have to create it.

So if the information is already in other posts, then save me some grief and just point out the links in order.
Save me and all the others trying to help you some grief and read the links provided. Our time is just as valuable as yours.

The threads jump around a bit.
Indeed they do but it's all there in the links provided to you. Read them instead of demanding "Just tell me the steps" without even a "please". Talk about being insulted...

My set up is not the normal set up either.
If you would get off your firebrand and read what I said, my setup is split on two drives after suffering a SSD failure. So programs files are on the SSD and data files are on the external.
If you had bothered to read the linked material you would have realized it doesn't matter how many drives you have or how your stuff is spread out. The procedure(s) in the links gets around that problem.

And then within the LHC program files folder, I did find something that related to ATLAS, but the text did not match what the linked information provided.
Then ask a direct question about that discrepancy instead of demanding, without even a please or a thank you, to have it all spoon fed to you.

Look, if you don't want to help, then don't get your panties in a wad. Just say you don't want to help. Maybe someone else has the patience to explain, rather than get all volcano like. GEES...get a freaking grip.
Look... if you can't read and have the courage to say "I didn't understand that" or "I read it but it doesn't add up for me, there seems to be a discrepancy between what you said I should see and what I actually see, please explain again" then you can't be helped. So hey... my panties aren't in a wad and I suggest to you that if you can't ask pointed questions about stuff you don't understand then maybe it's your panties that are in a wad. Hmmm?
ID: 36119
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36120 - Posted: 30 Jul 2018, 6:40:24 UTC - in response to Message 36119.  

ok ok...will take a look a bit later.
I don't mean to be rude or demanding. What I wrote did not seem demanding.
I was quick reading at the time.
I'll go back and try and digest this again in a few hours.

It's just your meh and related comments came across as rude. So rude met rude.
Again, I'll look a bit later on and see if I can make sense of it all.
ID: 36120
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36127 - Posted: 30 Jul 2018, 11:49:21 UTC - in response to Message 36096.  
Last modified: 30 Jul 2018, 12:04:59 UTC

... 1-core setup is the most efficient one, followed by a 2-core setup ...

Right.

This is a special case as greg_be runs an 8-core setup on a host with only 16GB RAM.
Each 1-core/2-core setup would need 4800MB per core, thus only 3/6 cores could be used in total.

Let's see if 1 8-core or 2 4-core VMs deliver valid results.



If I need 4800 per core and have only 16 gig then it appears I am limited to 3 cores. 4 cores would put me over the memory limit if I understand this correctly.
I made the app_config file.
But Nthreads vs CPU's which one is giving me the cores?
I left threads at 2 and changed CPU to 3.
Not sure if that is right or not.
ID: 36127
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36128 - Posted: 30 Jul 2018, 14:54:09 UTC - in response to Message 36127.  

If I need 4800 per core and have only 16 gig then it appears I am limited to 3 cores. 4 cores would put me over the memory limit if I understand this correctly.

If you consider only physical RAM then yes, 4 cores would put you over the 16 GB limit. However there is always virtual RAM available too. The problem is that virtual RAM is disk based and much slower but if the system is not too loaded with other processes then you might be able to get away with 4 cores, maybe even more. If the system is overloaded then it starts moving stuff back and forth between physical RAM and virtual RAM. If that back and forth movement is excessive then things go to hell quickly. A little movement is OK but there is a limit. The bottom line is that others can make reasonable estimates of the max # of cores you can get away with but only you know how busy your system actually is. You can try to run it at the max or you can take a safer more conservative approach, it's all up to you.
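As a rough sanity check of that arithmetic, here is a minimal Python sketch. It assumes the whole 16 GB (taken as 16384 MB) is available to the VMs, which is optimistic because the OS and other processes need RAM too. The function name `ram_budget` is made up for illustration.

```python
def ram_budget(host_ram_mb, per_vm_mb, n_vms):
    """Return (needed_mb, shortfall_mb) for n_vms VMs of per_vm_mb each.

    shortfall_mb is how much would have to come from disk-backed
    virtual memory; 0 means everything fits in physical RAM.
    """
    needed = per_vm_mb * n_vms
    shortfall = max(0, needed - host_ram_mb)
    return needed, shortfall


HOST_RAM_MB = 16 * 1024  # assumption: 16 GB host, nothing reserved for the OS
PER_VM_MB = 4800         # figure quoted in this thread

for n in (3, 4):
    needed, shortfall = ram_budget(HOST_RAM_MB, PER_VM_MB, n)
    print(f"{n} x {PER_VM_MB} MB = {needed} MB, shortfall {shortfall} MB")
```

With these numbers a fourth 4800 MB allocation overshoots physical RAM by a few GB, and that overshoot is the part that would end up in the slow disk-based virtual RAM.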

Remember though that you can't rely on the feedback you get on your results page. Even if the tasks are validating they might not be returning HITS files which means no useful work done. So go ahead and experiment with 4, 5, 7, or even 8 cores but keep an eye on the HITS files and the panda reports discussed in that other thread.

Personally I would start with 2 or 3 cores. If you can get 20 results with valid HITS then maybe try 4 cores. 8 is out of the question, IMHO, but maybe the suggestion to try 8 was made to illustrate a point via a failed experiment.

I made the app_config file.
But Nthreads vs CPU's which one is giving me the cores?
Not sure what you mean by the terms Nthreads and CPU's. I think you mean <avg_ncpus>x</avg_ncpus> from the app_config.xml and the "Max # CPUs" setting in the website preferences. If so then the conventional wisdom, AFAIK, is to set those the same.
ID: 36128
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36129 - Posted: 30 Jul 2018, 15:22:02 UTC - in response to Message 36128.  

Well, the way you guys talk about the amount of memory consumed per task, I'll just leave it at 3 and be happy if it works. Right now on another project I am spitting out errors by the dozen, something to do with checkpoints. So I don't want to get any errors here if I can help it, other than design flaws in the code.

So here is the other thing I was talking about.
<app_name>ATLAS</app_name>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<avg_ncpus>2.0</avg_ncpus>
<cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>


In ncpus I changed that to 3 and in nthreads I left that at 2.
I am not sure which does what regarding cores.
ID: 36129
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36131 - Posted: 30 Jul 2018, 15:41:51 UTC - in response to Message 36129.  

My bad. I forgot about that --nthreads. Now I see what you mean. I don't know exactly what those settings do and https://boinc.berkeley.edu/wiki/Client_configuration (scroll to the bottom of the page) doesn't say much about --nthreads other than that it's a commandline parameter passed to the client. I guess the next step would be to look in the wiki for command line parameters for a better explanation but no time right now.

All I can tell you is I left those the same and it works for me. You'll know when you download some ATLAS tasks because in BOINC manager entries in the Status column on the Tasks page will show the number of cores associated with each task.
ID: 36131
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36133 - Posted: 30 Jul 2018, 16:17:51 UTC - in response to Message 36131.  

Right. I've seen that on other tasks (x cpu's or 1 cpu and 1 gpu)
The nthreads example they give goes to 7?!?
But they don't explain nthreads.
Guess that is something I will have to play around with when ATLAS comes back.

Something else:
Does THEORY have the same issues as ATLAS or does it run like Sixtrack?
I have never really watched what each group does on my machine.
ID: 36133
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36134 - Posted: 30 Jul 2018, 16:28:56 UTC - in response to Message 36131.  
Last modified: 30 Jul 2018, 16:29:36 UTC



http://boinc.berkeley.edu/trac/wiki/VboxApps

Vboxwrapper command-line options -- nthreads N
Create a virtual machine that will use N cores.


So 3 cpu's that use 2 cores each in my case?
Something else from Yeti on this subject: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4137#29016
I haven't read it in depth yet as it's dinner time...but what do you make of it?
At a quick glance he assigns more memory and there are no threads and he uses 4 cpu's.
ID: 36134
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36135 - Posted: 30 Jul 2018, 17:00:56 UTC - in response to Message 36133.  

Does THEORY have the same issues as ATLAS or does it run like Sixtrack?
I have never really watched what each group does on my machine.
Of all the LHC apps Sixtrack requires the least memory and other than ATLAS native it's the only one that doesn't require VirtualBox. Theory, LHCb, CMS and ATLAS all require VBox. Theory and LHCb need considerably less memory than ATLAS, about half IIRC. Admins have posted info about each app's memory needs.

The main issue with ATLAS is not that it takes so much memory, it's that the server doesn't calculate the memory requirement properly. It under-estimates the memory requirement which has the effect of allowing too many tasks to run for the amount of memory the host has. That is why you need the app_config.xml. It corrects what the server gets wrong. For Theory and LHCb the server calculates the memory requirement properly so no app_config.xml required for those.
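For illustration only, here is a minimal Python sketch that builds such an override, using the ATLAS values posted earlier in this thread (plan class vbox64_mt_mcore_atlas, 2 cores, 4800 MB). The helper name `build_atlas_app_config` is hypothetical, and the right memory figure for a given host is an assumption you would have to verify yourself.

```python
import xml.etree.ElementTree as ET


def build_atlas_app_config(ncpus, memory_mb):
    """Build an app_config.xml tree with a single ATLAS app_version entry."""
    root = ET.Element("app_config")
    av = ET.SubElement(root, "app_version")
    ET.SubElement(av, "app_name").text = "ATLAS"
    ET.SubElement(av, "plan_class").text = "vbox64_mt_mcore_atlas"
    ET.SubElement(av, "avg_ncpus").text = f"{ncpus:.1f}"
    # One cmdline element carrying both options, as in the snippet posted earlier.
    ET.SubElement(av, "cmdline").text = (
        f"--nthreads {ncpus} --memory_size_mb {memory_mb}"
    )
    return root


root = build_atlas_app_config(ncpus=2, memory_mb=4800)
ET.indent(root)  # pretty-print; needs Python 3.9 or newer
print(ET.tostring(root, encoding="unicode"))
```

The printed XML would be saved as app_config.xml in the project's folder under the BOINC data directory. Writing the file by hand works just as well; the script only makes it harder to mistype a tag.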
ID: 36135
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36138 - Posted: 30 Jul 2018, 17:27:06 UTC - in response to Message 36135.  

Ok thanks for the info.
Now just need that nthreads thing figured out.
Time to go back to reading some threads.

ID: 36138
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36140 - Posted: 30 Jul 2018, 17:40:45 UTC - in response to Message 36092.  
Last modified: 30 Jul 2018, 17:43:15 UTC

This computer should be able to run 2 ATLAS VMs concurrently using a 4-core setup.
A 4-core setup is much more efficient than an 8-core setup.

In the case of Theory, multicore should run as efficiently as singlecore, but there's one big risk:
If you get a longrunner like sherpa close to the end of the 12h design limit, it will use only 1 core (+1 core used by auxiliary apps).
The rest of your cores will remain idle for up to 6h.

Suggestion:
To be most efficient you may use a 4-core setup for ATLAS and a 1-core or 2-core setup for Theory.


Do you have to add another section for Theory in addition to the one created for ATLAS in the app_config file?

I saw some discussion you were part of in another thread, including some talk about nthreads, but I cannot get a clear picture from the BOINC docs or the ATLAS discussions of what nthreads is all about. I think I saw something saying it limits the number of tasks (see another post in this thread with a comment about that).

Also based on the 4800MB memory needs, I can only run 3 ATLAS tasks on the physical memory I have. Not sure how virtual or disk memory would supplement the physical memory or cause problems if used to run 4 cpu's. That is another question.
ID: 36140
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,147
RAC: 2,155
Message 36141 - Posted: 30 Jul 2018, 17:48:21 UTC - in response to Message 36140.  

..
Do you have to add another section for Theory in addition to the one created for ATLAS in the app_config file?
..

Did not test all applications recently, but Theory is not obeying the app_config.xml.
The memory setting for Theory comes from the server: 730MB for 1 core, plus 100MB for each additional core.
This is my app_config.xml.


<app_config>
<project_max_concurrent>8</project_max_concurrent>
 <app>
  <name>ALICE</name>
  <max_concurrent>1</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>ATLAS</name>
  <max_concurrent>1</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>Benchmark</name>
  <max_concurrent>1</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>CMS</name>
  <max_concurrent>4</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>LHCb</name>
  <max_concurrent>4</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>sixtrack</name>
  <max_concurrent>8</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>sixtracktest</name>
  <max_concurrent>8</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app>
  <name>Theory</name>
  <max_concurrent>7</max_concurrent>
  <fraction_done_exact/>
 </app>
 <app_version>
  <app_name>ALICE</app_name>
  <plan_class>vbox64</plan_class>
  <avg_ncpus>1.000000</avg_ncpus>
  <cmdline>--nthreads 1.000000 --memory_size_mb 630</cmdline>
 </app_version>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <avg_ncpus>4.000000</avg_ncpus>
  <cmdline>--memory_size_mb 6500</cmdline>
 </app_version>
 <app_version>
  <app_name>Benchmark</app_name>
  <plan_class>vbox64</plan_class>
  <avg_ncpus>1.000000</avg_ncpus>
  <cmdline>--nthreads 1.000000 --memory_size_mb 1408</cmdline>
 </app_version>
 <app_version>
  <app_name>CMS</app_name>
  <plan_class>vbox64</plan_class>
  <avg_ncpus>1.000000</avg_ncpus>
  <cmdline>--memory_size_mb 1896</cmdline>
 </app_version>
 <app_version>
  <app_name>Theory</app_name>
  <plan_class>vbox64</plan_class>
  <avg_ncpus>1.000000</avg_ncpus>
  <cmdline>--memory_size_mb 630</cmdline>
 </app_version>
</app_config>
ID: 36141
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36143 - Posted: 30 Jul 2018, 18:38:15 UTC - in response to Message 36141.  

..
Do you have to add another section for Theory in addition to the one created for ATLAS in the app_config file?
..

Did not test all applications recently, but Theory is not obeying the app_config.xml.
The memory setting for Theory comes from the server: 730MB for 1 core, plus 100MB for each additional core.
This is my app_config.xml.


I noticed you have nthreads in your code. What is nthreads and how does it affect the performance?
---------
If the rest of the applications in LHC behave normally, then I will leave them alone.
-----------------
Should memory be increased from 4800 to 6500 or 7500? I see Yeti set his at 7500.
What is the average that ATLAS uses? Or the max, I guess I should say.
ID: 36143
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,147
RAC: 2,155
Message 36144 - Posted: 30 Jul 2018, 19:57:52 UTC - in response to Message 36143.  

I have no nthreads in the ATLAS part of the code, only avg_ncpus.
nthreads is normally used by CPU tasks, e.g. PrimeGrid, for multi-core CPU tasks.
The avg_ncpus is telling VBox with how many CPUs the VM should be created.
The default memory setting for ATLAS is 3500, 4400, 5300, 6200 MB for 1-core, 2-core, 3-core and 4-core VMs.
As you can see the formula is 2600 + 900 for every core.
It turned out that the 4400 for a dual core is mostly too low, so increase that value;
however, certain ATLAS batches may demand even more RAM.
When you have installed the VirtualBox Extension Pack, you can use Show VM Console in BOINC Manager
to see the 'top' command (ALT-F3) and how much memory remains free (or not) during the run.
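The default figures quoted above can be reproduced with a few lines of Python. This is only a sketch of the formulas as stated in this thread (2600 + 900 per core for ATLAS, 730 + 100 per additional core for Theory); the function names are made up, and the server-side values may change with new task batches.

```python
def atlas_default_ram_mb(cores):
    """Default ATLAS VM memory as quoted in this thread: 2600 + 900 per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2600 + 900 * cores


def theory_default_ram_mb(cores):
    """Theory VM memory as quoted in this thread: 730 for 1 core, +100 per extra core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 730 + 100 * (cores - 1)


for cores in range(1, 5):
    print(
        f"{cores}-core VM: ATLAS {atlas_default_ram_mb(cores)} MB, "
        f"Theory {theory_default_ram_mb(cores)} MB"
    )
```

Since the 2-core default is reported to be too low in practice, treat these numbers as a floor rather than a recommendation.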
ID: 36144
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,505,811
RAC: 125,141
Message 36145 - Posted: 30 Jul 2018, 19:58:20 UTC - in response to Message 36143.  

"<avg_ncpus>" affects the reports and calculations done by your BOINC client.
"nthreads" sets the number of cores used by your virtual machine.
Log entry: "Setting CPU Count for VM. (4)"
Simply keep them in sync.
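
A quick way to spot entries that are out of sync is sketched below in Python. The sample text mirrors the setup described earlier in this thread (avg_ncpus changed to 3, nthreads left at 2); the function name `find_core_mismatches` is hypothetical, and the check simply compares the two numbers as written in the file.

```python
import xml.etree.ElementTree as ET

# Sample mirroring the mismatch described earlier in this thread.
SAMPLE = """<app_config>
 <app_version>
  <app_name>ATLAS</app_name>
  <plan_class>vbox64_mt_mcore_atlas</plan_class>
  <avg_ncpus>3.0</avg_ncpus>
  <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
 </app_version>
</app_config>"""


def find_core_mismatches(xml_text):
    """Return (app_name, avg_ncpus, nthreads) for entries where the two differ.

    Entries without a --nthreads value on the cmdline are skipped.
    """
    mismatches = []
    for av in ET.fromstring(xml_text).iter("app_version"):
        avg_ncpus = float(av.findtext("avg_ncpus", default="1"))
        tokens = av.findtext("cmdline", default="").split()
        if "--nthreads" not in tokens:
            continue
        idx = tokens.index("--nthreads")
        if idx + 1 >= len(tokens):
            continue  # option given without a value
        nthreads = float(tokens[idx + 1])
        if nthreads != avg_ncpus:
            mismatches.append((av.findtext("app_name"), avg_ncpus, nthreads))
    return mismatches


print(find_core_mismatches(SAMPLE))
```

An empty list means every entry that sets --nthreads agrees with its avg_ncpus.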

Your web preferences (#cores) should be set to the highest core number you use as it also affects the default RAM calculation.
It's 4 in Crystal Pellet's example (ATLAS).
This will also be the default for "nthreads".

Other multicore apps (currently only Theory) use their own default settings, but these are based on the web setting.
If you want to run non-default setups for a distinct app, you'll have to use an app_config section for that app.


ATLAS needs plenty of RAM only at the beginning of each task to expand the initial parameter sets.
Those RAM requirements change from time to time when the scientists configure fresh task series.
In addition ATLAS was originally designed to run much larger tasks (1000 Events instead of 200) in a datacenter environment, not in a BOINC environment.
Thus the ATLAS default RAM formula does not always work for 1-cores and 2-cores.


A good starting point would be to focus on only 1 app - ATLAS or Theory or LHCb - until you get familiar with that app.
Also start with a low #cores and a low number of concurrently running tasks.


Last but not least it could be a problem that you use an external disk.
You'll have to try out how many tasks your system can run concurrently until the external interface becomes a bottleneck.
ID: 36145
greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,203,902
RAC: 3,788
Message 36146 - Posted: 30 Jul 2018, 20:07:37 UTC - in response to Message 36145.  

ok..thanks.
I will reread everything in the morning EU time.
Can't really focus on this complex stuff tonight.

Very good information guys, thanks!
ID: 36146


©2024 CERN