Message boards : ATLAS application : Problems with BOINC:Virtualbox (Bad page map in process ...)

UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37841 - Posted: 28 Jan 2019, 10:01:56 UTC
Last modified: 28 Jan 2019, 10:08:45 UTC

Hi everybody, I have an unusual problem running ATLAS applications in BOINC. I'm using Scientific Linux 7, and I also tried Ubuntu 18 and got the same problem. I followed Yeti's checklist to install and configure BOINC with VirtualBox, but I can't solve the problem.

The problem is inside VirtualBox. My tasks appear to run, but when I look at the VirtualBox machine, I see this:



It says that I have a BUG: Bad page map in process ...

The result is a computation error after 6-7 days, and I have never completed a single task. Could someone tell me how to fix it?

Thank you!

(Excuse me if my English is not very good.)
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,909,718
RAC: 138,050
Message 37842 - Posted: 28 Jan 2019, 12:43:51 UTC - in response to Message 37841.  

You may try the following steps:


1. Set all BOINC projects to "no new tasks".
2. Set all currently NOT running ATLAS tasks to "pause".
3. Cancel all currently running ATLAS tasks and wait a minute until the VMs are shut down.
4. Gracefully shut down your BOINC client.
5. Check your VirtualBox manager and remove all suspect VMs.
6. Check the folder "/var/lib/boinc-client/slots/". If you only run ATLAS, it should be empty.
7. Check whether your BOINC client account has access rights to the slots folder.
8. Set your project preferences to "max #CPUs = 2"
9. Create an app_config.xml in "your_boinc-client_basic_folder/projects/lhcathome.cern.ch_lhcathome/"
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>2</project_max_concurrent>
</app_config>

10. Restart your host
11. Resume all paused tasks
12. Allow your client to request new tasks
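Steps 6 and 7 can also be checked with a small script. This is a minimal Python sketch, assuming the Linux package layout with the slots folder at /var/lib/boinc-client/slots/ (the path given in step 6) and assuming you run it under the same account as the BOINC client (often "boinc" on Linux packages, but check your own setup). The function name check_slots_dir is hypothetical.

```python
import os


def check_slots_dir(path="/var/lib/boinc-client/slots/"):
    """Report whether a BOINC slots folder exists, is empty and is accessible.

    os.access() checks the permissions of the *calling* user, so run this
    under the same account as the BOINC client (assumed to be "boinc").
    """
    if not os.path.isdir(path):
        return {"exists": False, "empty": False, "accessible": False, "leftovers": None}
    # The client needs read, write and traverse (execute) rights on the folder.
    accessible = os.access(path, os.R_OK | os.W_OK | os.X_OK)
    try:
        leftovers = sorted(os.listdir(path))
    except PermissionError:
        # Contents are unknown if the folder cannot even be listed.
        leftovers = None
    return {
        "exists": True,
        "empty": leftovers == [],
        "accessible": accessible,
        "leftovers": leftovers,
    }


if __name__ == "__main__":
    print(check_slots_dir())
```

Run it after step 4, with the client shut down, e.g. via sudo -u boinc (the account name is an assumption). Entries in "leftovers" are slot folders from earlier tasks that were not cleaned up.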


If this doesn't work, you may check your firewall for closed ports (see: http://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use) and/or do a project reset to get a fresh *.vdi file.
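To test for closed ports from the host itself, here is a minimal Python sketch that only checks whether an outbound TCP connection to a given host and port can be opened. The host names and ports are not listed in this thread; take them from the firewall page linked above. port_is_reachable is a hypothetical helper name.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

A True result only shows that the firewall lets the TCP connection through; it says nothing about whether the service behind that port is working.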
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37848 - Posted: 29 Jan 2019, 9:12:21 UTC - in response to Message 37842.  
Last modified: 29 Jan 2019, 9:13:13 UTC

I did all the steps you recommended, but I got the same error. The firewall is disabled, the VirtualBox Extension Pack is installed and VT-x is enabled. The message is the same:



So I don't know what the matter is. Any more ideas?

Thank you for your help!
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,909,718
RAC: 138,050
Message 37849 - Posted: 29 Jan 2019, 13:05:05 UTC - in response to Message 37848.  

The few comments you can find with Google suggest that the error may be caused by faulty physical RAM.
Could you check this with memtest?
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37851 - Posted: 29 Jan 2019, 14:38:49 UTC

I have seen similar cases. The only working solution was to run ATLAS WUs as single-core, and this worked fine.


Supporting BOINC, a great concept !
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37858 - Posted: 30 Jan 2019, 8:44:55 UTC - in response to Message 37849.  

computezrmle wrote:
The few comments you can find with Google suggest that the error may be caused by faulty physical RAM.
Could you check this with memtest?


Yes, I ran memtest for a whole day and everything was fine. When I searched Google for the problem, I only found bad RAM or a bad kernel as causes, but I think that's impossible, because I tried three different PCs with the same hardware and all have the same problem.
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37859 - Posted: 30 Jan 2019, 8:54:25 UTC - in response to Message 37851.  

Yeti wrote:
I have seen similar cases. The only working solution was to run ATLAS WUs as single-core, and this worked fine.


And... how can I change the BOINC configuration to run ATLAS WUs as single-core? I think it's done through Project Preferences --> Max # CPUs, but I'm not sure. Is that correct?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37865 - Posted: 30 Jan 2019, 13:37:45 UTC - in response to Message 37859.  

UAM-LCG2 wrote:
And... how can I change the BOINC configuration to run ATLAS WUs as single-core? I think it's done through Project Preferences --> Max # CPUs, but I'm not sure. Is that correct?

That's correct. Max # CPUs sets the number of cores the VM is created with, and BOINC also uses it to reserve the needed memory.
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37888 - Posted: 1 Feb 2019, 9:28:58 UTC - in response to Message 37865.  

Maybe it's not the only way. I found another possibility with the app_config.xml file. If you put <cmdline> --nthreats 1</cmdline> in it, you'll get the same result. But if you have enough RAM to run 2 tasks, you can run both tasks at the same time.

But I don't understand why my single-core tasks consume more than the 2,1 GB of RAM indicated in Yeti's checklist (Each SingleCore-Atlas-Task needs 2,1 GB free RAM, MultiCore-WUs need 3,0 GB + 0,9 GB * number of cores (Last Update from 01.08.2018) So 7,5 GB for a 5-Core WU.).

Any ideas?

Thank you all!
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,909,718
RAC: 138,050
Message 37890 - Posted: 1 Feb 2019, 10:09:40 UTC - in response to Message 37888.  

Changing the web preferences is the easiest way to get a good setup, as it also adjusts the RAM calculation.
The server sends the correct values in the scheduler_reply.
This method is recommended if you run only one subproject, e.g. ATLAS.

The #CPUs and RAM settings from the server can be overridden by a local app_config.xml.
If you use "<avg_ncpus>x</avg_ncpus>", "<cmdline>--nthreads ..." should be unnecessary.
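A sketch of generating such an app_config.xml, written in Python for illustration. It reuses the app_name and plan_class from the example earlier in this thread, sets only <avg_ncpus> (no --nthreads), and by default derives --memory_size_mb from the formula in Yeti's checklist (3,0 GB + 0,9 GB per core), assuming that formula is still current. make_atlas_app_config is a hypothetical name.

```python
def make_atlas_app_config(ncpus, max_concurrent=None, memory_mb=None):
    """Build app_config.xml text for ATLAS with ncpus cores per task.

    memory_mb defaults to 3000 MB + 900 MB per core (assumed still current).
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if memory_mb is None:
        memory_mb = 3000 + 900 * ncpus
    lines = [
        "<app_config>",
        "  <app_version>",
        "    <app_name>ATLAS</app_name>",
        "    <plan_class>vbox64_mt_mcore_atlas</plan_class>",
        # avg_ncpus sets the core count, so no separate --nthreads is added.
        f"    <avg_ncpus>{ncpus:.1f}</avg_ncpus>",
        f"    <cmdline>--memory_size_mb {memory_mb}</cmdline>",
        "  </app_version>",
    ]
    if max_concurrent is not None:
        lines.append(f"  <project_max_concurrent>{max_concurrent}</project_max_concurrent>")
    lines.append("</app_config>")
    return "\n".join(lines) + "\n"
```

For a single-core setup you would write the output of make_atlas_app_config(1) to projects/lhcathome.cern.ch_lhcathome/app_config.xml and restart the client; make_atlas_app_config(2, max_concurrent=2) reproduces the values from the earlier example (2.0 CPUs, 4800 MB).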

BTW:
It's "nthreads" instead of "nthreats"

UAM-LCG2 wrote:
But I don't understand why my single-core tasks consume more than the 2,1 GB of RAM indicated in Yeti's checklist ...

It has been necessary to change the RAM formula a couple of times over the last few years.
Yeti also posted the current one:
Yeti wrote:
... MultiCore-WUs need 3,0 GB + 0,9 GB * number of cores
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37893 - Posted: 1 Feb 2019, 10:20:07 UTC - in response to Message 37888.  

UAM-LCG2 wrote:
But I don't understand why my single-core tasks consume more than the 2,1 GB of RAM indicated in Yeti's checklist (Each SingleCore-Atlas-Task needs 2,1 GB free RAM, MultiCore-WUs need 3,0 GB + 0,9 GB * number of cores (Last Update from 01.08.2018) So 7,5 GB for a 5-Core WU.).

Aahh, time has overtaken this.

In former times there was a separate app for SingleCore-WUs, and these needed 2,1 GB. But this app has been discontinued.

When you run ATLAS with 1 core, it is still a MultiCore-WU, only reduced to 1 core. As the formula calculates, this leads to 3,9 GB for a 1-core WU.
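The checklist formula written out as a small Python function, so the numbers from this thread can be checked: 3,0 GB plus 0,9 GB per core. atlas_ram_gb is a hypothetical name, and the constants come from the checklist formula quoted in this thread, which has changed over time.

```python
def atlas_ram_gb(ncores):
    """RAM in GB for one ATLAS task: 3.0 GB base + 0.9 GB per core."""
    return round(3.0 + 0.9 * ncores, 1)


# 1 core -> 3.9 GB, 5 cores -> 7.5 GB (the values quoted in this thread)
for n in (1, 5):
    print(n, atlas_ram_gb(n))
```

Because the old single-core app no longer exists, there is no separate 2,1 GB case; a 1-core task is simply the multicore formula with one core.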

I will try to find better wording for the checklist.


Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 37894 - Posted: 1 Feb 2019, 10:21:58 UTC - in response to Message 37893.  

Hm, I had already made a note about this:

[Update 18.09.2018] Nowadays ATLAS runs only MultiCore-WUs; even if you run it with only 1 core, it will need up to 3,9 GB when run as SingleCore.



UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37896 - Posted: 1 Feb 2019, 11:56:36 UTC - in response to Message 37890.  



Ahhh, now I understand. Sorry for my bad English, I meant threads, not threats. So now I know that my computer doesn't have Hyper-Threading Technology and I need to use the --nthreads flag for it to work properly. But I think that <avg_ncpus>x</avg_ncpus> is different from <cmdline>--nthreads</cmdline>, because CPU cores can run multiple threads if your computer has Hyper-Threading Technology. But that's just my opinion.

Thank you for all!
UAM-LCG2

Joined: 30 Oct 18
Posts: 16
Credit: 192,743,156
RAC: 3
Message 37897 - Posted: 1 Feb 2019, 12:03:59 UTC - in response to Message 37894.  

Yeti wrote:
Hm, I had already made a note about this:

[Update 18.09.2018] Nowadays ATLAS runs only MultiCore-WUs; even if you run it with only 1 core, it will need up to 3,9 GB when run as SingleCore.


Okay, so now everything's fine. I thought that formula was still valid. Thank you for your time!



©2024 CERN