1) Message boards : Number crunching : web preferences - queue tasks but run only 1 (Message 45223)
Posted 18 Aug 2021 by wolfman1360
Post:
BOINC never worked that way.
The client always tries to use all available resources unless a limit is set by the user.

Just a simple example with project A and project B, both CPU projects.
The project settings on servers A and B allow you to set a relative weight (default 100).

Your client ensures that, long term, the runtime usage 1) is split 100:100 between the two projects.
In case project A doesn't send out work for a while your computer will run only project B.
Once project A starts sending work again your client will automatically prefer project A for a while.

The local work buffer is filled based on expected runtime, not based on number of tasks.


If you plan to dedicate a fixed number of cores, e.g. 4, to only one (sub)project (ATLAS), the best solution would be to set up an additional client configured to run nothing but ATLAS.


1) Simply put.
In reality it's much more complex.



UGGGH!!!! (Frustration) (Beating head on desk)
I give up! If it's more complex, then I'll just face the reality that BOINC is unable to work in a way that fits what I want to do.
So then it's 4 cores on a single task, and I roll the dice on when ATLAS will be able to run again. Crazy.
I moved LHC to 200 and WCG to 80 to try and push more towards ATLAS.

Best of luck.
That resource setting, by the way, will take about 3 weeks to learn what you want.
I'm still figuring out the best configuration for this project - it requires by far the most messing about with various settings to balance runtimes against available memory and cores. I think I have it now.
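For what it's worth, the long-term split implied by resource shares is simple proportional arithmetic. A rough sketch using the shares mentioned above (LHC 200, WCG 80; purely illustrative):

```shell
# BOINC grants each project CPU time proportional to its resource share.
lhc=200   # resource share set for LHC@home
wcg=80    # resource share set for WCG
total=$((lhc + wcg))
echo "LHC: $((100 * lhc / total))%"   # -> LHC: 71%
echo "WCG: $((100 * wcg / total))%"   # -> WCG: 28%
```

So with those shares, roughly seven of every ten CPU-hours should go to LHC@home once the client has learned the pattern.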
2) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45221)
Posted 18 Aug 2021 by wolfman1360
Post:
Hi, are you running SixTrack (virtually no I/O)? Thanks. Eric

I do have it selected, but there don't seem to be any tasks available currently.
thanks
3) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45201)
Posted 13 Aug 2021 by wolfman1360
Post:
CMS - not tweaked? How can one tweak them and what will result?

You can tweak some parameters using an app_config.xml.
See this page for details:
https://boinc.berkeley.edu/wiki/Client_configuration

Your own app_config.xml must strictly follow the template shown there.
Default values can be found in client_state.xml

Mostly used for LHC@home tweaking:
<max_concurrent>n</max_concurrent>
<project_max_concurrent>N</project_max_concurrent>
<avg_ncpus>x</avg_ncpus> # 1)

The VM's RAM size can be tweaked using
<cmdline>--memory_size_mb 2048</cmdline> # 2)


1) The manual says "...(possibly fractional)...", but that makes no sense here, since this value also tells vboxwrapper how many cores to configure for the VM. The latter accepts only integer values, hence use "x" or "x.0".

2) 2048 is the default for CMS and doesn't need to be specified here.
Setting it higher would be a waste of RAM, since a VM never returns allocated RAM to the OS.
Setting it a bit lower will slow down the VM.
Setting it much lower will stop the scientific app from running at all, since it checks whether enough RAM is available.
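Putting the tags above together, a complete app_config.xml might look like the sketch below. The numbers, the app name and the plan_class string are examples only - check your own client_state.xml for the exact names your client uses:

```xml
<app_config>
    <!-- limit the whole project to 6 tasks at once -->
    <project_max_concurrent>6</project_max_concurrent>
    <app>
        <name>CMS</name>
        <!-- at most 4 CMS tasks at once -->
        <max_concurrent>4</max_concurrent>
    </app>
    <app_version>
        <app_name>CMS</app_name>
        <plan_class>vbox64_mt_mcore_cms</plan_class>
        <!-- integer value, see footnote 1) -->
        <avg_ncpus>2.0</avg_ncpus>
        <!-- VM RAM in MB, see footnote 2) -->
        <cmdline>--memory_size_mb 2048</cmdline>
    </app_version>
</app_config>
```

Save it as app_config.xml in the project directory (e.g. projects/lhcathome.cern.ch_lhcathome/) and use "Read config files" in the BOINC Manager to apply it without restarting the client.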

Thank you. Will give this a look. I think I will need to, as I may need more than 3 preference types.
4) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45195)
Posted 12 Aug 2021 by wolfman1360
Post:
native Theory:
usually 600-800 MB per task.
BUT! occasionally there will be special tasks (madgraph) that allocate >6.5 GB! plus a 2nd core.

CMS
If not tweaked each task will set up a 2 GB VM.
+ some MB for vboxwrapper

CMS - not tweaked? How can one tweak them and what will result?
thanks
5) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45193)
Posted 12 Aug 2021 by wolfman1360
Post:
Thank you for all of that. I'll have to play with settings and maybe an app_config to figure out the best solution. Maybe two 12-core WUs from ATLAS and the rest for CMS and Theory.
How much RAM do native Theory and CMS use?
6) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45190)
Posted 12 Aug 2021 by wolfman1360
Post:
I don't get many stuck VMs anymore.
Great to hear!

ATLAS is the trickiest to run: if you allow unlimited WUs it tries to use 10 GB of memory per WU. You can of course tweak this, but then it's hard to keep the memory usage on track manually. CMS runs smoothly.
10 GB per WU? Is this for a single-core workunit? What if I only select a certain number of them? Or set 1 WU to use 24 cores?

I don't think there is much disk activity in general; peak total transfers are 38%, peak writes are about 40, and reads about 100.

A squid proxy will reduce the load on the CERN servers and your internet usage.

I use 90% of total threads to give some breathing room for OS overhead. Running 42 CMS at once on a 48-thread system, however, means 96 GB of RAM usage and 100% CPU load.

e.g. on one computer right now I have 14 ATLAS, 17 CMS and 5 Theory; this is using 98% CPU and 156 GB of memory.
Thank you for those figures. How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing.
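Since a Squid proxy was suggested above: a minimal /etc/squid/squid.conf sketch might look like this. The network range, cache sizes and paths are assumptions to adapt; see the Squid documentation and the LHC@home forum threads for tuned values:

```
http_port 3128
acl localnet src 192.168.0.0/16               # adjust to your LAN
http_access allow localnet
http_access deny all
cache_mem 256 MB                              # in-RAM cache
maximum_object_size 1024 MB                   # CVMFS/Frontier objects can be large
cache_dir ufs /var/spool/squid 20000 16 256   # ~20 GB on-disk cache
```

Each client is then pointed at the proxy via the HTTP proxy settings in the BOINC Manager, or with boinccmd --set_proxy_settings.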
7) Message boards : Number crunching : Beating the Heat (Message 45188)
Posted 12 Aug 2021 by wolfman1360
Post:
I have twelve machines, spaced to provide heating in the winter.
In the summer, I turn off all except five in the basement, where I do not need to air condition them.

There is not much point in spending money both for running the machines and then for the cost of removing the heat by air conditioning. That is a waste of energy for me.

I feel the same.
The summer is when I generally figure out which machines to retire as well - in this case Nehalem EP, Bloomfield, and some Sandy Bridge. Their electrical costs can almost offset the work these new Ryzens do.
8) Message boards : Number crunching : Beating the Heat (Message 45186)
Posted 12 Aug 2021 by wolfman1360
Post:
I have most of my machines in the basement, and have rented a few dedicated servers; at this point that makes up for the electricity I'd otherwise spend running the same setup myself. I should really throw a few cores at climate prediction given what's going on in the world.

It's been over 30 °C here most days since the beginning of June. We've had maybe 2 inches of rain during that time.

Best of luck.
9) Message boards : ATLAS application : Creation of container failed (Message 45185)
Posted 12 Aug 2021 by wolfman1360
Post:
You are quite welcome. Nothing is easy with native, especially singularity.
It all depends on which version of Linux you have, and what libraries it contains.

I am glad it worked this time.

As am I. I usually install the standard Ubuntu 20.04 along with any updates. I know I had major issues with Debian.
10) Message boards : Number crunching : best practices/how to get most efficiency for higher core machines (Message 45183)
Posted 11 Aug 2021 by wolfman1360
Post:
Hello
I've been out of the loop on this project for a while and decided to give it another go. I have nearly 200 cores to add, so I just want to make sure I'm not going to bungle this with a lot of errors or stuck VMs.
I've got several 24- to 48-thread Xeons with anywhere from 32-64 GB RAM, as well as a few 8-thread i7s with 16-32 GB, all running Linux.
Squid will be used, but my biggest concern is RAM and stuck VMs.
I know Theory takes the least RAM per WU - but for best efficiency, what would folks recommend for managing ATLAS and CMS workunits? Can BOINC be trusted to manage RAM on its own? I'm still trying to work out number of workunits vs. number of CPUs, which I believe only applies to ATLAS.
Most of these, apart from a few, are running cheap SSDs, since I figure a lot of disk activity will be going on, and with many WUs crunching at the same time that might affect how fast they start and stop, especially with CMS. I'm not sure what the process is for each one to start and finish.
Right now the goal is to add machines very slowly, making sure each one can crunch Atlas, Theory, and CMS with one WU sent of each before moving onto the next.
Examples of processor and RAM configurations: E5-2670 v3 with 64 GB, E5-2680 with 32.
If I remember right, CMS and ATLAS are the biggest users of bandwidth and disk?
thanks and any help appreciated!
11) Message boards : ATLAS application : Creation of container failed (Message 45182)
Posted 11 Aug 2021 by wolfman1360
Post:
sudo apt-get install singularity

Even though it says "Singularity works", I haven't had that version of singularity work in years, on Ubuntu 20.04.2 and earlier.

This is what worked for me about six months ago, though it is a fairly safe bet that things have changed by now.
But it might get you started. (First uninstall the version you have, with sudo apt remove singularity, though it probably won't find it anyway).

First: install Dependencies:
sudo apt-get update && sudo apt-get install -y \
build-essential \
libssl-dev \
uuid-dev \
libgpgme11-dev \
squashfs-tools \
libseccomp-dev \
wget \
pkg-config \
git \
cryptsetup

To correct broken packages (examples):
sudo apt install libseccomp2=2.4.3-1ubuntu1
sudo apt install libssl1.1=1.1.1f-1ubuntu2

Second: install GO
sudo snap install go --classic
Result: go 1.16.2 from Michael Hudson-Doyle (mwhudson) installed

go get -u github.com/golang/dep/cmd/dep
(wait for the download!)

Check version: go version

-----------------------------------------------------------------------------------
Install Singularity (check latest version at https://github.com/hpcng/singularity/releases):
See: https://sylabs.io/guides/3.7/user-guide/quick_start.html

go get -d github.com/sylabs/singularity

export VERSION=3.7.2 && \
wget https://github.com/hpcng/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
tar -xzf singularity-${VERSION}.tar.gz && \
cd singularity

./mconfig && \
make -C ./builddir && \
sudo make -C ./builddir install

Check Version: singularity --version

First off - thank you so much. We're currently well past 10 minutes of CPU time, so it appears to be working. This was all very much above my head; I simply copied and pasted most of it, substituting version numbers and directories. Hopefully nothing goes wrong in the future, as I am still very much new to Linux for the most part.
Hopefully this, along with a few other things, can be pinned. I'm finding a lot of things are very scattered - especially as they relate to the native apps - and there are a lot of instructions floating around that are out of date or do not work for this project.
For reference, this is what just worked for me. I still have a few questions, but that's for another topic.
sudo apt remove singularity
sudo apt autoremove (not sure if this is needed, but it didn't break anything, yet)
sudo apt-get update && sudo apt-get install -y \
build-essential \
libssl-dev \
uuid-dev \
libgpgme11-dev \
squashfs-tools \
libseccomp-dev \
wget \
pkg-config \
git \
cryptsetup

To correct broken packages (examples):
sudo apt install libseccomp2=2.4.3-1ubuntu1
sudo apt install libssl1.1=1.1.1f-1ubuntu2

Second: install GO
sudo snap install go --classic
Result: go 1.16.6 from Michael Hudson-Doyle (mwhudson) installed
Check version: go version
go get -d github.com/hpcng/singularity
export VERSION=3.8.0 && \
wget https://github.com/hpcng/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
tar -xzf singularity-${VERSION}.tar.gz && \
cd singularity-3.8.0

./mconfig && \
make -C ./builddir && \
sudo make -C ./builddir install

Good luck and thank you again.
12) Message boards : ATLAS application : Creation of container failed (Message 45180)
Posted 11 Aug 2021 by wolfman1360
Post:
Having the same issue here. It has been a while.
Steps run:
Installed CVMFS.
Created default.local.
sudo apt-get install singularity
sudo apt-get install squashfs-tools
Ubuntu 20.04. Plenty of RAM - set to run 1 ATLAS task, and the machine has 16 GB RAM. Initially I thought the error came from changing the number of CPUs per task.
Task in question:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=323595231

Has anyone found a working fix for this?
13) Message boards : ATLAS application : A few errors, tried everything I can (Message 44457)
Posted 7 Mar 2021 by wolfman1360
Post:
Thank you. I did sudo apt-get install singularity, so we'll see what happens.

I think you get an old version that way, though I have not tried it recently. It might work.

You can get the latest version by following these instructions:
https://sylabs.io/guides/3.0/user-guide/installation.html
Note that you have to install GO first.

The following works for me on Ubuntu 18.04/20.04.
Not all versions of GO and Singularity work for me, but these do.
You can find the latest versions as indicated.

Install GO (check latest version at https://pkg.go.dev/golang.org/dl#pkg-subdirectories):

export VERSION=1.15.2 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc

go get -u github.com/golang/dep/cmd/dep


-----------------------------------------------------------------------------------

Install Singularity (check latest version at https://github.com/hpcng/singularity/releases):

# adjust the version as necessary
export VERSION=3.6.4 && \
    mkdir -p $GOPATH/src/github.com/sylabs && \
    cd $GOPATH/src/github.com/sylabs && \
    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
    tar -xzf singularity-${VERSION}.tar.gz && \
    cd ./singularity

./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install


Check Version: singularity --version

It seems as though most of this is well above my head and knowledge. Or maybe it's simply paid work followed by 'free work and troubleshooting' as it were.
Things still don't appear to be functioning correctly. I do not appear to be out of memory.
Have switched over to Rosetta, for now, until I can troubleshoot this and get it functional. Will start with one machine at a time.
I may reinstall Ubuntu 20.04 from scratch and go through everything again. At this point I'm not sure what else to try. The most frustrating part: to my knowledge I have not done anything different on the machines that are able to complete these tasks versus the ones that are not, so I'm at a loss as to where to begin.
14) Message boards : ATLAS application : A few errors, tried everything I can (Message 44449)
Posted 5 Mar 2021 by wolfman1360
Post:
Thank you. I did sudo apt-get install singularity, so we'll see what happens.

I think you get an old version that way, though I have not tried it recently. It might work.

You can get the latest version by following these instructions:
https://sylabs.io/guides/3.0/user-guide/installation.html
Note that you have to install GO first.

The following works for me on Ubuntu 18.04/20.04.
Not all versions of GO and Singularity work for me, but these do.
You can find the latest versions as indicated.

Install GO (check latest version at https://pkg.go.dev/golang.org/dl#pkg-subdirectories):

export VERSION=1.15.2 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc

go get -u github.com/golang/dep/cmd/dep


-----------------------------------------------------------------------------------

Install Singularity (check latest version at https://github.com/hpcng/singularity/releases):

# adjust the version as necessary
export VERSION=3.6.4 && \
    mkdir -p $GOPATH/src/github.com/sylabs && \
    cd $GOPATH/src/github.com/sylabs && \
    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
    tar -xzf singularity-${VERSION}.tar.gz && \
    cd ./singularity

./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install


Check Version: singularity --version

Thanks so much. I'll give this a try if the above method fails.
Bought two year old but new Ryzen 5 3600's. They'll be a marked improvement over what I have currently, especially in the summer.
15) Message boards : ATLAS application : A few errors, tried everything I can (Message 44437)
Posted 5 Mar 2021 by wolfman1360
Post:
I have the same info from CentOS 8.
After installing singularity from the OS repositories, tasks are running well.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5613

Thank you. I did sudo apt-get install singularity, so we'll see what happens.
thanks
16) Message boards : ATLAS application : A few errors, tried everything I can (Message 44432)
Posted 4 Mar 2021 by wolfman1360
Post:
Hi,
A few errors on multiple machines, Ubuntu 20.04.
[2021-03-04 08:18:44] FATAL:  while extracting /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img: root filesystem extraction failed: extract command failed: exit status 1
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481909
And this one.
[2021-03-04 13:04:23] FATAL:  container creation failed: mount ->/var error: can't remount /var: operation not permitted
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481865
No idea how to go about fixing these.
17) Message boards : Theory Application : Read-only file system error on native Theory (Message 44423)
Posted 1 Mar 2021 by wolfman1360
Post:
Getting tasks like this one.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=303571058
No clue what to make of this.
Debian 10 on an i7-4770.
I have run the following as outlined.
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p
Still getting user namespace errors nonetheless, as well as the above.
Recommendations appreciated.
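As a persistent variant of the sed/sysctl commands above (a sketch; the file name is arbitrary), the setting can live in its own drop-in file so it survives reboots and future edits of /etc/sysctl.conf:

```
# /etc/sysctl.d/90-unprivileged-userns.conf
# Allow unprivileged user namespaces (needed by the native Theory container
# runtime on Debian/Ubuntu kernels).
kernel.unprivileged_userns_clone = 1
```

Apply with sudo sysctl --system and verify with sysctl kernel.unprivileged_userns_clone, which should print 1.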
18) Message boards : Number crunching : Optimal CPU usage? (Message 44419)
Posted 28 Feb 2021 by wolfman1360
Post:
One final question, because this is confusing me a little.
I've got a few Xeon E5s with 32-40 threads, but only 32 GB of RAM.
I'm reading on the forum that folks use app_configs to limit CPUs and concurrent tasks. Limiting concurrent tasks per subproject makes sense - currently you can only limit the number of actual jobs in progress, and I'm assuming that is inclusive of all subprojects selected under preferences - but is there any reason to limit CPUs via this method as well as on the website?
Am I to assume the project is smart enough that, if six 4-core ATLAS tasks ran on an E5-2680, it would populate the rest of the machine with single-threaded tasks, provided enough memory was available?
thanks
19) Message boards : Number crunching : Optimal CPU usage? (Message 44415)
Posted 27 Feb 2021 by wolfman1360
Post:
CMS tasks take about 2500 MB of memory if I remember correctly. But they always use only 1 CPU core, so there's not much you can do there. Their run time (if everything is working properly) is 12-18 hours. When the 12 hours are up, they look for the point where the currently running job finishes and end the task there. If the task has not finished by 18 hours it will be terminated by BOINC.

Thank you, that's a great help.
Will BOINC automatically ensure that RAM doesn't get overused, and wait for memory to free up before resuming tasks, or should I create an app_config file?
20) Message boards : Number crunching : Recommended CVMFS Configuration for Native Apps - Comments and Questions (Message 44414)
Posted 27 Feb 2021 by wolfman1360
Post:
Perhaps that file could be updated with the minimum needed configuration ...

The file on the server is already up to date.
Be aware that it includes 2 optional settings (with proxy/without proxy) and one of them has to be activated by the user.

In general:
Native apps require more settings to be configured by the user.
This is easier, faster and more reliable than guessing certain values.
In addition, some steps have to be done as root.



although I'm still unclear how one can actually optimize their configuration

The simple answer

Cache as much as possible, as close as possible to the point where it is used.
To avoid wasted effort, focus on the major bottlenecks first.


More LHC@home specific

CVMFS is heavily used but has its own cache - one cache instance per machine.
A machine can't share its CVMFS cache with other machines.
Each VM counts as an individual machine.
Outdated or missing data is requested from the project servers.

Frontier is heavily used by ATLAS and CMS. It has no local cache of its own.
Each app sends all Frontier requests to the project servers.


Cloudflare's openhtc.io infrastructure helps to distribute CVMFS and Frontier data.
They run a very fast worldwide network and one of their proxy caches will most likely be located much closer to your clients than any project server.

VBox apps use openhtc.io by default but users running native apps have to set "CVMFS_USE_CDN=yes" in their CVMFS configuration.
This is disabled in the default configuration because lots of computers in various datacenters use special connections and require this to be set "OFF".


A local HTTP proxy closes the gap between openhtc.io and the local clients.
It can cache data for all local CVMFS and Frontier clients as well as offload openhtc.io and the project servers.
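Tying the pieces above together, a minimal /etc/cvmfs/default.local might look like the sketch below. The repository list, cache size and proxy address are assumptions to adapt; the project's downloadable default.local remains the authoritative template:

```
CVMFS_REPOSITORIES="atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch"
CVMFS_QUOTA_LIMIT=4096                        # local cache size in MB
CVMFS_USE_CDN=yes                             # route via Cloudflare's openhtc.io
CVMFS_HTTP_PROXY="http://192.168.1.10:3128"   # your squid; use "DIRECT" without one
```

After editing, sudo cvmfs_config setup followed by cvmfs_config probe should report OK for each repository.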

How does one go about caching as much as possible?
Not sure what happened in my case, then, since as soon as I downloaded
wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local
I got immediate failures after probing.
Running the listed items in the how-to fixed my issues, and I believe I also added the line containing openhtc.io.

Thank you for the help and excellent clarification.

