1) Message boards : ATLAS application : A few errors, tried everything I can (Message 44457)
Posted 7 Mar 2021 by wolfman1360
Post:
Thank you. I did sudo apt-get install singularity, so we'll see what happens.

I think you get an old version that way, though I have not tried it recently. It might work.

You can get the latest version by following these instructions:
https://sylabs.io/guides/3.0/user-guide/installation.html
Note that you have to install Go first.

The following works for me on Ubuntu 18.04/20.04.
Not all versions of Go and Singularity work for me, but these do.
You can find the latest versions as indicated.

Install Go (check latest version at https://pkg.go.dev/golang.org/dl#pkg-subdirectories):

export VERSION=1.15.2 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc

go get -u github.com/golang/dep/cmd/dep


-----------------------------------------------------------------------------------

Install Singularity (check latest version at https://github.com/hpcng/singularity/releases):

export VERSION=3.6.4 && # adjust this as necessary \
    mkdir -p $GOPATH/src/github.com/sylabs && \
    cd $GOPATH/src/github.com/sylabs && \
    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
    tar -xzf singularity-${VERSION}.tar.gz && \
    cd ./singularity

./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install


Check Version: singularity --version

It seems as though most of this is well above my head and knowledge. Or maybe it's simply paid work followed by 'free work and troubleshooting' as it were.
Things still don't appear to be functioning correctly. I do not appear to be out of memory.
I have switched over to Rosetta for now, until I can troubleshoot this and get it working. I will start with one machine at a time.
I may reinstall Ubuntu 20.04 from scratch and go through everything again. At this point I'm not sure what else to try. The most frustrating part: to my knowledge, I have not done anything different on the machines that complete these tasks versus the ones that don't, so I'm at a loss as to where to begin.
2) Message boards : ATLAS application : A few errors, tried everything I can (Message 44449)
Posted 5 Mar 2021 by wolfman1360
Post:
Thank you. I did sudo apt-get install singularity, so we'll see what happens.

I think you get an old version that way, though I have not tried it recently. It might work.

You can get the latest version by following these instructions:
https://sylabs.io/guides/3.0/user-guide/installation.html
Note that you have to install Go first.

The following works for me on Ubuntu 18.04/20.04.
Not all versions of Go and Singularity work for me, but these do.
You can find the latest versions as indicated.

Install Go (check latest version at https://pkg.go.dev/golang.org/dl#pkg-subdirectories):

export VERSION=1.15.2 OS=linux ARCH=amd64 && \
    wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
    sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
    rm go$VERSION.$OS-$ARCH.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
    echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
    source ~/.bashrc

go get -u github.com/golang/dep/cmd/dep


-----------------------------------------------------------------------------------

Install Singularity (check latest version at https://github.com/hpcng/singularity/releases):

export VERSION=3.6.4 && # adjust this as necessary \
    mkdir -p $GOPATH/src/github.com/sylabs && \
    cd $GOPATH/src/github.com/sylabs && \
    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
    tar -xzf singularity-${VERSION}.tar.gz && \
    cd ./singularity

./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install


Check Version: singularity --version

Thanks so much. I'll give this a try if the above method fails.
I bought two Ryzen 5 3600s; they're two-year-old models, but new. They'll be a marked improvement over what I have currently, especially in the summer.
3) Message boards : ATLAS application : A few errors, tried everything I can (Message 44437)
Posted 5 Mar 2021 by wolfman1360
Post:
I had the same issue on CentOS 8.
After installing Singularity from the OS repositories, tasks are running well.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5613

Thank you. I did sudo apt-get install singularity, so we'll see what happens.
thanks
4) Message boards : ATLAS application : A few errors, tried everything I can (Message 44432)
Posted 4 Mar 2021 by wolfman1360
Post:
Hi,
A few errors on multiple machines running Ubuntu 20.04:
[2021-03-04 08:18:44] FATAL:  while extracting /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img: root filesystem extraction failed: extract command failed: exit status 1
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481909
And this one.
[2021-03-04 13:04:23] FATAL:  container creation failed: mount ->/var error: can't remount /var: operation not permitted
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481865
No idea how to go about fixing these.
5) Message boards : Theory Application : Read-only file system error on native Theory (Message 44423)
Posted 1 Mar 2021 by wolfman1360
Post:
Getting tasks like this one.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=303571058
No clue what to make of this.
Debian 10 on an i7-4770.
I have run the following, as outlined:
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p
I'm still getting user namespace errors nonetheless, as well as the error above.
Recommendations appreciated.
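One way to see whether user namespaces are actually usable, rather than only configured, is to try creating one. The sketch below is a hypothetical helper, not project tooling: it prints the sysctl values involved and then asks util-linux unshare to create a throwaway user namespace. It should be run as the account BOINC uses (often "boinc" on Debian; that name is an assumption), since that is the user whose namespace creation matters.

```shell
#!/usr/bin/env bash
# Sketch: report whether unprivileged user namespaces work for the current user.
check_userns() {
  local clone=/proc/sys/kernel/unprivileged_userns_clone
  # This knob typically exists only on Debian/Ubuntu kernels; skip it if absent
  [ -r "$clone" ] && echo "kernel.unprivileged_userns_clone = $(cat "$clone")"
  echo "user.max_user_namespaces = $(cat /proc/sys/user/max_user_namespaces 2>/dev/null || echo unknown)"
  # Actually try to create a user namespace and run 'true' inside it
  if unshare --user --map-root-user true 2>/dev/null; then
    echo "user namespace test: OK"
  else
    echo "user namespace test: FAILED"
  fi
}

check_userns
```

Saved as, for example, check_userns.sh (a hypothetical filename), it could be run with sudo -u boinc bash ./check_userns.sh. A FAILED result for that user points at the kernel settings rather than at the task itself.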
6) Message boards : Number crunching : Optimal CPU usage? (Message 44419)
Posted 28 Feb 2021 by wolfman1360
Post:
One final question because this is confusing me a little.
I've got a few Xeon E5s with 32-40 threads, but only 32 GB of RAM.
I'm reading on the forum that folks use app_config files to limit CPUs and concurrent tasks. Limiting concurrent tasks per subproject makes sense, since currently you can only limit the number of jobs in progress, and I'm assuming this includes all subprojects selected under preferences. But is there any reason to limit CPUs via this method as well as on the website?
Can I assume the project is smart enough that, if six 4-core Atlas tasks were running on an E5-2680, it would fill the rest of the machine with single-threaded tasks, provided enough memory was available?
thanks
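For reference, a per-project app_config.xml can express both limits asked about here: how many CPUs each ATLAS task is scheduled with (avg_ncpus) and how many ATLAS tasks run at once (max_concurrent). The sketch below writes such a file. The app name "ATLAS", the plan class "native_mt" and the default numbers (taken from the six 4-core example above) are assumptions to check against the entries in client_state.xml, not values confirmed by the project.

```shell
#!/usr/bin/env bash
# Sketch: write an app_config.xml limiting ATLAS task width and concurrency.
# ASSUMPTIONS: app name "ATLAS" and plan class "native_mt"; verify both in
# client_state.xml before using this.
threads=${ATLAS_THREADS:-4}   # CPUs each ATLAS task is scheduled with
max_atlas=${ATLAS_MAX:-6}     # at most this many ATLAS tasks at once
out=${1:-app_config.xml}      # written locally; copy it into the project dir

cat > "$out" <<EOF
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>${max_atlas}</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>${threads}</avg_ncpus>
  </app_version>
</app_config>
EOF
echo "wrote $out"
```

The file would then be copied into the project directory (assumed to be /var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome on Debian/Ubuntu) and picked up with boinccmd --read_cc_config, or Options > Read config files in the Manager.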
7) Message boards : Number crunching : Optimal CPU usage? (Message 44415)
Posted 27 Feb 2021 by wolfman1360
Post:
CMS tasks take about 2500 MB of memory, if I remember correctly, but they always use only 1 CPU core, so there's not much you can do there. Their run time (if everything is working properly) is 12 to 18 hours. Once 12 hours have elapsed, a task waits for the currently running job to finish and ends there. If the task has not finished by 18 hours, it will be terminated by Boinc.

Thank you, that's a great help.
Will Boinc automatically ensure that RAM doesn't get overused, and wait for memory to free up before resuming tasks, or should I create an app_config file?
8) Message boards : Number crunching : Recommended CVMFS Configuration for Native Apps - Comments and Questions (Message 44414)
Posted 27 Feb 2021 by wolfman1360
Post:
Perhaps that file could be updated with the minimum needed configuration ...

The file on the server is already up to date.
Be aware that it includes 2 optional settings (with proxy/without proxy) and one of them has to be activated by the user.

In general:
Native apps require more settings to be done by the user.
This is easier, faster and more reliable than guessing certain values.
In addition, some steps have to be done as root.



although I'm still unclear how one can actually optimize their configuration

The simple answer

Cache as much as possible, as close as possible to the point where it is used.
To avoid wasted effort, focus on the major bottlenecks first.


More LHC@home specific

CVMFS is heavily used but has its own cache: one cache instance per machine.
A machine can't share its CVMFS cache with other machines.
Each VM counts as individual machine.
Outdated or missing data is requested from the project servers.

Frontier is heavily used by ATLAS and CMS. It has no local cache of its own.
Each app sends all Frontier requests to the project servers.


Cloudflare's openhtc.io infrastructure helps to distribute CVMFS and Frontier data.
They run a very fast worldwide network and one of their proxy caches will most likely be located much closer to your clients than any project server.

VBox apps use openhtc.io by default but users running native apps have to set "CVMFS_USE_CDN=yes" in their CVMFS configuration.
This is disabled in the default configuration because lots of computers in various datacenters use special connections and require this to be set "OFF".


A local HTTP proxy closes the gap between openhtc.io and the local clients.
It can cache data for all local CVMFS and Frontier clients as well as offload openhtc.io and the project servers.
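Put together, the native-app settings discussed above fit in a few lines of /etc/cvmfs/default.local. The sketch below prints such a file. CVMFS_USE_CDN and CVMFS_HTTP_PROXY come from this thread; CVMFS_QUOTA_LIMIT (the local cache size in MB) is added as the knob for caching more, and the 4000 used here is only an example value.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal CVMFS client config for native LHC@home apps.
# Arg 1: proxy string (default "auto;DIRECT", i.e. no local proxy).
# Arg 2: local cache limit in MB (example value; raise it if disk allows).
make_default_local() {
  local proxy="${1:-auto;DIRECT}"
  local quota="${2:-4000}"
  cat <<EOF
CVMFS_USE_CDN=yes
CVMFS_HTTP_PROXY="${proxy}"
CVMFS_QUOTA_LIMIT=${quota}
EOF
}
```

Assuming root access, it could be installed with make_default_local | sudo tee /etc/cvmfs/default.local >/dev/null, followed by sudo cvmfs_config reload and cvmfs_config probe. With a local proxy, pass something like 'http://squid:3128;DIRECT' as the first argument.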

How does one go about caching as much as possible?
I'm not sure what happened in my case, then, since as soon as I downloaded the file with
sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local
I got immediate failures when probing.
Running the steps listed in the how-to fixed my issues, and I believe I also added the line containing openhtc.io.

Thank you for the help and excellent clarification.
9) Message boards : Number crunching : Recommended CVMFS Configuration for Native Apps - Comments and Questions (Message 44404)
Posted 27 Feb 2021 by wolfman1360
Post:
Very nice; thanks.
But I think it should be pointed out that the automatic configuration download no longer applies, insofar as I can see.
(sudo wget https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local)

Maybe it could be updated?

I had this problem, as well. Probing immediately failed.
Perhaps that file could be updated with the minimum needed configuration, although I'm still unclear how one can actually optimize their configuration if it is just 1 or 2 machines on the same connection.
10) Message boards : Number crunching : Optimal CPU usage? (Message 44403)
Posted 27 Feb 2021 by wolfman1360
Post:
Hello,
Back crunching here a little after quite a long while, starting things off slow.
I currently have a dual Opteron 6128 and an i7-3610QM.
They seem to be receiving a lot of Atlas tasks, which are multithreaded. I do not recall if there are others that are multithreaded as well; perhaps CMS?
In people's experience, what is more efficient: fewer cores and longer runtimes, or more cores and shorter runtimes, per task? I see these Atlas tasks use about 2.5 GB of RAM per WU. Is it just a matter of not oversaturating the RAM, as it were?
Right now I am running 4 threads per task, as per project settings and max number of CPUs.

Welcome back.

Atlas is currently the only multithreaded application here. The most efficient use of CPUs would be running with a single thread, because multithreaded tasks have a section at the beginning and at the end that uses only a single thread anyway, so the other CPU threads reserved for that task sit idle during that time. Also, the threads do their job calculations independently of one another, so they will not all finish at the same time.

But CPU threads are not the only thing to worry about; there is also memory consumption. The amount of memory an Atlas task uses is calculated like this: 3000 MB + n * 900 MB, where n is the number of threads. So a single-CPU Atlas task uses 3900 MB and a two-CPU task uses 4800 MB. If you have the cores available but are limited by memory, more threads per task are the way to go.
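The rule of thumb above turns into simple arithmetic. The sketch below applies it to an assumed 32 GB (32768 MB) machine, ignoring the memory the OS itself needs, and shows how many tasks of each width fit and how many cores they keep busy.

```shell
#!/usr/bin/env bash
# Sketch: ATLAS memory per task using the formula quoted above:
#   memory_MB = 3000 + 900 * n_threads
atlas_mem_mb() {
  local threads=$1
  echo $(( 3000 + 900 * threads ))
}

# How many tasks of a given width fit into ram_mb (integer division)?
tasks_that_fit() {
  local ram_mb=$1 threads=$2
  echo $(( ram_mb / $(atlas_mem_mb "$threads") ))
}

ram_mb=32768   # assumed 32 GB machine, OS overhead ignored
for t in 1 2 4 8; do
  printf '%d thread(s): %5d MB/task, %2d tasks fit, %2d cores busy\n' \
    "$t" "$(atlas_mem_mb "$t")" "$(tasks_that_fit "$ram_mb" "$t")" \
    "$(( t * $(tasks_that_fit "$ram_mb" "$t") ))"
done
```

With 32768 MB this gives 8 single-thread tasks keeping 8 cores busy, but 4 four-thread tasks keeping 16 cores busy, which is the memory-bound case where more threads per task pay off.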

If you are keen on the credits you get then single CPU tasks earn more because credit is calculated from the run time and not the CPU time.

Good luck with your experiments.

Hi,
Yes, this Opteron only has 32 GB of memory, so running single-threaded tasks on all 16 cores wouldn't go over so well.
I take it I should maybe try and slow down on the CMS tasks as well? I recall them taking a decent chunk of memory, too.
thanks
11) Message boards : Number crunching : Optimal CPU usage? (Message 44400)
Posted 26 Feb 2021 by wolfman1360
Post:
Hello,
Back crunching here a little after quite a long while, starting things off slow.
I currently have a dual Opteron 6128 and an i7-3610QM.
They seem to be receiving a lot of Atlas tasks, which are multithreaded. I do not recall if there are others that are multithreaded as well; perhaps CMS?
In people's experience, what is more efficient: fewer cores and longer runtimes, or more cores and shorter runtimes, per task? I see these Atlas tasks use about 2.5 GB of RAM per WU. Is it just a matter of not oversaturating the RAM, as it were?
Right now I am running 4 threads per task, as per project settings and max number of CPUs.
12) Message boards : ATLAS application : extraction failed: could not extract squashfs data, unsquashfs not found (Message 44399)
Posted 26 Feb 2021 by wolfman1360
Post:
The fallback proxies are configured by a script that is located on CERN's online repository cvmfs-config.cern.ch.
This is done because your default.local defines a squid that can't be accessed or does not even exist.
It looks like you simply copied the default.local file from the forum thread and did not read the comments inside.

There are 2 possible solutions:
1. To set up a local proxy and replace "squid" with its hostname
2. To remove the "#" in front of CVMFS_HTTP_PROXY="auto;DIRECT"

(1.) would be the preferred method for clusters/single computers providing more than 5 worker nodes
(2.) is the simple solution
The limit of 5 should be seen as a magnitude rather than a sharp limit.


Don't forget an "[sudo] cvmfs_config reload" after you saved the changes.

<edit>
Sorry, checked the wrong logfile.
Your recent ones show that you are already running a proxy called "squid".
That's fine. Leave it this way.
[2021-02-25 18:47:47] 2.8.0.0 2129 1201 59336 79744 0 78 1907592 4096000 1399 130560 0 386235 98.3608 875279 454 http://s1fnal-cvmfs.openhtc.io/cvmfs/atlas.cern.ch http://squid:3128 0

</edit>

These two machines are at different locations, so each is a standalone machine, and I don't believe they require a local proxy.
The Opterons should really be retired soon. They are currently my space heater, but at 2.0 GHz and over 130 W each, they are a little intense on the electric bill when there are Ryzen 3s that can outdo them these days.
13) Message boards : ATLAS application : extraction failed: could not extract squashfs data, unsquashfs not found (Message 44393)
Posted 26 Feb 2021 by wolfman1360
Post:
Any help appreciated.

Good to see that you got it running.

Nonetheless you may check your CVMFS setup.
It connects via a fallback proxy at Fermilab (131.225.188.245).
To find out why, please be so kind as to post the output of the following command:
cvmfs_config showconfig atlas.cern.ch |grep -E 'FALLBACK_PROXY|HTTP_PROXY|USE_CDN'

This is the output.
CVMFS_EXTERNAL_FALLBACK_PROXY=
CVMFS_EXTERNAL_HTTP_PROXY=
CVMFS_FALLBACK_PROXY='http://cvmfsbproxy.cern.ch:3126;http://cvmfsbproxy.fnal.gov:3126'    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/cern.ch.conf
CVMFS_HTTP_PROXY='http://squid:3128;DIRECT'    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/cern.ch.conf
CVMFS_USE_CDN=yes    # from /etc/cvmfs/default.local
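Since this output names a proxy called "squid", a quick check is whether that name actually resolves on the machine. The sketch below is a hypothetical helper: it extracts CVMFS_HTTP_PROXY from cvmfs_config showconfig, splits it on ';' (fallback groups) and '|' (load-balanced proxies), skips the DIRECT and auto keywords, and looks each remaining host up with getent.

```shell
#!/usr/bin/env bash
# Sketch: list proxy hosts in a CVMFS_HTTP_PROXY value.
# ';' separates fallback groups, '|' load-balanced proxies;
# DIRECT and auto are keywords, not hosts.
proxy_hosts() {
  local entry
  for entry in $(printf '%s\n' "$1" | tr ';|' '  '); do
    case $entry in
      DIRECT|auto) continue ;;
    esac
    entry=${entry#http://}        # strip the scheme
    printf '%s\n' "${entry%%:*}"  # strip the :port
  done
}

# Check that every configured proxy host resolves on this machine.
check_proxy_hosts() {
  local value host
  value=$(cvmfs_config showconfig atlas.cern.ch |
    sed -n "s/^CVMFS_HTTP_PROXY='\([^']*\)'.*/\1/p")
  for host in $(proxy_hosts "$value"); do
    if getent hosts "$host" >/dev/null; then
      echo "ok:         $host"
    else
      echo "unresolved: $host"
    fi
  done
}
```

Running check_proxy_hosts on a machine with the configuration above should report either "ok: squid" or "unresolved: squid". An unresolved name is the situation where CVMFS ends up relying on the fallback proxies.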


For reference, I followed this guide.
https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html
Followed by
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971
However, while this method has always been tried and true in the past, I got immediate failures when running 'cvmfs_config probe'.
So then it was on over to this thread https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594
I copied the default.local outlined in this thread and now things seem to be going fairly okay, I think.
For what it's worth, I feel like at this point I'm one of those monkeys trying to write Shakespeare, pounding keys and hoping for a miracle. I quite literally have no idea what is making things work behind the scenes or what my part in it is (CVMFS and native Theory specifically).
Thank you to everyone who has taken the time to pull all these resources together. Now I just need to figure out how to optimize my setup, perhaps.
14) Message boards : ATLAS application : extraction failed: could not extract squashfs data, unsquashfs not found (Message 44386)
Posted 25 Feb 2021 by wolfman1360
Post:
Hello
Back after a long while and figured I'd start off small with just an old dual Opteron 6128.
Boinc 7.16.6, Ubuntu 20.04, VirtualBox 6.1.16, and, to my knowledge, CVMFS working (at least the probe worked just fine).
I am a complete newbie at this, so any help is appreciated.
The following task errored out. The machine is set to use 4 CPU cores per task, has plenty of ram (32 GB).
https://lhcathome.cern.ch/lhcathome/result.php?resultid=302706770
If any more info is needed please let me know.
Any help appreciated.

Edit: Well, I used apt-get to install squashfs-tools and now we're crunching just fine. I don't recall this happening before. Does it no longer get installed alongside?
Making a checklist for this project so this is good to know.
For reference,
sudo apt-get install squashfs-tools
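For the checklist mentioned above, a small helper can report which command-line tools are present before attaching a machine. This is a sketch; the tool names used in the example (unsquashfs from squashfs-tools, singularity, cvmfs_config) are drawn from this thread, not from an official prerequisite list.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given commands are on PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Example: check_tools unsquashfs singularity cvmfs_config || echo "install the missing packages first". A missing unsquashfs corresponds to the squashfs-tools package installed above.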
15) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41097)
Posted 28 Dec 2019 by wolfman1360
Post:
I'm not understanding your last point, but then I'm not a solid Linux user and a lot of things are still over my head - I'm assuming what you mean is the project was calling too much of the CPU into action and it couldn't keep up.
Is there a big difference in credit depending on which tasks you run? For instance, does Atlas pay more than SixTrack? How are your Ryzens configured? My machines have 16-32 GB of RAM, with my Ryzen 7 having 64 GB total, so it should be able to handle seven 2-core Atlas tasks just fine, with my other i7s running 1- or 2-core Atlas tasks depending on whether they have 16 or 32 GB of RAM, at 75% CPU usage.

It's a quiet Christmas Eve here, so I'm currently throwing together an old Xeon W3520 I found in the parts drawer. I think I have an i7 from the same time period here somewhere... they will be two good space heaters in my office in the cold Canadian winter. I just need to find a cooler for the latter and see how my electric bill fares. No cases on either of these; not enough room, ironically. They will of course both run Linux. My main desktop and laptop will continue running Windows, at least for now, while I slowly but surely navigate the Linux command line and learn how to break things, then fix them. Old as these are, "every little bit helps" is my motto.


Here is a short history of load averages in Linux. It has two things going for it: First, it is very well written, has a simple to understand summary on the first page, and provides concrete examples with code snippets later on. Second, it was the very first article that came up when I did a Google search for "Unix load average explained".

http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html


That was quite informative. Very interesting reading there.
thanks for that.
16) Message boards : ATLAS application : error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED (Message 41096)
Posted 27 Dec 2019 by wolfman1360
Post:
Thank you for catching that. Apparently trying to read things at 2 in the morning is hard.
I have installed Python 2.7. I have not set up a second Boinc instance; however, I have also run
sudo chmod -R 777 /var/lib/boinc-client
Hopefully everything is good from now on. I did not see that Python was a prerequisite for Atlas native either, but of course I may have missed that.
It will be fantastic when this is no longer needed; hopefully that comes out of dev soon.
17) Message boards : ATLAS application : error on Atlas native: 195 (0x000000C3) EXIT_CHILD_FAILED (Message 41089)
Posted 27 Dec 2019 by wolfman1360
Post:
Hello,
I am getting this error on Atlas native tasks like https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862538 or https://lhcathome.cern.ch/lhcathome/result.php?resultid=256862442

I'm pretty sure I have everything installed correctly as tasks like https://lhcathome.cern.ch/lhcathome/result.php?resultid=256735752 finish successfully.

I'm at a loss as to what to do here. All machines are running Boinc 7.9.3.
Suspending Atlas on these machines for the time being.
A quick search of the forum suggested CVMFS was not installed (it is).
Any help appreciated here.
18) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41064)
Posted 24 Dec 2019 by wolfman1360
Post:
...I do have other machines that are in datacenters and internet isn't an issue. One day that 300/25 will come my way and I can be happy. And maybe a few more Ryzens too.
I'm aiming to use around 75% of each machine, if at all possible. Should I be aiming for more or less? Basically, 2 threads unused on an 8-thread machine and similar on 16, though I might bump that up to 3 unused.


I started LHC@home with one Ryzen 3700X (my desktop) running at 25-75% load for 10 days straight and then added a second (my server) running at 75%-100% load. 15 days after that (25 days total) I had one million points for this project. So yes, running at less than 100% is very do-able especially if you have lots of threads, RAM, and need your systems for other tasks.

EDIT: I forgot to add that when running the server at 75%, the "load average" (how many threads were asking my kernel for a CPU at once) was a little less than 13 on average. When I turned LHC up to 100%, the load average would quickly spike into the low 20s and stay there. That's not good on a system that can only handle 16 threads.
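The comparison described above (load average versus the number of CPU threads) can be checked with a short sketch. It reads the 1-minute load average from the first field of /proc/loadavg and divides it by the thread count reported by nproc; the optional file and CPU-count arguments are only there for illustration and testing.

```shell
#!/usr/bin/env bash
# Sketch: 1-minute load average as a percentage of available CPU threads.
# Arg 1: loadavg file (default /proc/loadavg). Arg 2: thread count (default nproc).
load_ratio() {
  local file="${1:-/proc/loadavg}"
  local cpus="${2:-$(nproc)}"
  local load
  read -r load _ < "$file"   # first field is the 1-minute average
  awk -v l="$load" -v c="$cpus" \
    'BEGIN { printf "1-min load %.2f on %d CPU threads (%.0f%%)\n", l, c, 100 * l / c }'
}
```

For the 16-thread server above, a load of 13.00 prints "1-min load 13.00 on 16 CPU threads (81%)", while a load in the low 20s would come out well above 100%, meaning more threads are waiting for a CPU than the machine can run.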

I'm not understanding your last point, but then I'm not a solid Linux user and a lot of things are still over my head - I'm assuming what you mean is the project was calling too much of the CPU into action and it couldn't keep up.
Is there a big difference in credit depending on which tasks you run? For instance, does Atlas pay more than SixTrack? How are your Ryzens configured? My machines have 16-32 GB of RAM, with my Ryzen 7 having 64 GB total, so it should be able to handle seven 2-core Atlas tasks just fine, with my other i7s running 1- or 2-core Atlas tasks depending on whether they have 16 or 32 GB of RAM, at 75% CPU usage.

It's a quiet Christmas Eve here, so I'm currently throwing together an old Xeon W3520 I found in the parts drawer. I think I have an i7 from the same time period here somewhere... they will be two good space heaters in my office in the cold Canadian winter. I just need to find a cooler for the latter and see how my electric bill fares. No cases on either of these; not enough room, ironically. They will of course both run Linux. My main desktop and laptop will continue running Windows, at least for now, while I slowly but surely navigate the Linux command line and learn how to break things, then fix them. Old as these are, "every little bit helps" is my motto.
19) Message boards : Number crunching : Only getting Theory app tasks (Message 41057)
Posted 24 Dec 2019 by wolfman1360
Post:
I think it also depends on the Boinc scheduler. For instance, on one of my machines, when I attached, I was told that none of my selected projects (CMS, Theory, Atlas) had work available. The website, however, still showed tasks available, and not long after that they were crunching away. So I think it also comes down to patience.

Regardless, I still have a lot of work to do with my app config. I never had to specify RAM before, so this ought to be fun.
20) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41045)
Posted 24 Dec 2019 by wolfman1360
Post:
I see you are on Ubuntu 18.04.3. I have not had much luck getting native ATLAS to work on recent installs (native Theory is OK).
Let me know what you come up with.

I'd answer that, but my 4 Linux machines haven't received any Atlas tasks. 3 out of 4 received Theory, the last one wasn't able to get anything but Sixtrack, and nothing was able to pull CMS. One of my Windows machines did get 2 Atlas tasks (ironically, I have 2 CPU cores set in preferences with unlimited jobs. I guess I have to make an app config for what I want specifically on each machine).
I still need to do some fiddling. I think Boinc is failing to recognize that Intel VT-x is enabled on one machine, but I also know that's in the checklist, so I'll get to that at some point.
For some reason I figured this would be a lot more involved, but once everything is set up correctly - and really all the setup involved were the few commands in Linux - everything went smoothly.
My next question: in regards to native, are there any tweaks folks recommend making to the default.local file, or is it fine as is? I just grabbed the one referenced over at https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971
thanks




©2021 CERN