41) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 36987)
Posted 9 Oct 2018 by gyllic
Post:
Can anyone confirm if ATLAS tasks work with this version?
I cloned the current git repository, checked out tag v3.0.0, compiled singularity and ran a native ATLAS task. It worked without any problems.

The output of "singularity --version" is: "singularity version v3.0.0"
The corresponding task that was crunched with this singularity version is: https://lhcathome.cern.ch/lhcathome/result.php?resultid=207507345

So yes, looks like it is working.
42) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 36880)
Posted 26 Sep 2018 by gyllic
Post:
Hi Guys!

This is Version 2 of a short guide for building every program you need to run native ATLAS tasks on Debian 9 (Stretch) from sources. For Version 1 and all the corresponding discussions look here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4703. To get an “overview” of the native Linux ATLAS app, see here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4617.

Why Version 2 of this guide?

Since singularity (the container software that is needed) has been completely rewritten in Go, the build process works differently compared to the one described in Version 1. Additionally, it is now shown how to build the BOINC manager (optional) from source.
Also, more comments and more detailed instructions, especially for the CVMFS setup, have been added, as well as some additional information.

Why use the native ATLAS app?

There are mainly two reasons why you should consider running the native ATLAS app: higher efficiency and lower hardware requirements compared to the VirtualBox ATLAS app. This is achieved by a couple of properties of the native ATLAS app:

• No start-up and shutdown phase of a VirtualBox instance is needed.
• You don’t lose CPU power to virtualization or emulation of an OS (which you do when using the VirtualBox app).
• The native app needs much less RAM compared to the VirtualBox app.
• You can use a local CVMFS cache that is reused for every new task (with the VirtualBox app this cache is deleted once a task has finished and has to be filled up again for the next one), which saves internet bandwidth and lets the actual calculations start faster.
• You can configure CVMFS to use openhtc.io, which uses Cloudflare’s CDN and therefore benefits from all the advantages of a CDN (see the “Build and install cvmfs” section for how to set it up).

What do you need to run the native ATLAS app?

- Linux (Debian is used for this guide, but others are possible) https://debian.org
- CVMFS (CERN Virtual Machine File System) https://cernvm.cern.ch/portal/filesystem
- singularity (container) https://www.sylabs.io/
- BOINC https://boinc.berkeley.edu/

At the time of writing, Debian is at version 9.5, BOINC at version 7.13.0, CVMFS at version 2.6.0 and singularity at version 3.0.0-beta1.

Unfortunately it is not possible to run ONLY native ATLAS and, at the same time, other projects that use a VirtualBox approach within the same BOINC instance. The setup described here will work, but it is more or less random whether you get native ATLAS tasks or VirtualBox ATLAS tasks. There are three ways to force getting only native ATLAS tasks:

• The most elegant way would be to install two BOINC instances on the same machine: one instance is used to crunch VirtualBox tasks only, while the second BOINC instance crunches native ATLAS only (thanks computezrmle for the info). You will probably find guides on the internet on how to set up two or more BOINC instances on the same machine.
• Uninstall/remove VirtualBox completely from your PC (of course, all VirtualBox-based tasks will no longer work),
• or tell the BOINC client to ignore it (again, all VirtualBox-based tasks will no longer work; a sketch is shown below this list). Look here for more information: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4617&postid=34426.
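For the third option, a minimal sketch of a cc_config.xml (placed in the BOINC data directory, i.e. the boinc user's home directory in this guide) could look like this; the option name is the standard BOINC one, but please double-check against the linked post before relying on it:
<cc_config>
  <options>
    <dont_use_vbox>1</dont_use_vbox>
  </options>
</cc_config>
After saving the file, restart the client or use "Read config files" in the boinc manager so the change takes effect.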

You also have to allow running beta apps in your LHC@home project preferences here: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project.

You can install most of these programs from repositories, but this guide shows you how to compile and set up everything from source. By compiling the programs from source, you get the latest available code (and not the sometimes rather old packages from the repositories).

Build environment:

I have set up a new PC with a Debian 9.5 net installation and have installed Debian Stretch with the GNOME desktop environment. To run native ATLAS you don’t need a desktop environment, so a “server configuration” should be sufficient and this guide should work exactly the same. However, it was only tested on the mentioned configuration. If you run in “server mode”, you only have to compile the boinc client (more details in the “Compile boinc” section). If you don’t use a desktop environment, building the boinc manager will not work (at least not the way described here) and would not make much sense anyway.
Additionally, I have installed sudo and added my user to the sudo group. You need to have root rights on your PC.
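If you need to do the same, it looks roughly like this (run as root; replace yourusername with your actual user name, and log out and back in afterwards so the new group membership takes effect):
su -
apt install sudo
usermod -aG sudo yourusername
exit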

The same process may also work on other Linux distributions (especially Debian derivatives like Ubuntu), but it was only tested on Debian 9 (Stretch).

Let's start by compiling and installing boinc from source. Open a terminal and type the following:

1. Update the package lists and install all required packages:
sudo apt update
To build only the boinc client, install these packages:
sudo apt install git build-essential pkg-config libsqlite3-dev libssl-dev libcurl4-openssl-dev m4 dh-autoreconf zlib1g-dev
If you want to build the boinc client and the boinc manager, install these packages (a desktop environment already has to be installed in order to be able to build the boinc manager):
sudo apt install git build-essential pkg-config libsqlite3-dev libssl-dev libcurl4-openssl-dev m4 dh-autoreconf zlib1g-dev freeglut3-dev libwxgtk3.0-dev libwxgtk-webview3.0-dev libjpeg-dev libnotify-dev libxmu-dev libxi-dev libgtk2.0-dev

2. Add user boinc (no root rights required, and I chose no password):
sudo adduser boinc

3. Change to user boinc, make a new directory and clone the git repository into it:
su
su boinc
cd
mkdir boinc_source
git clone https://github.com/BOINC/boinc.git boinc_source
cd boinc_source
If you want to compile the newest available code, continue with point 4. If you want to compile a specific version, e.g. 7.12.0, do the following:
To get an overview of all tagged code versions, type
git tag
To checkout version 7.12.0 for example, type
git checkout tags/client_release/7.12/7.12.0

4. To compile and install boinc, type
./_autosetup
To compile only the client, use
./configure --disable-server --disable-manager --enable-optimize
To compile client and manager type (building the boinc manager will probably only work if you have a desktop environment installed)
./configure --disable-server --enable-optimize
Finally, compile everything (will take a while) and install it
make
su
make install

5. To generate needed files, start and stop the client:
/usr/local/etc/init.d/boinc-client start
/usr/local/etc/init.d/boinc-client stop
exit
cd

6. To be able to manage this boinc client with the boinc manager from the same machine or from another PC, you have to edit the "gui_rpc_auth.cfg" file, which is located in the boinc (user) home directory. A random password is already written in this file. Write a new password of your choice into the file if you want, then save and close it. I have chosen no password. On a multi-user computer, this file should be protected against access by other users. Additionally, the file "remote_hosts.cfg" has to be created in the boinc (user) home directory. Write into this file all IPs from which you want to access this boinc client, e.g.:
127.0.0.1
192.168.0.2
192.168.0.3
The IP 127.0.0.1 is needed if you want to control the client with the compiled boinc manager while the manager and the client are running on the same machine.
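For reference, editing/creating these two files boils down to something like this (run as a user that is allowed to write in the boinc home directory, e.g. root or boinc, depending on who owns the files after the first client start):
cd /home/boinc
nano gui_rpc_auth.cfg
nano remote_hosts.cfg
chmod 600 gui_rpc_auth.cfg
The chmod line is just one way of protecting the password file on a multi-user computer; a manager running under a different local user will then be asked for the password.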

7. Start the boinc client again (as root). Now you should be able to connect to your client with the boinc manager, which can be started by typing boincmgr into the terminal (if your boinc manager can’t connect to the desired client, try rebooting the PC on which that client is installed). If the boinc client does not start automatically after a PC restart, use the above commands to start it, or set up an autostart as sketched below.
Next, add LHC@home and adjust the settings as needed (you may have to increase the allowed memory, disk space, ...). Boinc should now be good to go.
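One possible way to autostart the client on a systemd-based Debian is a small unit file; this is only a rough sketch, assuming the default /usr/local install prefix and the /home/boinc data directory used in this guide (it runs the client as root, mirroring how it is started above). Create /etc/systemd/system/boinc-client.service with:
[Unit]
Description=BOINC client (built from source)
After=network.target

[Service]
WorkingDirectory=/home/boinc
ExecStart=/usr/local/bin/boinc
Restart=on-failure

[Install]
WantedBy=multi-user.target
Then enable it with:
sudo systemctl daemon-reload
sudo systemctl enable boinc-client
Alternatively you can simply keep starting the client manually with the init.d script shown above.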

Build and install cvmfs:

1. Change to a different user (e.g. your own; here it is called testing):
su testing

2. Install all required packages:
sudo apt install cmake uuid-dev libfuse-dev python-dev libcap-dev attr autofs
These packages have to be installed. All other libraries etc. that are necessary to build CVMFS will be built automatically from the externals directory within the CVMFS sources. To get an overview of all of these needed packages, you can look here: https://github.com/cvmfs/cvmfs/tree/devel/externals. Another way would probably be to install these libraries etc. from Debian’s repositories (most of them should be available), but here we use the sources provided within the CVMFS source code.

3. Make a directory, clone the git repository and create the needed directory structure (again, if you don’t want to use the latest available code but rather build a specific version, use the git tag and git checkout commands as shown in the boinc section):
cd
mkdir cvmfs_source
git clone https://github.com/cvmfs/cvmfs.git cvmfs_source
cd cvmfs_source
mkdir -p build
cd build

4. Build (this will take a while) and install cvmfs:
cmake ../
make
sudo make install

5. Make a directory that will be used as CVMFS cache. Here in this example the cache location will be /scratch/cvmfs (but you can use other locations as well):
sudo mkdir -p /scratch/cvmfs

6. Now we have to configure CVMFS. To do so, create the file default.local in /etc/cvmfs/
sudo nano /etc/cvmfs/default.local
and add the following to it:
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
CVMFS_CACHE_BASE=/scratch/cvmfs
CVMFS_QUOTA_LIMIT=4096
CVMFS_HTTP_PROXY=DIRECT
The CVMFS_REPOSITORIES variable defines all repositories CVMFS will use. The ones shown above are needed in order to run the native ATLAS app.
The CVMFS_CACHE_BASE variable defines where CVMFS will place/search your local cache directory. If you have chosen a different location than the one shown in step 5, you need to write your location here instead.
The CVMFS_QUOTA_LIMIT variable defines the size of the local cache directory in megabytes (in this example 4 GB will be used). It is important to know that this is a soft quota. Using a cache size of 4 GB will result in CVMFS hit rates of about 99%.
If you have to (or want to) use a proxy server for CVMFS, you should change the CVMFS_HTTP_PROXY variable (e.g. CVMFS_HTTP_PROXY="http://192.168.1.15:3128" if your proxy server is at 192.168.1.15 and listens on port 3128). If you don’t want to use a proxy, use the DIRECT setting as shown in the example above.

7. Now set up CVMFS for the first time:
sudo cvmfs_config setup

8. Test if CVMFS is working (you should get all "Ok").
cvmfs_config probe
If it fails, try "sudo service autofs restart" and try to probe again.
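Later on, once some ATLAS tasks have run, you can also check the cache hit rate for the atlas repository (the exact output format may differ between CVMFS versions, but it includes a hit rate figure; with the quota suggested above it should end up around 99%):
cvmfs_config stat -v atlas.cern.ch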

9. If you want to use openhtc.io for your CVMFS setup, follow this excellent guide from computezrmle https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4758.

Build and install singularity:

This section is completely different compared to Version 1 of this guide since singularity has been completely rewritten in Go. At the time of writing, singularity 3 is at version 3.0.0 Beta 1. Since it is still in beta, it is possible that the build process will change in the near future (although that is rather unlikely). To get the latest procedure, look here: https://github.com/sylabs/singularity/blob/master/INSTALL.md. The following is basically the same as what is written in the linked file.

1. First, install all required packages:
sudo apt-get install build-essential libssl-dev uuid-dev libgpgme11-dev

2. This guide builds version 3.x. If you (for whatever reason) want to build singularity 2.x, use the same git tag and git checkout commands as shown above and go to Version 1 of this guide, where that process is described.
To build version 3.x, do the following (since golang can be a little finicky to set up, it is probably a good idea to follow this guide exactly):

3. Install golang
Go to the golang homepage https://golang.org/dl/, choose the version you want and download it. Here we use version 1.11 for Linux 64-bit.
cd
wget https://dl.google.com/go/go1.11.linux-amd64.tar.gz
Extract the archive to /usr/local
sudo tar -C /usr/local -xzf go1.11.linux-amd64.tar.gz
Set up the environment for go (this will adapt your .bashrc file by adding the go bin paths, and the GOPATH variable will point to a directory called “go” located within your home directory). The following commands will only work if you extracted the go tar file into /usr/local:
echo 'export GOPATH=${HOME}/go' >> ~/.bashrc
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc
source ~/.bashrc
Now clone the singularity repository
mkdir -p $GOPATH/src/github.com/sylabs
cd $GOPATH/src/github.com/sylabs
git clone https://github.com/sylabs/singularity.git
cd singularity
and finally install the golang dependencies:
go get -u -v github.com/golang/dep/cmd/dep

4. Finally compile and install singularity (needed dependencies will be downloaded automatically):
cd $GOPATH/src/github.com/sylabs/singularity
./mconfig
cd ./builddir
make
sudo make install

5. Test if it works (you should get an output):
singularity --version
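As an additional check (assuming your CVMFS setup from the section above is working), you can run the same kind of command the ATLAS run script uses, which should simply print your hostname:
singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname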

If everything worked well, you should now have everything you need to run native ATLAS tasks on Debian 9.

Again, feel free to report mistakes, suggest improvements, ask questions, make any other suggestions, etc. It would be much appreciated!

Gyllic
43) Message boards : ATLAS application : Atlas runs very slowly after 94% (Message 36783)
Posted 20 Sep 2018 by gyllic
Post:
If you are referring to this task https://lhcathome.cern.ch/lhcathome/result.php?resultid=206669059, you can see that the error message is "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED". So maybe you have very little free storage left on your hard drive, or maybe your configuration is wrong.

In the logs you can also see that the VM state changes very often, which indicates that something is wrong, e.g.:
2018-09-14 06:12:15 (5480): VM state change detected. (old = 'running', new = 'paused')
2018-09-14 06:18:23 (5480): VM state change detected. (old = 'paused', new = 'running')

A good idea would be to reduce the maximum number of cores per task from 8 to 1, try to get it running with 1 core, and increase the allowed disk space for BOINC. Once you have crunched a couple of tasks successfully with one core, you can increase the number of cores per task.
Also very helpful is Yeti's checklist here (see point 6) https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161
44) Message boards : Theory Application : New version 263.80 (Message 36782)
Posted 20 Sep 2018 by gyllic
Post:
total runtime: 14:51 hrs; total processor time: 6:01 hrs.
Maybe a dead Sherpa job?
45) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) (Message 36703)
Posted 13 Sep 2018 by gyllic
Post:
Since singularity 3 is a complete rewrite it could be that the command line options have changed. In addition it's still under active development so probably not guaranteed that the latest master will work.

So for now I would recommend that people only use version 2.x. Once version 3 is more stable we can look into whether any changes are necessary on our side.
Today I tested the latest github master branch version of singularity (with some of the current pull requests added manually) and now the native ATLAS tasks are running without any problem (tested on the dev-project site) with singularity 3.0.0 (alpha). See here: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2394056
So probably there won't be any changes needed in the ATLAS setup (at least to the current dev-project version) in order to run the native app with singularity 3.

Once singularity 3.0 is in beta, I will post an updated version of this guide to reflect the changes since the original post.
46) Message boards : Sixtrack Application : Tasks available / tasks not available (Message 36645)
Posted 6 Sep 2018 by gyllic
Post:
Sixtrack tasks come in "waves". At the moment there are none available, but sooner or later there will be another batch of SixTrack tasks.
CMS does have some problems right now, so no tasks are available. But according to this post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4688&postid=36457 there should be CMS tasks available again soon.

There are still Theory and LHCb tasks you can crunch if the ATLAS tasks don't work for you.

Check the server status page where you can see which tasks are available: https://lhcathome.cern.ch/lhcathome/server_status.php
47) Message boards : ATLAS application : Download failures (Message 36537)
Posted 23 Aug 2018 by gyllic
Post:
Same here at about the same time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=205447811
https://lhcathome.cern.ch/lhcathome/result.php?resultid=205453633
48) Message boards : ATLAS application : Linux Disk image for Boinc+Virtualbox that successfully runs Atlas tasks (Message 36496)
Posted 19 Aug 2018 by gyllic
Post:
If you could post the boinc log messages here, it would be much easier to figure out the problem. It sounds like boinc is not recognizing that VirtualBox is installed.

Here are two useful posts for getting the correct setup or for troubleshooting:

Yeti's checklist for VBox setups (make sure every single point is fulfilled):
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359

If you have a Linux OS installed on your boinc PCs, you may consider running the native ATLAS app, which needs far fewer resources (RAM) and is more efficient. This post gives you a good overview:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4617&postid=34410#34410.
49) Message boards : News : CMS production pause (Message 36467)
Posted 16 Aug 2018 by gyllic
Post:
OK, I'm pleased to announce that we have overcome some of the problems we were having (and some more in the meantime due to CMS deciding that all its production should run in singularity containers -- in our case inside the usual virtual machines).
Great news that some of the problems have been solved.

Although ATLAS and CMS are using different approaches, are you thinking of releasing a "native" app for Linux hosts similar to ATLAS, since you are using singularity now?
50) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) (Message 36456)
Posted 16 Aug 2018 by gyllic
Post:
If you confirm this works we can ask gyllic to update his instructions.
Is there a way to edit a post that is a few months old? Otherwise I will make a new post called "... Version 2".

I have built the singularity binaries from the current github master branch (which is the development branch) and tried to run a native ATLAS task with it. Unfortunately it did not work; here is the link: https://lhcathome.cern.ch/lhcathome/result.php?resultid=205038477. It says that singularity is not working.

First, the output of
singularity --version
is
singularity version 3.0.0-master.c033e898


To test if singularity is working I tried to execute the same command as the ATLAS_run_atlas_2.54 script, i.e.:
singularity exec -B /cvmfs %s hostname"%sin_image
which translates to:
singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname

The output is:
FATAL   [U=1000,P=20383]   SContainer()                  exec /.singularity.d/actions/exec failed: no such file or directory

To test if my singularity setup is working, I have built a new image with the command:
singularity build hello.img shub://vsoch/singularity-hello-world:latest

and tested it with:
singularity exec hello.img cat /etc/lsb-release

which returns:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

while I am on a Debian machine. This indicates, at least in my opinion, that the singularity setup is working.

But I am still not sure why the ATLAS tasks are not working. Is there a problem with my singularity setup, is the singularity image used for ATLAS only compatible with singularity version < 3.x, is it a code problem within singularity, or something completely different?
Maybe a more advanced user with more knowledge can give an answer?
51) Message boards : ATLAS application : Request for new Default RAM Setting (Message 36080)
Posted 27 Jul 2018 by gyllic
Post:
...The legend for the lower graph on that page is 9 rows X 5 columns = 45 "nodes" each of which might have any number of CPUs behind it. BOINC is just 1 of 45. ... it is a relatively small percentage of the whole and BOINC is just a portion of that percentage.
The computational needs of CERN/LHC are gigantic, i.e. every single core helps, even if its portion is very small. Since CERN did not have enough funds for an enormous computing centre, it came up with the idea of the Worldwide LHC Computing Grid (WLCG). It is the most sophisticated data-taking and analysis system ever built for science, providing near real-time access to LHC data, and it is also the largest computing grid on earth with over 800k cores and 170 computing centres. So yes, the portion of BOINC is rather small.

Taken from the homepage:
"Data pours out of the LHC detectors at a blistering rate. Even after filtering out 99% of it, in 2018 we're expecting to gather around 50 petabytes of data. That's 50 million gigabytes, the equivalent to nearly 15 million high-definition (HD) movies. The scale and complexity of data from the LHC is unprecedented. This data needs to be stored, easily retrieved and analysed by physicists all over the world. This requires massive storage facilities, global networking, immense computing power, and, of course, funding. CERN does not have the computing or financial resources to crunch all of the data on site, so in 2002 it turned to grid computing to share the burden with computer centres around the world. The result, the Worldwide LHC Computing Grid (WLCG), is a distributed computing infrastructure arranged in tiers – giving a community of over 10,000 physicists near real-time access to LHC data. The WLCG builds on the ideas of grid technology initially proposed in 1999 by Ian Foster and Carl Kesselman (link is external). CERN currently provides around 20% of the global computing resources."

Considering all these aspects (and plenty more which are not mentioned here), you can see that the entire setup is extremely complex. Since this BOINC project is part of the WLCG, you can imagine that setting it up correctly is a challenging task (by the way: as far as I know, the tasks running on this BOINC project are simulation tasks, not data analysis tasks etc.).

So comparing this BOINC project with other projects (in terms of "99% of them manage to do this and that", etc.) may not be a good idea and also won't help fix the problems.

... open the config file in a text editor and tweak the RAM calculation up a little?...
I don't know how long it takes to adjust the RAM setting, but yes, they should solve the RAM problems.
52) Message boards : Theory Application : New Version 263.70 (Message 35964)
Posted 20 Jul 2018 by gyllic
Post:
I had the same problem with the previous version (before multi-core). I don't know the exact numbers, but maybe 1 out of 6 or so did not set up the proxy correctly.

Do you guys know why my Theory VMs do NOT shut down correctly? This happens with every single task, e.g. this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=200422861
The log says:
2018-07-20 12:51:44 (4364): Guest Log: [INFO] Job finished in slot1 with 0.
2018-07-20 13:02:07 (4364): Guest Log: [INFO] Condor exited with return value N/A.
2018-07-20 13:02:07 (4364): Guest Log: [INFO] Shutting Down.
2018-07-20 13:02:07 (4364): VM Completion File Detected.
2018-07-20 13:02:07 (4364): VM Completion Message: Condor exited with return value N/A..
2018-07-20 13:02:07 (4364): Powering off VM.
2018-07-20 13:07:11 (4364): VM did not power off when requested.
2018-07-20 13:07:11 (4364): VM was successfully terminated.
2018-07-20 13:07:11 (4364): Deregistering VM. (boinc_95d70c61e78dce9e, slot#0)
2018-07-20 13:07:11 (4364): Removing network bandwidth throttle group from VM.
2018-07-20 13:07:11 (4364): Removing VM from VirtualBox.
13:07:17 (4364): called boinc_finish(0)
53) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35961)
Posted 20 Jul 2018 by gyllic
Post:
If the already mentioned ideas don't help, the error may be caused by a "wrong server configuration". If you want to run 1-core or 2-core ATLAS vbox tasks, the server sends you a RAM setting that is too low for those tasks.
To change that, you have to use an app_config.xml. See this post: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=35921
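Only as a rough sketch (the app_name, plan_class and memory value below are assumptions on my side; take the exact values from the linked post or from your client_state.xml), the file goes into the LHC@home project directory inside your BOINC data directory (typically something like projects/lhcathome.cern.ch_lhcathome) and could look like this:
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2</avg_ncpus>
    <cmdline>--memory_size_mb 4800</cmdline>
  </app_version>
</app_config>
Afterwards select "Read config files" in the boinc manager or restart the client.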
54) Message boards : ATLAS application : This is not SLC6, need to run with Singularity (Message 35946)
Posted 17 Jul 2018 by gyllic
Post:
Since every event takes a different time to calculate, one core will finish its 50 events faster than another core. So all cores have to wait until the last one has finished its 50 events. This "waiting time" is bad for efficiency.
I think computezrmle described it in more detail in another post, but I can't find it right now.
Maybe I have to correct myself a little bit (I'm sure a more advanced user can give better info): I think that the 200 events are not actually split up into exactly 50 events per core, i.e. if one core (or rather the events calculated on that core) is faster than the others, it might calculate more than 50 events and another core correspondingly fewer than 50. But the principle, and the fact that all cores have to wait until the last core has finished its last event, should still be true.

I recall that post and it was indeed quite detailed. Though it was informative and certainly added to my understanding of the situation, the above quote ties it all together for me. I've had similar concerns for the multi-core Theory tasks where it appears that if the algorithm for determining if there is enough time remaining for another job says "not enough time" and there is a looping Sherpa job or maybe a Pythia that is taking an abnormally long time to complete, then you have cores sitting idle until the task bumps up against the 18 hour limit.

Meh, I'm not convinced these multi-core apps are a good idea.
Yes, if you have enough RAM and want the best efficiency, you should run only 1-core tasks (true for all LHC apps as far as I know). But for PCs with little RAM, multi-core apps can help to get more CPU cores crunching for LHC@home (the efficiency is not as good, but the absolute amount of work done is still higher).
55) Message boards : ATLAS application : This is not SLC6, need to run with Singularity (Message 35938)
Posted 16 Jul 2018 by gyllic
Post:
@bronco:
Have you changed the number of cores in the LHC@home settings on the homepage? I think they introduced a limit on the maximum number of concurrent tasks per host, which depends on the "Max # CPUs" setting, in order to get a more accurate number of actually used cores for LHC@home. I.e. if you have set "Max # CPUs = 1", the server will only allow you to have one task in progress. So if you already had one task (or more) with "in progress" status, you won't be able to download any more.

@AuxRx:
As said, if you define the efficiency as efficiency = (CPU time)/(task runtime * number of cores), the lower the number of cores per ATLAS task, the higher the efficiency for that task will be. Here is an example (both tasks were crunched on the exact same machine and are from the same task ID):

- 1-core task: CPU time = 39,091.66s, Run Time = 39,206.37s, number of cores = 1
==> efficiency = 99.7%

- 4-core task: CPU time = 47,125.95s, Run Time = 12,424.23s, number of cores = 4
==> efficiency = 94.8%
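Just to show where these percentages come from, they are plain arithmetic; the 4-core value can be reproduced with, e.g.:
awk 'BEGIN { printf "%.1f%%\n", 100*47125.95/(12424.23*4) }'
which prints 94.8%.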

The reason behind this is that, as you said, the actual computation starts faster for the 1-core task compared to the 4-core task, and it also lies in the design of the multi-core app: as far as I know, the current tasks calculate 200 events. If you use a 4-core task, these 200 events are split up into 50 events per core. Since every event takes a different time to calculate, one core will finish its 50 events faster than another core. So all cores have to wait until the last one has finished its 50 events. This "waiting time" is bad for efficiency.
I think computezrmle described it in more detail in another post, but I can't find it right now.
56) Message boards : Number crunching : Checklist Version 3 for Atlas@Home (and other VM-based Projects) on your PC (Message 35920)
Posted 15 Jul 2018 by gyllic
Post:
A question for admins are the workunits I am returning and getting marked valid really valid then as per previous reply.
Obviously I am not an admin, but yes, they are invalid in terms of scientific results. I.e. if you keep your current setup running as it is, it is a total waste of resources. A good indication of whether a task produced scientific results is the existence of the HITS.xxx file. This is one of your tasks with no HITS file https://lhcathome.cern.ch/lhcathome/result.php?resultid=200156318 and here is a good task from another user https://lhcathome.cern.ch/lhcathome/result.php?resultid=200147295.

In this post https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178#29560, which was written by one of the admins, it says:

Therefore a truly successful WU must have a valid HITS file produced, however you can still get credit even if no HITS file is present because we don't want people to suffer from problems in ATLAS software or infrastructure.
You should, as computezrmle said, adjust your RAM settings.
57) Message boards : ATLAS application : ATLAS native - Configure CVMFS to work with openhtc.io (Message 35915)
Posted 15 Jul 2018 by gyllic
Post:
So it appears that my HITRATE is over 99% if I am reading this correctly.
Yes, you are reading this correctly. This hitrate shows that caching some data for ATLAS is a good idea and that using a local cache will improve efficiency. This is also one of the big benefits of the native app compared to the vbox app because the local cache of the vbox app gets deleted as soon as the VM has shut down and been removed.

EDIT: looks like computezrmle was faster so ignore this post :-)
58) Message boards : ATLAS application : This is not SLC6, need to run with Singularity (Message 35909)
Posted 15 Jul 2018 by gyllic
Post:
... "This is not SLC6, need to run with Singularity...." in the stderr output files. So I assume if I were using SLC6 (Scientific Linux 6) singularity would not be needed.
SLC6 stands for Scientific Linux CERN 6. You can download it here: https://linux.web.cern.ch/linux/scientific6/ (End of support is before December 2020, so you might consider using CC7 (CERN CentOS 7), but I don't know if you can run the native ATLAS app without Singularity on CC7.)
Yes, if you use SLC6 or CentOS 6 you can run the ATLAS native app without Singularity; see this task for example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=200098871
The output says:
OS:Scientific Linux release 6.9 (Carbon)

This is SLC or CentOS release 6, run the atlas job without Singularity


Would SLC6 increase efficiency and get more ATLAS work done? I don't really need to use Ubuntu, SLC6 would likely suit my needs.
I don't think there would be a big difference in efficiency (comparing singularity runs with non-singularity runs), but yes, theoretically I think it should be more efficient when you don't use singularity (though I don't have actual numbers on that).

With the following setup the effect on efficiency is probably much bigger than running without singularity, and it is much less work to set up (although this is not the exact answer to your question :-) ):
One of your computers is configured to use 4 cores for ATLAS native. Generally speaking, the fewer CPU cores per ATLAS task, the higher the efficiency of that task (if you define it as (CPU time)/(task runtime * number of cores)), i.e. it is more efficient to run four 1-core tasks concurrently than one 4-core task. So to increase efficiency, the first thing I would do is reduce the number of cores per ATLAS task from 4 to 1 IF you have enough RAM for that (running two 2-core tasks would also be more efficient than your current setup and is probably the better choice considering your 8 GB RAM limitation).
59) Message boards : Theory Application : New Version 263.70 (Message 35883)
Posted 13 Jul 2018 by gyllic
Post:
Same here, openhtc.io and local squid are used now.
Thanks Laurence!

Some questions and a suggestion:
- Is it correct that if the VM is configured to use 4 cores, it will run 4 separate Condor jobs concurrently?
- What is displayed on the ALT+F2 screen? Is it a randomly chosen output from one of the currently running Condor jobs?
- Would it be possible to associate one ALT+Fx screen with one particular core (or slot directory) to display the output of the Condor job currently running on that core (i.e. screen ALT+F2 displays the output of the Condor job running on core 1, ALT+F3 displays the output of the Condor job running on core 2, and so on)?
60) Message boards : ATLAS application : ATLAS native - Configure CVMFS to work with openhtc.io (Message 35881)
Posted 12 Jul 2018 by gyllic
Post:
Nice, thanks for the info and tips, computezrmle!

Looks like it is working here.

