21) Message boards : ATLAS application : LHC shuts down, but simulation continues! (Message 37502)
Posted 3 Dec 2018 by gyllic
Post:
Which means that many crunchers whose machines have little RAM (and no possibility to upgrade) will NOT be able to crunch LHC projects (for example, in my own case only 2 of my 5 PCs have more than 4GB RAM; I cannot crunch ATLAS on the other three because they have only 4GB and 3GB RAM).
Machines with 4GB RAM can crunch native ATLAS tasks (but they need at least 4GB RAM, otherwise you won't get any tasks).
22) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 37496)
Posted 3 Dec 2018 by gyllic
Post:
Looks like you are missing the python setuptools package.

To fix your problem, type:
sudo apt install python-setuptools
and then execute the
cmake ../
command again. After that, continue with the procedure shown in the guide.
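If you want to check beforehand whether setuptools can be imported, a minimal Python sketch like the following may help. It assumes the build picks up the system Python; on Debian 9, python-setuptools is the package for Python 2, so run the check with the same interpreter (e.g. `python2`). It is written without f-strings so that it also runs under Python 2.

```python
# Minimal check that setuptools can be imported by the interpreter running this script.
try:
    import setuptools  # noqa: F401  (imported only to test availability)
    HAS_SETUPTOOLS = True
except ImportError:
    HAS_SETUPTOOLS = False

if __name__ == "__main__":
    if HAS_SETUPTOOLS:
        print("setuptools found, re-run 'cmake ../'")
    else:
        print("setuptools missing, e.g. sudo apt install python-setuptools")
```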
23) Message boards : Number crunching : Memory requirements for LHC applications (Message 37412)
Posted 23 Nov 2018 by gyllic
Post:
Thanks David!


Are you talking about Boinc tasks or VB jobs or what?
About VB jobs (the ones that run inside the vbox/BOINC tasks) and native ATLAS tasks (which are the same as the jobs that run inside the vbox/BOINC tasks). So an entire vbox/BOINC task needs considerably more RAM than shown in David's plots, because the guest OS and everything else that has to be virtualized/emulated needs memory as well.
24) Message boards : Number crunching : Memory requirements for LHC applications (Message 37400)
Posted 22 Nov 2018 by gyllic
Post:
Thanks for the info, David!

Just out of interest, do you get the values you used for the plot from the memory_monitor_out files (or something like that)? If not, how do you get those values?
How big are the differences in used/needed RAM between task IDs (probably small, because the vbox app uses a fixed value for all task IDs)?
25) Message boards : Number crunching : Memory requirements for LHC applications (Message 37372)
Posted 18 Nov 2018 by gyllic
Post:
I get different numbers than bronco, but that's probably because bronco did not take into account the data that was swapped out in the 4x1-core case (see below).

The used machine is a dedicated, headless machine that is only used for native ATLAS tasks.

The test procedure was very similar to bronco’s: reboot the machine, check the “used memory” value from the “top” command, start a new native ATLAS task, check the “used memory” output after ~2 hours of runtime (wall clock time), wait until the task has finished, reboot the machine again, and so on. So the only value taken into account here is the “used memory” output of the “top” command. This was done for one 1-core task, two concurrent 2-core tasks, one 3-core task and one 4-core task. All tasks were from the same task ID. The memory requirements probably won’t change much with different task IDs, though?

Here are the numbers (I have rounded some of the values to get nicer numbers, which is why the total is not exactly equal to free + used + buff/cache):

After restart (~constant for all reboots):
KiB Mem :  6106000 total,  5870900 free,   110828 used,   124012 buff/cache
KiB Swap:  6280000 total,  6280000 free,             0 used.  5799740 avail Mem

With one 1-core task:
KiB Mem :  6106000 total,   714116 free,  2346892 used,  3044732 buff/cache
KiB Swap:  6280000 total,  6280000 free,        0 used.  3480912 avail Mem
==> ( 2346892 - 110828)/1024 ~ 2200MB RAM used by one 1-core native ATLAS task.

Bronco calculated 1250MB RAM for a 1-core task based on the 6100MB RAM that was used. But if we include the swapped-out data, the “RAM used” per task comes to ~ “(6100MB+3000MB)/4 ~ 2300MB”, which is much closer to my values.

With two concurrent 2-core tasks:
KiB Mem :  6106000 total,   140884 free,  5037536 used,   927320 buff/cache
KiB Swap:  6280000 total,  6270268 free,     9732 used.   824744 avail Mem
==> ((5037536 - 110828)/2)/1024 ~ 2400MB RAM used by one 2-core native ATLAS task.

Relatively good agreement with bronco’s data.

With one 3-core task:
KiB Mem :  6106000 total,   329500 free,  2814716 used,  2961520 buff/cache
KiB Swap:  6280000 total,  6280000 free,        0 used.  3014728 avail Mem
==> ( 2814716 - 110828)/1024 ~ 2600MB RAM used by one 3-core native ATLAS task

With one 4-core task:
KiB Mem :  6106000 total,   337988 free,  3010464 used,  2757288 buff/cache
KiB Swap:  6280000 total,  6280000 free,        0 used.  2820824 avail Mem
==> ( 3010464 - 110828)/1024 ~ 2800MB RAM used by one 4-core native ATLAS task
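The per-task arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it uses the "used" values from the top output above, subtracts the idle baseline after reboot and divides by the number of concurrent tasks. Like the calculations above, it divides by 1024, so "MB" here really means MiB, and it ignores swap (which was 0 or ~10MB in these runs).

```python
# "used" values in KiB, copied from the top output above
BASELINE_USED_KIB = 110828  # idle machine right after a reboot

# (label, number of concurrent tasks, "used" KiB while the task(s) were running)
MEASUREMENTS = [
    ("1-core", 1, 2346892),
    ("2-core", 2, 5037536),
    ("3-core", 1, 2814716),
    ("4-core", 1, 3010464),
]


def per_task_mib(used_kib: int, n_tasks: int, baseline_kib: int = BASELINE_USED_KIB) -> float:
    """RAM attributed to one task: (used - baseline) / n_tasks, converted from KiB to MiB."""
    return (used_kib - baseline_kib) / n_tasks / 1024


for label, n_tasks, used in MEASUREMENTS:
    print(f"{label}: ~{per_task_mib(used, n_tasks):.0f} MB per task")
```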


Formula suggestion for native ATLAS tasks, with some additional safety margin and taking into account that the longer the tasks run, the more memory they need (at least the “used memory” value rises with run time):

2100MB + 300MB*nCPUs
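Written as a small Python function (the function name is hypothetical), the suggestion looks like this. The sketch also prints the margin against the rounded per-task values measured above; with those numbers the margin grows from 200MB for a 1-core task to 500MB for a 4-core task.

```python
def native_atlas_ram_mb(n_cpus: int) -> int:
    """Suggested RAM for one native ATLAS task: 2100MB + 300MB per CPU (safety margin included)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return 2100 + 300 * n_cpus


# Rounded per-task values measured above (MB), used only to show the margin.
MEASURED_MB = {1: 2200, 2: 2400, 3: 2600, 4: 2800}

for n_cpus, measured in MEASURED_MB.items():
    suggested = native_atlas_ram_mb(n_cpus)
    print(f"{n_cpus}-core: suggested {suggested} MB, measured ~{measured} MB, "
          f"margin {suggested - measured} MB")
```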
26) Message boards : Theory Application : [ERROR] No jobs were available to run. (Message 37367)
Posted 17 Nov 2018 by gyllic
Post:
Something is running very wrong over there, and obviously they don't have the experts to get that fixed.

Erich,
please show more respect for the CERN IT and project teams!
+1

Critical and objective feedback is always good and welcome, but before raging and complaining, everyone here who is part of LHC@home should keep in mind that this is part of a huge research project, so there will always be things that don't work perfectly or that break. The entire infrastructure is not trivial, and the admins here most probably have plenty of other work to do as well.
27) Message boards : Number crunching : Memory requirements for LHC applications (Message 37351)
Posted 15 Nov 2018 by gyllic
Post:
@bronco: thanks for your work! Your formula probably goes in the right direction. Maybe I'll find some time in the next couple of days to do some tests as well (from 1-core to 4-core tasks) to get more data. It will take a couple of days, though...

@BITLab Argo: Using singularity probably won't have big effects on the used/required memory (just a guess).

Maybe the file "memory_monitor_output.txt" in the PandaJob directory inside BOINC's slot directory gives helpful information to more advanced Linux users.
28) Message boards : Number crunching : Memory requirements for LHC applications (Message 37327)
Posted 12 Nov 2018 by gyllic
Post:
@gyllic

300MB + (1024MB*nCores) <=> 1324 for the first thread, 1024 for each additional

Should it be that low? If nobody objects then let's make it 300MB + (1024MB*nCores)
I have not tested this formula, so I can't give you an answer. Unfortunately I don't have the time to look into this more deeply or test it at the moment. But if you want and have the time for it, you could test the formula yourself, since you are running native ATLAS tasks ;-). Try varying the number of cores per task and compare the value from your formula with the amount of RAM actually needed.
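As an illustration of such a comparison, the sketch below evaluates 300MB + 1024MB*nCores against the rounded per-task values from the measurements in Message 37372 (based on top's "used memory", so only approximate). With those numbers the formula underestimates a 1-core task by ~900MB and overestimates a 4-core task by ~1.6GB, i.e. its per-core slope is much steeper than what was measured. The function name is hypothetical.

```python
def proposed_mb(n_cores: int) -> int:
    # formula from the quoted post: 300MB + 1024MB * nCores
    return 300 + 1024 * n_cores


# rounded per-task RAM (MB) measured for native ATLAS tasks in Message 37372
measured_mb = {1: 2200, 2: 2400, 3: 2600, 4: 2800}

for n, actual in measured_mb.items():
    diff = proposed_mb(n) - actual
    print(f"{n}-core: formula {proposed_mb(n)} MB, measured ~{actual} MB, difference {diff:+d} MB")
```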
29) Message boards : Number crunching : Memory requirements for LHC applications (Message 37319)
Posted 12 Nov 2018 by gyllic
Post:
ATLAS works differently from the other vbox apps like Theory, LHCb or CMS.
ATLAS tasks don't use HTCondor or anything similar, so the jobs are distributed by the BOINC server.
Here you can see which task IDs are currently being crunched by LHC@home ATLAS tasks: https://lhcathome.cern.ch/lhcathome/img/progresschart.png. For more details, go to https://bigpanda.cern.ch/ (you may get an invalid/insecure SSL certificate warning, which can easily be resolved).
No distinction is made between low-spec and high-spec machines when it comes to which tasks are sent to them.
30) Message boards : Number crunching : Memory requirements for LHC applications (Message 37306)
Posted 11 Nov 2018 by gyllic
Post:
I think we're done. Can someone with the authority please take this last copy and post it as a pinned message?
I still think that the mentioned formula for native ATLAS tasks is wrong!

It's not necessarily ATLAS that will be swapped out.
This is the point that should be avoided (and it's different on every system).
True.

Since I still have no access to the 6GB RAM machine, the following values are from a 4-core machine with 8GB RAM which, according to the mentioned formula, should not be able to run 4-core native ATLAS tasks either.
For a 4-core native ATLAS task, the top command shows:

top - 15:51:50 up  1:02,  3 users,  load average: 4,87, 5,74, 5,63
Tasks: 238 total,   5 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1,4 us,  0,5 sy, 98,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem :  8055880 total,  1722144 free,  4131688 used,  2202048 buff/cache
KiB Swap:  8263676 total,  8257064 free,     6612 used.  3396372 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12793 boinc     39  19 2661932 1,846g  96724 R  99,2 24,0  38:24.48 athena.py
12795 boinc     39  19 2662484 1,841g  97640 R  98,3 24,0  38:27.32 athena.py
12794 boinc     39  19 2662208 1,842g  96476 R  98,1 24,0  38:31.26 athena.py
12792 boinc     39  19 2661656 1,832g  91272 R  95,7 23,8  38:24.15 athena.py
 1695 user      20   0 3187056  92364  59832 S   2,1  1,1   3:25.62 kwin_x11
 1709 user      20   0 4591228 205860  95756 S   1,8  2,6   1:56.66 plasmashell
  632 root      20   0  386968  96460  63920 S   1,5  1,2   4:44.54 Xorg
 1697 root      20   0  441096   8108   6252 S   1,0  0,1   0:00.91 udisksd
 2200 user      20   0 2312260  81116  67628 S   0,4  1,0   0:22.24 boincmgr
10829 root      20   0 1004540 313696  11596 S   0,4  3,9   2:04.87 savscand
 2203 user      20   0  581716  62712  52692 S   0,3  0,8   0:11.70 konsole
18145 user      20   0   45080   3864   3068 R   0,2  0,0   0:00.06 top
  753 boinc     30  10  285464  15788  12264 S   0,1  0,2   0:13.10 boinc
  861 sophosav  20   0 1266080  15360  12892 S   0,1  0,2   0:02.71 mrouter
 1182 sophosav  20   0  810228  19516  16032 S   0,1  0,2   0:04.30 magent
 1711 user      20   0  503692  31044  27552 S   0,1  0,4   0:03.27 xembedsniproxy
 3278 boinc     30  10   13468   2960   2496 S   0,1  0,0   0:03.99 wrapper_26015_x
So basically no swap is used, free memory is about 1.7GB, available memory is over 3GB and cached memory is over 2GB. The high load average probably comes from the fact that I was also using the PC for other things while the simulation was running.
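The figures above can be read off the two summary lines of top with a small Python sketch (function names are hypothetical). It converts KiB to GiB, so the results come out slightly lower than decimal GB (e.g. 1722144 KiB is ~1.64 GiB or ~1.76 GB). Note that top prints "avail Mem" at the end of the Swap line, although it describes RAM.

```python
import re

# Summary lines copied from the top output above (values in KiB).
MEM_LINE = "KiB Mem :  8055880 total,  1722144 free,  4131688 used,  2202048 buff/cache"
SWAP_LINE = "KiB Swap:  8263676 total,  8257064 free,     6612 used.  3396372 avail Mem"


def parse_top_summary(line: str) -> dict:
    """Map each label (total, free, used, buff/cache, avail Mem) to its value in KiB."""
    pairs = re.findall(r"(\d+)\s+([A-Za-z/]+(?: Mem)?)", line)
    return {label: int(value) for value, label in pairs}


def kib_to_gib(kib: int) -> float:
    return kib / 1024 / 1024


mem = parse_top_summary(MEM_LINE)
swap = parse_top_summary(SWAP_LINE)
print(f"free:       {kib_to_gib(mem['free']):.2f} GiB")
print(f"buff/cache: {kib_to_gib(mem['buff/cache']):.2f} GiB")
print(f"avail Mem:  {kib_to_gib(swap['avail Mem']):.2f} GiB")
print(f"swap used:  {swap['used'] / 1024:.1f} MiB")
```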
31) Message boards : Number crunching : Memory requirements for LHC applications (Message 37277)
Posted 8 Nov 2018 by gyllic
Post:
What could be helpful is that you also monitor the amount of free RAM, cache size and used swap.
I currently have no access to this particular machine, so I can't monitor these values for you. But since the CPU usage shown in the stderr_txt file is above 385% for almost every 4-core task (e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=208242279), probably no swapping takes place.
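The reasoning behind this can be sketched as a simple heuristic in Python: a task that swaps heavily spends time waiting on disk I/O, so its CPU usage drops well below 100% per core. This is an indication, not proof, and the 0.95 threshold is an assumption, not a value used by the project.

```python
def cpu_efficiency(cpu_usage_percent: float, n_cores: int) -> float:
    """Fraction of the reserved cores actually used, e.g. 385% on 4 cores -> 0.9625."""
    return cpu_usage_percent / (100.0 * n_cores)


def probably_not_swapping(cpu_usage_percent: float, n_cores: int, threshold: float = 0.95) -> bool:
    """Heuristic only: near-full CPU usage suggests the task is not stalled by swapping.

    The default threshold is an assumed value.
    """
    return cpu_efficiency(cpu_usage_percent, n_cores) >= threshold


print(cpu_efficiency(385, 4))          # 0.9625
print(probably_not_swapping(385, 4))   # True
```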
32) Message boards : Number crunching : Memory requirements for LHC applications (Message 37266)
Posted 7 Nov 2018 by gyllic
Post:
It works because it uses virtual memory.
For a 4-core native ATLAS task, the "top" command shows 4 athena.py processes which need ~30% RAM each (for this particular machine), so a total of 120% RAM, which obviously can't be correct.
One of the advantages of the multicore app is that RAM is shared between the athena.py processes. This might explain the discrepancy between the amount of RAM actually needed and the amount shown by the "top" command.
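The effect can be checked with the top output from the 4-core, 8GB machine in Message 37306: the RES values of the four athena.py processes add up to almost all of the RAM, while the system-wide "used" value is only about half of that. This is consistent with RES counting shared pages in every process that maps them; for per-process accounting that splits shared pages proportionally, Linux offers the Pss field in /proc/<pid>/smaps. The sketch below only does the arithmetic on the posted numbers.

```python
# Values from the 4-core, 8GB top output in Message 37306 (top printed "1,846g" etc. with a decimal comma).
RES_GIB = [1.846, 1.841, 1.842, 1.832]   # RES of the four athena.py processes
TOTAL_KIB = 8055880                      # "KiB Mem ... total"
USED_KIB = 4131688                       # "KiB Mem ... used" for the whole system


def kib_to_gib(kib: int) -> float:
    return kib / 1024 / 1024


res_sum = sum(RES_GIB)
print(f"sum of RES:    {res_sum:.2f} GiB ({res_sum / kib_to_gib(TOTAL_KIB):.0%} of RAM)")
print(f"system 'used': {kib_to_gib(USED_KIB):.2f} GiB")
```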

But still, for someone who wants to know whether their PC can crunch native ATLAS tasks, the formula you mentioned is misleading. If someone new here wants to check whether they can run, e.g., a 4-core native ATLAS task on an 8GB RAM machine, they would conclude from your formula that it is not possible, although 8GB is more than enough.
33) Message boards : Number crunching : Memory requirements for LHC applications (Message 37262)
Posted 7 Nov 2018 by gyllic
Post:
The formula is wrong. It should be 100 + 2000 * nCPU.
The formula 100 + 2000*nCPU is also wrong. I don't know the correct one, but since my PC with 6GB RAM can crunch a 4-core native ATLAS task without any problem, this formula can't be correct (according to it, the native task would need 100 + 2000*4 = 8100MB on a 6000MB RAM machine, which obviously would not work).
It is also possible to crunch two 2-core native ATLAS tasks concurrently with 6GB RAM, so the correct formula must be a different one.
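The contradiction can be spelled out with a short Python sketch (names are hypothetical): for both configurations that ran fine on the 6GB machine, the quoted formula asks for more RAM than the machine has.

```python
MACHINE_RAM_MB = 6000  # the 6GB machine from the post


def formula_mb(n_cpus: int) -> int:
    # formula quoted in the post: 100 + 2000 * nCPU (per task)
    return 100 + 2000 * n_cpus


# configurations that ran without problems on the 6GB machine (cores per task)
observed_ok = {"one 4-core task": [4], "two 2-core tasks": [2, 2]}

for name, tasks in observed_ok.items():
    needed = sum(formula_mb(n) for n in tasks)
    verdict = "fits" if needed <= MACHINE_RAM_MB else "should not fit"
    print(f"{name}: formula says {needed} MB, machine has {MACHINE_RAM_MB} MB -> {verdict}")
```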
34) Message boards : ATLAS application : ATLAS issues (Message 37136)
Posted 30 Oct 2018 by gyllic
Post:
Thanks for responding, but all your help and Google's and over 5 days of my own input have failed to get this bloody project running.
I'm sorry that getting the native ATLAS app up and running is such a pain in the ass for you. Maybe it should be mentioned that the native app is still in beta, so things may be harder to set up than with other applications.
It is a little bit weird, because I tested it by basically copy-pasting all the commands from the previous post and from the guide, and it all worked well.

I found that in /var/cache/dnf there is a file called "expired_repos.json", and it seems the main entry that keeps getting passed to this file is ["cvmfs"].
This seems to cause an error in the terminal saying "can't syncronize repos".
After I deleted that 'dnf' folder (it gets recreated) I was able to actually make and install CVMFS, Singularity and Golang.
If you added a dnf/yum repository earlier that does not work or that you no longer need, you can try to disable it with "sudo dnf config-manager --set-disabled repository", where "repository" is the name of the corresponding repository.

Testing singularity --version produces the version number, so I thought all was now OK.
This sounds good, although (as we learned just a couple of days/weeks ago) getting that output is not sufficient to determine whether the ATLAS tasks will work. But as you already said, for now cvmfs is the reason why your tasks are not successful.

Doing 'cvmfs_config setup' and 'chksetup' seemed to work (at least at first), but 'cvmfs_config probe' fails every time.
As long as the probing fails, all native ATLAS tasks will also fail. What is the output of the chksetup command?

Found that the label for /scratch/cvmfs/ is set wrong, but no matter what I do and try I can't get the label to change (from 'default_t' to 'cvmfs_cache_t'); I keep getting the error that I have used an "Invalid argument".
I have checked this over and over and I am using the chcon command correctly but it keeps failing.
Not sure what you mean by that. If you followed the guide, then "/scratch/cvmfs/" is the directory where cvmfs places/searches for the local cache. You have to create that folder manually first, and you can choose a different location if you want.
To simplify things for the moment, I recommend removing the lines "CVMFS_CACHE_BASE=/scratch/cvmfs" and "CVMFS_QUOTA_LIMIT=4096" from your "/etc/cvmfs/default.local" file. You will probably have to execute "sudo cvmfs_config setup" or "sudo cvmfs_config reload" again.

CVMFS does not seem to start, and this is borne out by my failed work units, which say
"no cvmfs_config in /usr/local/bin"
"Can't find cvmfs_config does not exist"
"No CVMFS"
I'm not sure why it says that, maybe because the probing fails. If you type "cvmfs_config --help" in the terminal, do you get an output?

When I do 'cvmfs_config probe' the result is
"Probing /cvmfs/atlas.cern.ch... Failed"
"Probing /cvmfs/atlas-condb.cern.ch... Failed"
"Probing /cvmfs/grid.cern.ch... Failed"
Obviously this should not be the case. Have you tried running "sudo service autofs restart" (or the equivalent Fedora command) and then probing again?

When I look in the folder /cvmfs/ there is nothing in it.
Which directory do you mean? "/etc/cvmfs"?

So it looks like this volunteer just won't be helping ATLAS do anything. I can at least run Sixtrack, I suppose.
You can still try to run the virtualbox based ATLAS app or the other virtualbox applications (Theory and LHCb at the moment). For that, yeti has written a very nice checklist which you will find on the message boards.

If you want, you can send me a PM and I will look at your computer via TeamViewer to see if we can get it up and running (but I am no Fedora expert).
35) Message boards : ATLAS application : ATLAS issues (Message 37100)
Posted 26 Oct 2018 by gyllic
Post:
Hi conan,

with the things you described, the easiest way to get cvmfs running is probably to build it from sources.
To do so on Fedora 25 Workstation, follow these steps:

1. Log in with your default user (the one that has sudo rights) and clean dnf:
sudo dnf clean all
2. Install all packages that are required to build cvmfs on Fedora 25, as well as other packages like nano (I'm not sure if all of the following packages are needed, but to keep things simple just install all of them):
sudo dnf install -y cmake make automake gcc gcc-c++ kernel-devel autofs fuse fuse-devel python-devel libcap-devel git attr valgrind-devel sqlite-devel libuuid-devel uuid-devel tar patch bzip2 zlib-devel openssl-devel unzip nano
3. Continue with step 3 in the "build and install cvmfs" section of this guide https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840&postid=36880#36880, i.e.:
cd
mkdir cvmfs_source
git clone https://github.com/cvmfs/cvmfs.git cvmfs_source
...


Hopefully this solves your problem. On my minimal installation of Fedora 25 Workstation, cvmfs compiled and worked after the additional setup steps described in the guide.
36) Message boards : ATLAS application : ATLAS issues (Message 37096)
Posted 25 Oct 2018 by gyllic
Post:
I found the CVMFS Package and this mostly installed what was needed but still did not compile.
Which cvmfs package do you mean? If you already have cvmfs installed from a package, there is no need to compile it anymore (you can type the command "cvmfs2 --version" into the terminal and see if you get any output). You can download rpm packages from here: https://cernvm.cern.ch/portal/filesystem/downloads. Since you are on Fedora 25, I don't know whether the packages provided for Fedora 27 and 28 will work for you. You can also add the yum repositories and try to install from them.
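To check quickly which of the commands mentioned in this thread are on your PATH, a minimal Python sketch could use `shutil.which`. Keep in mind that it only sees the PATH of the user running it, which may differ from the environment of the boinc user. The tool list is just the set of commands discussed here.

```python
import shutil

# Commands mentioned in this thread; which ones exist depends on how cvmfs/singularity were installed.
TOOLS = ["cvmfs2", "cvmfs_config", "singularity"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```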

This may help you:
https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html

"CMake Error at CMakeLists.txt:10 (project)
No CMAKE_CXX_COMPILER could be found

Tell CMAKE where to find the compiler by setting either the environment variable "CXX" or CMake cache entry
CMAKE_CXX_COMPILER to the full path to the compiler, or the compiler name if it is in the PATH"
Have you installed all packages needed to build cvmfs (i.e. cmake, gcc, etc.)? Since the guide is for Debian systems, the packages you need to compile cvmfs may be named differently on Fedora, and you may have to install more than what is listed in the guide.
37) Message boards : ATLAS application : An issue with singularity (Message 37051)
Posted 16 Oct 2018 by gyllic
Post:
I set up a virtual machine with Linux Mint 17.3, and singularity did not work with the default kernel there either. I then updated the kernel from the default 3.19 to 4.4.0 generic, and now singularity works with the "singularity --debug exec ..." command. So it is probably something in the default kernel that singularity does not like.
My uname -a output is: Linux testing-VirtualBox 4.4.0-98-generic #121~14.04.1-Ubuntu SMP Wed Oct 11 11:54:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
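A small Python sketch for comparing kernel versions from `uname -r` style release strings might look like this (function names are hypothetical). The (4, 4) default is only an assumption based on the one kernel that worked in this test; it is not a documented minimum for singularity.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a kernel release string such as '4.4.0-98-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def at_least(release: str, minimum: tuple = (4, 4)) -> bool:
    # (4, 4) is just the version that happened to work above, not an official requirement.
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    current = platform.release()  # same string as `uname -r` on Linux
    print(current, kernel_version(current), at_least(current))
```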
38) Message boards : ATLAS application : An issue with singularity (Message 37044)
Posted 16 Oct 2018 by gyllic
Post:
Thanks for your information.

I am no expert on this kind of thing, but my first guess is that your kernel is missing the overlay filesystem module or fails to load it automatically (since this is only a guess, your problem might lie somewhere completely different). It should be in the kernel since (I think) version 3.18, but maybe it is not in yours.

To check that, please run the command "singularity --debug exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname" and then run the command "lsmod". Look if the module called "overlay" is shown in the output of the lsmod command (and that it is used).

If you don't see a module called overlay in the lsmod output, look into the folder /lib/modules/*your kernel*/kernel/fs/ and search for a folder called overlayfs (at least this is the path on Debian).
I have not tested the following, so this is just to give you an idea of how you may be able to fix your problem: if the overlayfs directory is present, you can try to load the module manually with the "modprobe" command and the corresponding module name, so "modprobe overlay" or something like that. You can also add the module name to the /etc/modules file and restart your PC; this way the module should be loaded automatically at boot. If no such folder exists, your kernel probably does not have the overlay module, which is needed (according to the singularity debug output). You could try to use a backport kernel or upgrade your Mint 17.x installation to a newer one (which you should do anyway, because Mint 17.x is only supported until April 2019).
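The checks described above can be sketched in Python (function names are hypothetical). lsmod reads /proc/modules, so the first function looks for a module named "overlay" there. Because overlay can also be built directly into the kernel, in which case it never shows up as a module, the sketch additionally looks at /proc/filesystems. The directory path is the Debian layout mentioned in the post and may differ on other distributions.

```python
import os
import platform
from typing import Optional


def overlay_loaded(proc_modules: str = "/proc/modules") -> bool:
    """True if a module named 'overlay' is listed in /proc/modules (the file lsmod reads)."""
    try:
        with open(proc_modules) as f:
            return any(line.split()[0] == "overlay" for line in f if line.strip())
    except OSError:
        return False


def overlay_in_filesystems(proc_filesystems: str = "/proc/filesystems") -> bool:
    """True if 'overlay' is a registered filesystem type (also covers a built-in overlay)."""
    try:
        with open(proc_filesystems) as f:
            return any(line.split()[-1] == "overlay" for line in f if line.strip())
    except OSError:
        return False


def overlay_module_dir(release: Optional[str] = None) -> str:
    """Directory from the post: /lib/modules/<kernel release>/kernel/fs/overlayfs (Debian layout)."""
    return os.path.join("/lib/modules", release or platform.release(), "kernel", "fs", "overlayfs")


if __name__ == "__main__":
    print("overlay module loaded:", overlay_loaded())
    print("overlay filesystem registered:", overlay_in_filesystems())
    path = overlay_module_dir()
    print(path, "exists" if os.path.isdir(path) else "does not exist")
```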
39) Message boards : ATLAS application : An issue with singularity (Message 37041)
Posted 15 Oct 2018 by gyllic
Post:
Have you run the command "sudo make install" after compiling singularity?
Can you run singularity with your default user (i.e. with a different user than boinc)?
Which version of singularity do you use? Type "singularity --version" into your terminal and please post the output.
Which OS are you running? It shows Linux 3.19; current Debian uses 4.9.0 and oldstable Debian uses 3.16 (the guide has only been tested on Debian Stretch, but singularity should obviously work on other OSes as well). So please post the output of "uname -a" here as well.
40) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 36987)
Posted 9 Oct 2018 by gyllic
Post:
Can anyone confirm if ATLAS tasks work with this version?
I cloned the current git repository, checked out tag v3.0.0, compiled singularity and ran a native ATLAS task. It worked without any problems.

The output of "singularity --version" is: "singularity version v3.0.0"
The corresponding task that was crunched with this singularity version is: https://lhcathome.cern.ch/lhcathome/result.php?resultid=207507345

So yes, looks like it is working.




©2019 CERN