1) Message boards : Number crunching : Recommended CVMFS Configuration for native Apps (Message 41886)
Posted 11 Mar 2020 by gyllic
Post:
/etc/cvmfs/config.d/atlas-nightlies.cern.ch.local
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"


What is this needed for? Sorry if this has been explained before; I was absent for some time.
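For reference, applying and checking such an override would look roughly like this (just a sketch using the standard CVMFS tools, not something taken from this thread):

sudo tee /etc/cvmfs/config.d/atlas-nightlies.cern.ch.local <<'EOF'
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"
EOF
sudo cvmfs_config reload atlas-nightlies.cern.ch   # pick up the new server list
cvmfs_config probe atlas-nightlies.cern.ch         # should report OK
sudo cvmfs_talk -i atlas-nightlies.cern.ch host info   # shows the configured stratum-1 servers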
2) Message boards : ATLAS application : ATLAS native version 2.72 (Message 40108)
Posted 10 Oct 2019 by gyllic
Post:
Your singularity version (2.6.1) is very old. Maybe the new image needs a more current version in order to work. You should update your singularity version.

For me, native version 2.72 looks like it is working without a problem:

2019-10-09 20:09:23,818: singularity image is /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img
2019-10-09 20:09:23,819: sys.argv = ['run_atlas', '--nthreads', '2']
2019-10-09 20:09:23,820: THREADS=2
2019-10-09 20:09:23,821: Checking for CVMFS
2019-10-09 20:09:39,404: CVMFS is installed
2019-10-09 20:09:39,404: Checking Singularity...
2019-10-09 20:09:40,399: Singularity is installed, version singularity version 3.4.1+324-g54b182afd
2019-10-09 20:09:40,399: Testing the function of Singularity...
2019-10-09 20:09:40,399: Checking singularity with cmd:singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img hostname
2019-10-09 20:10:03,571: Singularity Works...
2019-10-09 20:10:03,572: copy /home/boinc/boinc1/slots/0/shared/ATLAS.root_0
2019-10-09 20:10:03,872: copy /home/boinc/boinc1/slots/0/shared/RTE.tar.gz
2019-10-09 20:10:03,873: copy /home/boinc/boinc1/slots/0/shared/input.tar.gz
2019-10-09 20:10:03,873: copy /home/boinc/boinc1/slots/0/shared/start_atlas.sh
2019-10-09 20:10:03,873: export ATHENA_PROC_NUMBER=2;
2019-10-09 20:10:04,243: start atlas job with PandaID=4503174146
2019-10-09 20:10:04,243: cmd = singularity exec --pwd /home/boinc/boinc1/slots/0 -B /cvmfs,/home /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh > runtime_log 2> runtime_log.err

The first 2.72 task has been running for over 10 hours now and everything looks fine.
3) Message boards : Theory Application : Out of BOINC-workunits for Theory Native (Message 40061)
Posted 1 Oct 2019 by gyllic
Post:
01.10.2019 21:10:46 | LHC@home | Sending scheduler request: To fetch work.
01.10.2019 21:10:46 | LHC@home | Requesting new tasks for CPU
01.10.2019 21:10:48 | LHC@home | Scheduler request completed: got 0 new tasks
01.10.2019 21:10:48 | LHC@home | No tasks sent
01.10.2019 21:10:48 | LHC@home | No tasks are available for Theory Native
4) Message boards : Theory Application : Simple Bash script which sets everything up automatically to run native apps (Message 39899)
Posted 10 Sep 2019 by gyllic
Post:
Hi guys,

I am working on a simple bash script which automatically sets everything up on various distributions (only ubuntu is implemented at the moment) to run the native applications. The script can be found at https://github.com/g84ycm/LHCHome_Native/. It is basically a collection of forum posts, scripts and information taken from the documentation. You need root privileges on your system in order to run the script successfully. The goal of the script is to simplify the setup process so that new users can run the native apps.

To run the script, download it with
 wget https://raw.githubusercontent.com/g84ycm/LHCHome_Native/master/setup_native_LHC.sh 
and make it executable with
 chmod +x setup_native_LHC.sh 

The syntax to use the script is:
./setup_native_LHC.sh [distribution] [install boinc yes|no] 
More parameters will be added in later versions.
Type
./setup_native_LHC.sh help 
to get more info.

What the script does:

    - optionally installs boinc and boinc manager from repositories
    - installs CVMFS from cern repositories
    - sets up CVMFS to use a local cache and no local squid proxy
    - sets up CVMFS to use openhtc.io (see the CVMFS sketch after this list)
    - checks if CVMFS is working correctly
    - sets up user namespaces (on ubuntu nothing needs to be changed)
    - checks if user namespaces are working
    - sets up the required cgroup configuration so that suspend/resume works (for native theory)
    - creates a systemd script so that suspend/resume works
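
The CVMFS-related steps above boil down to roughly the following (a sketch with illustrative values; the repository list and cache size are not taken verbatim from the script):

# local cache, no local squid proxy (illustrative values)
sudo tee /etc/cvmfs/default.local <<'EOF'
CVMFS_REPOSITORIES=atlas.cern.ch,grid.cern.ch,sft.cern.ch,alice.cern.ch
CVMFS_QUOTA_LIMIT=4096
CVMFS_HTTP_PROXY=DIRECT
EOF

# use the Cloudflare-backed openhtc.io stratum-1 mirrors for all cern.ch repositories
sudo tee /etc/cvmfs/domain.d/cern.ch.local <<'EOF'
CVMFS_SERVER_URL="http://s1cern-cvmfs.openhtc.io/cvmfs/@fqrn@;http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"
EOF

sudo cvmfs_config setup
cvmfs_config probe   # check that CVMFS is working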


To Do:

    - implement all of the above for other distributions (openSUSE, CentOS, ...). Some of the needed distinctions are already present in the code.
    - add option to build everything from sources



For the moment, only native theory will work. However, on the -dev project native ATLAS is already running without a local singularity installation, so in the future this script should be able to set everything up for both native theory and native ATLAS.

This is my first "larger" bash script, so I am sure that there are better and/or more elegant ways to implement the different steps. Any hints for improvements, bug reports, code suggestions or pull requests would be much appreciated.

Right now the script has only been tested on ubuntu 18.04.3 with native theory.

gyllic

5) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 39480)
Posted 1 Aug 2019 by gyllic
Post:
OK, Federica managed to find the server which needed to be rebooted and jobs are starting to flow again. Thanks for your patience.
Looking good so far!
6) Questions and Answers : Getting started : No new tasks for LHC@home on Android? (Message 39447)
Posted 27 Jul 2019 by gyllic
Post:
look here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5093
7) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 39426)
Posted 23 Jul 2019 by gyllic
Post:
Looks like not only my hosts are affected.
Indeed, same here.
8) Message boards : CMS Application : CMS jobs are becoming available again (Message 39083)
Posted 8 Jun 2019 by gyllic
Post:
The new version of the CMS bootstrap fixed the issue mentioned here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4964&postid=39036. Thanks, everybody.
9) Message boards : CMS Application : Local proxy can now be suggested for use with CMS (Message 39082)
Posted 8 Jun 2019 by gyllic
Post:
The new CMS configuration is working well!

6 CMS tasks connected to cms-frontier.openhtc.io about 160,000 times in total, and the local proxy answered nearly every single request (hit rate > 99%). Additionally, the proxy logs show almost 10,000 connections to cvmfs-stratum-one.cern.ch, of which the local proxy answered more than 98%.

Things are looking good, and using a local proxy for CMS tasks seems to be a very good idea.
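
For anyone who wants to set up something similar, a minimal squid configuration for such a local proxy could look roughly like this (only a sketch; the allowed subnets and cache sizes are assumptions for a typical home LAN, not my exact setup):

# minimal local caching proxy (sketch)
sudo tee /etc/squid/squid.conf <<'EOF'
http_port 3128
acl localnet src 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
http_access allow localnet
http_access deny all
cache_mem 256 MB
maximum_object_size 1 GB
cache_dir ufs /var/spool/squid 20000 16 256
EOF
sudo systemctl restart squid

The BOINC client (and CVMFS via CVMFS_HTTP_PROXY) then has to be pointed at this proxy, otherwise the tasks will simply keep connecting directly.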
10) Message boards : CMS Application : CMS jobs are becoming available again (Message 39036)
Posted 4 Jun 2019 by gyllic
Post:
One of my hosts shows messages like this within the VM (e.g. /logs/finished_7.log):
Setting up Frontier log level
Beginning CMSSW wrapper script
 slc6_amd64_gcc700 scramv1 CMSSW
Performing SCRAM setup...
Completed SCRAM setup
Retrieving SCRAM project...
Completed SCRAM project
Executing CMSSW
cmsRun  -j FrameworkJobReport.xml PSet.py
----- Begin Fatal Exception 04-Jun-2019 09:23:03 UTC-----------------------
An exception of category 'Incomplete configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing ESSource: class=PoolDBESSource label='GlobalTag'
Exception Message:
Valid site-local-config not found at /cvmfs/cms.cern.ch/SITECONF/local/JobConfig/site-local-config.xml
----- End Fatal Exception -------------------------------------------------
Complete
process id is 270 status is 65

The starter log (/logs/StarterLog) shows, for example:
06/04/19 10:47:43 (pid:8452) ** condor_starter (CONDOR_STARTER) STARTING UP
06/04/19 10:47:43 (pid:8452) ** /usr/sbin/condor_starter
06/04/19 10:47:43 (pid:8452) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
06/04/19 10:47:43 (pid:8452) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
06/04/19 10:47:43 (pid:8452) ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
06/04/19 10:47:43 (pid:8452) ** $CondorPlatform: x86_64_RedHat6 $
06/04/19 10:47:43 (pid:8452) ** PID = 8452
06/04/19 10:47:43 (pid:8452) ** Log last touched 6/4 10:47:42
06/04/19 10:47:43 (pid:8452) ******************************************************
06/04/19 10:47:43 (pid:8452) Using config source: /etc/condor/condor_config
06/04/19 10:47:43 (pid:8452) Using local config sources: 
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/10_security.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/14_network.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/20_workernode.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/30_lease.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/35_cms.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/40_ccb.config
06/04/19 10:47:43 (pid:8452)    /etc/condor/config.d/62-benchmark.conf
06/04/19 10:47:43 (pid:8452)    /etc/condor/condor_config.local
06/04/19 10:47:43 (pid:8452) config Macros = 172, Sorted = 172, StringBytes = 6941, TablesBytes = 6296
06/04/19 10:47:43 (pid:8452) CLASSAD_CACHING is OFF
06/04/19 10:47:43 (pid:8452) Daemon Log is logging: D_ALWAYS D_ERROR
06/04/19 10:47:43 (pid:8452) Daemoncore: Listening at <10.0.2.15:45925> on TCP (ReliSock).
06/04/19 10:47:43 (pid:8452) DaemonCore: command socket at <10.0.2.15:45925?addrs=10.0.2.15-45925&noUDP>
06/04/19 10:47:43 (pid:8452) DaemonCore: private command socket at <10.0.2.15:45925?addrs=10.0.2.15-45925>
06/04/19 10:47:44 (pid:8452) CCBListener: registered with CCB server vocms0840.cern.ch as ccbid 137.138.156.85:9618?addrs=137.138.156.85-9618#633281
06/04/19 10:47:44 (pid:8452) Communicating with shadow <137.138.52.94:4080?addrs=137.138.52.94-4080&noUDP&sock=4298_1468_19663>
06/04/19 10:47:44 (pid:8452) Submitting machine is "vocms0267.cern.ch"
06/04/19 10:47:44 (pid:8452) setting the orig job name in starter
06/04/19 10:47:44 (pid:8452) setting the orig job iwd in starter
06/04/19 10:47:44 (pid:8452) Chirp config summary: IO false, Updates false, Delayed updates true.
06/04/19 10:47:44 (pid:8452) Initialized IO Proxy.
06/04/19 10:47:44 (pid:8452) Done setting resource limits
06/04/19 10:47:46 (pid:8452) File transfer completed successfully.
06/04/19 10:47:46 (pid:8452) Job 150691.2 set to execute immediately
06/04/19 10:47:46 (pid:8452) Starting a VANILLA universe job with ID: 150691.2
06/04/19 10:47:46 (pid:8452) IWD: /var/lib/condor/execute/dir_8452
06/04/19 10:47:46 (pid:8452) Output file: /var/lib/condor/execute/dir_8452/_condor_stdout
06/04/19 10:47:46 (pid:8452) Error file: /var/lib/condor/execute/dir_8452/_condor_stderr
06/04/19 10:47:46 (pid:8452) Renice expr "10" evaluated to 10
06/04/19 10:47:46 (pid:8452) Using wrapper /usr/local/bin/singularity_wrapper.sh to exec /var/lib/condor/execute/dir_8452/condor_exec.exe ireid_TC_OneTask_IDR_CMS_Home_190526_125903_8237-Sandbox.tar.bz2 89269 0
06/04/19 10:47:46 (pid:8452) Running job as user nobody
06/04/19 10:47:46 (pid:8452) Create_Process succeeded, pid=8466
06/04/19 10:52:47 (pid:8452) Process exited, pid=8466, status=1
06/04/19 10:52:48 (pid:8452) Got SIGQUIT.  Performing fast shutdown.
06/04/19 10:52:48 (pid:8452) ShutdownFast all jobs.
06/04/19 10:52:48 (pid:8452) **** condor_starter (condor_STARTER) pid 8452 EXITING WITH STATUS 0


Does one of the experts know where the problem is located? According to the finished_7.log it looks like it can't find a valid site-local-config, but why?
VBox CPU usage is ~0%. The affected host has successfully crunched CMS tasks in the past. I am using a local proxy, which should be working fine (at least the theory tasks have no problem using it).
11) Message boards : Theory Application : Issues Native Theory application (Message 38608)
Posted 23 Apr 2019 by gyllic
Post:
It rarely happens, but sometimes there is an error among all the valid ones.

After 1.5 hours runtime: Exit status 195 (0x000000C3) EXIT_CHILD_FAILED

https://lhcathome.cern.ch/lhcathome/result.php?resultid=220279871
Same here. So far, 3 out of ~100 have failed with the same error as mentioned above:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=221596097
https://lhcathome.cern.ch/lhcathome/result.php?resultid=221500071
https://lhcathome.cern.ch/lhcathome/result.php?resultid=221484725

Any idea why that happens?
12) Message boards : ATLAS application : Changes to use configured proxy for Frontier servers (Message 38562)
Posted 12 Apr 2019 by gyllic
Post:
For native ATLAS, the squid logs show connections to http://lcgft-atlas.gridpp.rl.ac.uk:3128, which is one of the ATLAS Frontier servers. Using the proxy seems to be a good idea, since the logs show a 100% hit rate for this site (not sure if that makes sense).
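In case someone wants to check the hit rate on their own proxy, a rough way to estimate it from the squid access log (the log path and the HIT matching are assumptions, adjust to your setup):

grep 'lcgft-atlas.gridpp.rl.ac.uk' /var/log/squid/access.log \
  | awk '{ total++ } /HIT/ { hits++ } END { if (total) printf "hit rate: %.1f%%\n", 100*hits/total }'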
13) Message boards : Theory Application : Issues Native Theory application (Message 38267)
Posted 18 Mar 2019 by gyllic
Post:
"runc" was missing...
now it is installed ;)
runc is provided via CVMFS, so there should be no need to install it, and installing it should not fix your problem. If it does, please report back here.
Could you please post the output of the following commands (hopefully they work on Scientific Linux):

cat /proc/sys/kernel/unprivileged_userns_clone
and
 cat /proc/sys/user/max_user_namespaces
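If the first file contains 0 (or does not exist) or the second one contains 0, user namespaces are disabled. Enabling them usually looks like this (just a sketch; the sysctl names differ between kernels, so treat them as assumptions for your distribution):

# Debian/Ubuntu kernels
sudo sysctl -w kernel.unprivileged_userns_clone=1
# RHEL/CentOS/Scientific Linux 7 kernels
sudo sysctl -w user.max_user_namespaces=15000
# make the relevant setting persistent across reboots
echo 'user.max_user_namespaces = 15000' | sudo tee /etc/sysctl.d/90-user-namespaces.conf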
14) Message boards : CMS Application : CMS 47.90 WU runs 18 hours, but does nothing after 12 hours of runtime. What can I do to use CPU for CMS more effectively? (Message 38243)
Posted 14 Mar 2019 by gyllic
Post:
It appears to me that this is mainly related to the first job that a BOINC task runs, suggesting to me that the workflow is trying to access a resource with limited network connectivity (or otherwise limited throughput).
Maybe using openhtc.io would help to overcome this issue?
15) Message boards : ATLAS application : ATLAS native_mt error while computing after 600 seconds (Message 37975)
Posted 11 Feb 2019 by gyllic
Post:
In order to run ATLAS native, you must have two additional programs installed:
-cvmfs (https://cernvm.cern.ch/portal/filesystem)
-singularity (https://www.sylabs.io/singularity/)

If you want to build these two programs from source code, you can go here (this has been written for debian): https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840
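
A quick way to check that both prerequisites are actually usable (just a sketch; the image path is the one used by the ATLAS run script in the other threads here):

cvmfs_config probe atlas.cern.ch
singularity --version
singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname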

If you don't want to get native tasks, you should uncheck the box "Run test applications?" on the LHC preferences page (https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project).
16) Message boards : ATLAS application : New WU with 2 output files (Message 37836)
Posted 27 Jan 2019 by gyllic
Post:
So far, 2 out of 7 native v2.55 tasks have validate errors.

In the logs it says e.g. "Moving ./HITS.16756652._013772.pool.root.1 to shared/HITS.pool.root.1" for every task (both the valid and the invalid ones according to the BOINC server), which indicates that the tasks ran successfully, so it is probably "just" a BOINC server validation problem.

I would not mind if not all tasks gave credit for the moment, but as djoser mentioned it is a waste of resources: due to the validate errors, the tasks get sent to another host although they have already produced good results (HITS file).
17) Message boards : Cafe LHC : CERN Open Days 2019 (Message 37636)
Posted 18 Dec 2018 by gyllic
Post:
For everyone who is interested, CERN opens its doors to the public in September 2019: https://home.cern/news/news/cern/cern-open-days-explore-future-us

quote:
Similar to the 2013 edition, the 2019 Open Days will give people the chance to discover our facilities both underground and on the surface.
18) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 37591)
Posted 11 Dec 2018 by gyllic
Post:
What is the output of the command (remember to probe cvmfs first):
sudo -H -u boinc singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname
19) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 37579)
Posted 9 Dec 2018 by gyllic
Post:
Thanks!

Hm, that's weird, since at first glance I don't see any reason why native ATLAS should not run on your Lubuntu machine.
According to the "return 3" error code given by the ATLAS run script (see here https://lhcathome.cern.ch/lhcathome/result.php?resultid=211183308), it could not run the command "singularity exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname", or the command did not produce valid output. But according to your posted debug log, this command does produce output: your hostname "opti7010".
Also, the "singularity --debug" command shows basically the same output as a machine that has already successfully crunched native ATLAS tasks.

I don't know why it is not working. Just a few guesses (might have nothing to do with the problem at all): maybe some user rights/privilege problems? Did you install singularity with "sudo make install"?

Maybe just try another native task?
20) Message boards : ATLAS application : Guide for building everything from sources to run native ATLAS on Debian 9 (Stretch) Version 2 (Message 37522)
Posted 4 Dec 2018 by gyllic
Post:
Now a problem with singularity. It built and installed OK and "singularity --version" reports
singularity version 3.0.1-145.gc9822fec.

To get more information, please do the following:
1. probe cvmfs:
cvmfs_config probe

2. Post the output of the command:
singularity --debug exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/images/singularity/x86_64-slc6.img hostname

