1)
Message boards :
Number crunching :
web preferences - queue tasks but run only 1
(Message 45223)
Posted 18 Aug 2021 by wolfman1360 Post: BOINC never worked that way. Best of luck. That resource setting, by the way, will take about 3 weeks to learn what you want. I'm still figuring out the best configuration for this project - it by far requires the most messing about with various settings to optimize runtimes vs. available memory and available cores. I think I have it now. |
2)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45221)
Posted 18 Aug 2021 by wolfman1360 Post: Hi, are you running SixTrack, virtually no I/O? Thanks. Eric I do have it selected, but there don't seem to be any tasks available currently. thanks |
3)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45201)
Posted 13 Aug 2021 by wolfman1360 Post: CMS - not tweaked? How can one tweak them and what will result? Thank you. Will give this a look. Think I will need to as I may need more than 3 preference types. |
4)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45195)
Posted 12 Aug 2021 by wolfman1360 Post: native Theory: CMS - not tweaked? How can one tweak them and what will result? thanks |
5)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45193)
Posted 12 Aug 2021 by wolfman1360 Post: Thank you for all of that. I'll have to play with settings and maybe app_config to figure out the best solution. Maybe 2 12-core WUs from ATLAS and the rest for CMS and Theory. How much RAM do native Theory and CMS use? |
6)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45190)
Posted 12 Aug 2021 by wolfman1360 Post: I don't get many stuck VMs anymore. Great to hear! ATLAS is the most tricky to run, if you want to allow unlimited WUs then it tries to use 10GB of memory per WU, you can of course tweak this but then it's hard to keep the memory usage on track manually. CMS runs smoothly. 10 GB per WU? Is this for a single-core workunit? What if I only select a certain number of them? Or select 1 WU to use 24 cores? I don't think there is much disk activity in general, peak total transfers are 38%, peak writes are about 40 and 100 for reads. Thank you for those figures. How is bandwidth for CMS and ATLAS? I could probably get away with running a lot more native Theory tasks, I'm guessing. |
7)
Message boards :
Number crunching :
Beating the Heat
(Message 45188)
Posted 12 Aug 2021 by wolfman1360 Post: I have twelve machines, spaced to provide heating in the winter. I feel the same. The summer is where I generally figure out what machines to retire as well. In this case Nehalem EP, Bloomfield, some Sandy Bridge. Electrical costs can almost make up for the work these new Ryzens do. |
8)
Message boards :
Number crunching :
Beating the Heat
(Message 45186)
Posted 12 Aug 2021 by wolfman1360 Post: I have most of my machines in the basement, or have rented a few dedicated servers. It makes up for the lack of electricity I'd ultimately spend with the same setup, at this point. I should really throw a few cores at climate prediction given what's going on in the world. It's been over 30 °C here most days since the beginning of June. We've had maybe 2 inches of rain during that time. Best of luck. |
9)
Message boards :
ATLAS application :
Creation of container failed
(Message 45185)
Posted 12 Aug 2021 by wolfman1360 Post: You are quite welcome. Nothing is easy with native, especially Singularity. As am I. I usually install the standard Ubuntu 20.04 along with any updates. I know I had major issues with Debian. |
10)
Message boards :
Number crunching :
best practices/how to get most efficiency for higher core machines
(Message 45183)
Posted 11 Aug 2021 by wolfman1360 Post: Hello, I've been out of the loop on this project for a while and decided to give it another go. I have nearly 200 cores to add, so I just want to make sure I'm not going to bungle this with a lot of errors or stuck VMs. I've got several 24- to 48-thread Xeons with anywhere from 32-64 GB RAM, as well as a few 8-thread i7s with 16-32 GB, all running Linux. Squid will be used, but my biggest concern is RAM and stuck VMs. I know Theory takes the least RAM per WU - but for best efficiency, what would folks recommend for managing ATLAS and CMS workunits? Can BOINC be trusted to manage RAM on its own? I'm still trying to figure out number of workunits vs. number of CPUs; I believe the latter only has to do with ATLAS. Most of these machines, apart from a few, are running cheap SSDs, since I figure a lot of disk activity will be going on, and with a lot of WUs crunching at the same time that might be a factor in how fast they start/stop, especially with regard to CMS. I'm not sure what the process is for each one to start and finish. Right now the goal is to add machines very slowly, making sure each one can crunch ATLAS, Theory, and CMS with one WU sent of each before moving on to the next. Examples of processor and RAM configurations - E5-2670 v3 with 64 GB, E5-2680 with 32 GB. If I remember right, CMS and ATLAS are the biggest users of bandwidth and disk? Thanks, and any help appreciated! |
11)
Message boards :
ATLAS application :
Creation of container failed
(Message 45182)
Posted 11 Aug 2021 by wolfman1360 Post: sudo apt-get install singularity First off - thank you so much. We're currently well above 10 minutes of CPU time, so it appears to be working. This was all very much above my head and I simply copied and pasted a lot of this, substituting version numbers and directories. Hopefully nothing goes wrong in the future, as I am still very much new to Linux. Hopefully this, along with a few other things, can be pinned. I'm finding a lot of things are very scattered - especially as they relate to native - and there are a lot of different instructions that are now out of date or do not work for this project. For reference, this is what just worked for me. I still have a few questions, but that's for another topic.
sudo apt remove singularity
sudo apt autoremove (not sure if this is needed, but it didn't break anything, yet)
sudo apt-get update && sudo apt-get install -y \
    build-essential \
    libssl-dev \
    uuid-dev \
    libgpgme11-dev \
    squashfs-tools \
    libseccomp-dev \
    wget \
    pkg-config \
    git \
    cryptsetup
To correct broken packages (examples):
sudo apt install libseccomp2=2.4.3-1ubuntu1
sudo apt install libssl1.1=1.1.1f-1ubuntu2
Second: install Go
sudo snap install go --classic
Result: go 1.16.6 from Michael Hudson-Doyle (mwhudson) installed
Check version:
go version
go get -d github.com/hpcng/singularity
export VERSION=3.8.0 && \
    wget https://github.com/hpcng/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
    tar -xzf singularity-${VERSION}.tar.gz && \
    cd singularity-${VERSION}
./mconfig && \
    make -C ./builddir && \
    sudo make -C ./builddir install
Good luck and thank you again. |
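Since the version number shows up in the download URL, the tarball name and the directory to cd into, it may help to derive all three from one value. A minimal sketch, assuming the URL pattern from the commands above stays valid for other releases; the function names singularity_tarball_url and singularity_src_dir are made up for illustration.

```shell
#!/bin/sh
# Build the release tarball URL for a given Singularity version,
# following the URL pattern used in the commands above.
singularity_tarball_url() {
    echo "https://github.com/hpcng/singularity/releases/download/v$1/singularity-$1.tar.gz"
}

# Name of the directory the tarball unpacks to, per the commands above.
singularity_src_dir() {
    echo "singularity-$1"
}

VERSION=3.8.0
echo "fetch:    $(singularity_tarball_url "$VERSION")"
echo "build in: $(singularity_src_dir "$VERSION")"
```

Changing VERSION in one place then keeps the wget, tar and cd steps in sync.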
12)
Message boards :
ATLAS application :
Creation of container failed
(Message 45180)
Posted 11 Aug 2021 by wolfman1360 Post: Having the same issue here. It has been a while. Steps run: installed cvmfs; created default.local; sudo apt-get install singularity; sudo apt-get install squashfs-tools. Ubuntu 20.04. Plenty of RAM - set to run 1 ATLAS task, machine has 16 GB RAM. Initially thought there was an error while changing number of CPUs per task. Task in question: https://lhcathome.cern.ch/lhcathome/result.php?resultid=323595231 Has anyone found a working fix for this? |
13)
Message boards :
ATLAS application :
A few errors, tried everything I can
(Message 44457)
Posted 7 Mar 2021 by wolfman1360 Post: Thank you. I did sudo apt-get install singularity, so we'll see what happens. It seems as though most of this is well above my head and knowledge. Or maybe it's simply paid work followed by 'free work and troubleshooting' as it were. Things still don't appear to be functioning correctly. I do not appear to be out of memory. Have switched over to Rosetta, for now, until I can troubleshoot this and get it functional. Will start with one machine at a time. I may reinstall Ubuntu 20.04 from scratch and go through everything again. At this point I'm not sure what else to try. The most frustrating part - I have not done anything different on the machines that are able to complete these tasks versus the ones that are not, to my knowledge, so I'm at a loss as to where to begin. |
14)
Message boards :
ATLAS application :
A few errors, tried everything I can
(Message 44449)
Posted 5 Mar 2021 by wolfman1360 Post: Thank you. I did sudo apt-get install singularity, so we'll see what happens. Thanks so much. I'll give this a try if the above method fails. Bought two-year-old but new Ryzen 5 3600s. They'll be a marked improvement over what I have currently, especially in the summer. |
15)
Message boards :
ATLAS application :
A few errors, tried everything I can
(Message 44437)
Posted 5 Mar 2021 by wolfman1360 Post: have the same info from CentOS8. Thank you. I did sudo apt-get install singularity, so we'll see what happens. thanks |
16)
Message boards :
ATLAS application :
A few errors, tried everything I can
(Message 44432)
Posted 4 Mar 2021 by wolfman1360 Post: Hi, A few errors on multiple machines, Ubuntu 20.04.
[2021-03-04 08:18:44] FATAL: while extracting /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img: root filesystem extraction failed: extract command failed: exit status 1
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481909
And this one.
[2021-03-04 13:04:23] FATAL: container creation failed: mount ->/var error: can't remount /var: operation not permitted
https://lhcathome.cern.ch/lhcathome/result.php?resultid=304481865
No idea how to go about fixing these. |
17)
Message boards :
Theory Application :
Read-only file system error on native Theory
(Message 44423)
Posted 1 Mar 2021 by wolfman1360 Post: Getting tasks like this one. https://lhcathome.cern.ch/lhcathome/result.php?resultid=303571058 No clue what to make of this. Debian 10 on an i7-4770. Have run the following as outlined.
sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p
Still getting user namespace errors nonetheless, as well as the above. Recommendations appreciated. |
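One way to confirm whether the sysctl change actually took effect is to read the value back from /proc. A minimal sketch, assuming a Debian-style kernel that exposes kernel.unprivileged_userns_clone; the function name userns_status and its optional path argument are hypothetical and only there so the check can be tried against a test file.

```shell
#!/bin/sh
# Report whether unprivileged user namespaces look enabled.
# $1 (optional): file to read instead of the real /proc entry.
userns_status() {
    file="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$file" ]; then
        # Kernels without this Debian-specific knob have no such file.
        echo "unknown"
        return 0
    fi
    if [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

userns_status
```

If this prints "disabled" after running sysctl -p, the line in /etc/sysctl.conf was probably not applied; "unknown" only means this particular knob is absent on the running kernel.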
18)
Message boards :
Number crunching :
Optimal CPU usage?
(Message 44419)
Posted 28 Feb 2021 by wolfman1360 Post: One final question because this is confusing me a little. I've got a few Xeon E5s with 32-40 threads, but only 32 GB of RAM. I'm reading on the forum about folks using app_configs to limit CPU and concurrent tasks. Limiting concurrent tasks per subproject makes sense: currently, on the website, you can only limit the number of actual jobs in progress, and I'm assuming this is inclusive of all subprojects selected under preferences. But is there any reason to limit CPU via this method as well as on the website? Am I to assume the project is smart enough - if 6 4-core ATLAS tasks ran on an E5-2680, that it would populate the rest of the machine with single-threaded tasks, provided enough memory was available? Thanks |
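For the per-subproject limit, a small app_config.xml is the usual approach. A minimal sketch that only prints such a file, assuming the application names on this project are ATLAS and CMS (check the names in the client's own files before relying on them); the helper name app_config_xml is made up. The app_config, app, name and max_concurrent elements are standard BOINC client configuration.

```shell
#!/bin/sh
# Print an app_config.xml that caps concurrent tasks for one application.
# usage: app_config_xml APP_NAME MAX_CONCURRENT
# APP_NAME is an assumption; it must match the project's real app name.
app_config_xml() {
    cat <<EOF
<app_config>
  <app>
    <name>$1</name>
    <max_concurrent>$2</max_concurrent>
  </app>
</app_config>
EOF
}

# Example: allow at most 2 ATLAS tasks at once.
app_config_xml ATLAS 2
```

The output would be saved as app_config.xml in the project's directory under the BOINC data directory, after which the client needs to re-read its config files.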
19)
Message boards :
Number crunching :
Optimal CPU usage?
(Message 44415)
Posted 27 Feb 2021 by wolfman1360 Post: CMS tasks take about 2500 MB memory if I remember correctly. But they always use only 1 CPU core so not much you can do there. Their run time (if everything is working properly) is 12...18 hours. They run so that when 12 hours is full, they look for point where the currently running job is finished and finish the task there. If the task is not finished at 18 hours they will be terminated by BOINC. Thank you, that's a great help. Will BOINC automatically ensure that RAM doesn't get overused and wait for memory to free up before resuming tasks, or should I create an app_config file? |
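Taking the roughly 2500 MB per CMS task quoted above, a rough upper bound on concurrent tasks can be worked out from installed RAM. A minimal sketch; the 4096 MB held back for the OS and other processes is purely an assumption, and max_tasks is a made-up helper name.

```shell
#!/bin/sh
# Estimate how many tasks fit in memory.
# usage: max_tasks TOTAL_MB RESERVED_MB PER_TASK_MB
max_tasks() {
    total_mb=$1
    reserved_mb=$2
    per_task_mb=$3
    usable_mb=$((total_mb - reserved_mb))
    if [ "$usable_mb" -le 0 ]; then
        echo 0
        return 0
    fi
    # Integer division rounds down, which is the safe direction here.
    echo $((usable_mb / per_task_mb))
}

# 32 GB machine, 4 GB reserved (assumed), 2500 MB per CMS task.
max_tasks 32768 4096 2500   # prints 11
```

The result is a number that could go into a max_concurrent limit in app_config; it says nothing about how BOINC itself schedules against memory.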
20)
Message boards :
Number crunching :
Recommended CVMFS Configuration for Native Apps - Comments and Questions
(Message 44414)
Posted 27 Feb 2021 by wolfman1360 Post: Perhaps that file could be updated with the minimum needed configuration ... How does one go about caching as much as possible? Not sure what happened in my case, then, since as soon as I downloaded https://lhcathome.cern.ch/lhcathome/download/default.local -O /etc/cvmfs/default.local I got immediate failures after probing. Running the listed items in the how-to fixed my issues, and I believe I also added the line containing openhtc.io. Thank you for the help and excellent clarification. |
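To tell at a glance whether a given machine already has the openhtc.io line, a quick grep over the config file is enough. A minimal sketch, assuming the file lives at /etc/cvmfs/default.local as in the download command above; mentions_openhtc is a made-up helper name, and the check only looks for the string, not for a correct setting.

```shell
#!/bin/sh
# Succeed if the given CVMFS config file mentions openhtc.io.
# $1 (optional): file to check, defaults to /etc/cvmfs/default.local.
mentions_openhtc() {
    grep -q 'openhtc\.io' "${1:-/etc/cvmfs/default.local}" 2>/dev/null
}

if mentions_openhtc; then
    echo "default.local references openhtc.io"
else
    echo "no openhtc.io entry found (or file not readable)"
fi
```

After editing the file, re-running cvmfs_config probe shows whether the repositories mount again.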
©2025 CERN