1) Questions and Answers : Unix/Linux : Privilege escalation bug discovered in sudo. Check your configuration and your distro. (Message 41482)
Posted 5 Feb 2020 by lazlo_vii
Post:
https://www.theregister.co.uk/2020/02/05/sudo_bug_allows_privilege_escalation/
2) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41088)
Posted 27 Dec 2019 by lazlo_vii
Post:
I'm not understanding your last point, but then I'm not a solid Linux user and a lot of things are still over my head - I'm assuming what you mean is the project was calling too much of the CPU into action and it couldn't keep up.
Is there a big difference in credits depending on which tasks you run? For instance, does Atlas pay more than SixTrack? How are your Ryzens configured? My machines have 16-32 GB of RAM, with my Ryzen 7 having 64 GB total, so it should be able to handle seven 2-core Atlas tasks just fine, with my other i7s running 1- or 2-core Atlas tasks depending on whether they have 16 or 32 GB of RAM and 75% CPU usage.

It's a quiet Christmas eve here, so I'm currently throwing together an old Xeon W3520 I found in the parts drawer. I think I have an i7 from the same time period here somewhere...they will be two good space heaters in my office in the cold Canadian winter, I just need to find a cooler for the latter and see how my electric bill fares. No cases on either of these - not enough room, ironically. They will of course both run Linux - my main desktop and laptop will continue running Windows, at least for now, while I slowly but surely navigate around the Linux command line and learn how to break things, then fix them. Old as these are, every little bit helps is my motto.


Here is a short history of load averages in Linux. It has two things going for it: First, it is very well written, has a simple to understand summary on the first page, and provides concrete examples with code snippets later on. Second, it was the very first article that came up when I did a Google search for "Unix load average explained".

http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
3) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41062)
Posted 24 Dec 2019 by lazlo_vii
Post:
...I do have other machines that are in datacenters and internet isn't an issue. One day that 300/25 will come my way and I can be happy. And maybe a few more Ryzens too.
I'm aiming for theoretically using around 75% of each machine if at all possible. Should I be aiming at more or less? Basically 2 threads not being used on an 8-thread machine and similar on 16 - though I might bump that up to 3 not in use.


I started LHC@home with one Ryzen 3700X (my desktop) running at 25-75% load for 10 days straight and then added a second (my server) running at 75-100% load. 15 days after that (25 days total) I had one million points for this project. So yes, running at less than 100% is very doable, especially if you have lots of threads and RAM and need your systems for other tasks.

EDIT: I forgot to add that when running the server at 75% the "load average" (how many threads were asking my kernel for a CPU at once) was a little less than 13 on average. When I turned LHC up to 100% the load average would quickly spike into the low 20s and stay there. That's not good on a system that can only handle 16 threads.
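
If you want to watch this yourself, the load average is read straight from the kernel, so a couple of stock commands are enough (nothing LHC-specific here):

cat /proc/loadavg    # the 1, 5, and 15 minute averages, plus runnable/total tasks and the last PID
nproc                # how many hardware threads the kernel sees, for comparison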
4) Message boards : Number crunching : Bandwidth and ram for vb and native tasks? (Message 41040)
Posted 23 Dec 2019 by lazlo_vii
Post:
The RAM usage is well known but I think the bandwidth is not. You can get a total daily transfer history from BOINC on the command line:

boinccmd --get_daily_xfer_history


But it will not be broken down by project and/or sub-project. It will just show the daily transfer totals for that host. If you have a proxy server running for your BOINC clients you might be able to save a lot of network traffic with it on the native workloads.
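
For the native workloads the CVMFS client can be told to use such a proxy. As a sketch (the host and port here are placeholders for wherever your squid actually listens), the relevant line in /etc/cvmfs/default.local would look something like:

CVMFS_HTTP_PROXY="http://my-squid-host:3128"

followed by a sudo cvmfs_config reload so the running client picks it up.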
5) Message boards : ATLAS application : Uploading stuck (Message 40995)
Posted 18 Dec 2019 by lazlo_vii
Post:
First, I would look at your /etc/boinc-client/cc_config.xml and double check the network settings. I am not saying "It isn't plugged in!" but that should be the first question you answer for yourself. If all is good in your config file I would try to manually update the project from the command line in one X terminal while watching the boinc-client messages in another. First open a terminal on (or to) the host and issue:

watch -n1 boinccmd --get_messages


That terminal will refresh the messages from the boinc-client service until you hit Ctrl+C to kill watch.

Open a second terminal on (or to) the same host and issue whichever of these two commands matches your configuration:

boinccmd --project https://lhcathome.cern.ch/lhcathome/ update

or
boinccmd --host localhost --passwd <your_password_for_remote_access> --project https://lhcathome.cern.ch/lhcathome/ update


Switch back to the first terminal and read what boinc-client says about updating.

If that doesn't give you useful information you can try looking at /var/log/syslog and reading man nc, man netstat, and man boinccmd for more clues. Router logs might be useful to you as well.
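
If you want to rule out basic connectivity first, a quick sketch using the project's hostname from the URL above:

nc -vz lhcathome.cern.ch 443    # can we open a TCP connection to the project's HTTPS port?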

EDIT: The updating of the project and reading boinc-client's messages can be done easily from the GUI, but what fun is that?
6) Questions and Answers : Sixtrack : No more tasks? (Message 40991)
Posted 17 Dec 2019 by lazlo_vii
Post:
I have been getting the same message in my logs since yesterday.

EDIT: I just saw this:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5116&postid=40990#40990
7) Message boards : Number crunching : Max # jobs and Max # CPUs (Message 40970)
Posted 15 Dec 2019 by lazlo_vii
Post:
...run "sudo chmod -R 777 /var/lib/boinc-client"...


Friends do not let Friends chmod 777. Don't do it. It is dangerous.
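
If the goal is just to let your own user read the client's files, a less dangerous sketch (assuming the Debian/Ubuntu packaging, which runs the client under a "boinc" group) is to add yourself to that group instead:

sudo usermod -aG boinc $USER    # log out and back in afterwards so the new group membership applies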
8) Message boards : Cafe LHC : Happy Xmas LHC@home! (Message 40957)
Posted 14 Dec 2019 by lazlo_vii
Post:
I hope everyone has a safe and happy Holiday over the next few weeks.

https://www.youtube.com/watch?v=S2KaRVGx0yI
9) Message boards : ATLAS application : Potentially failing vbox tasks today (Message 40947)
Posted 13 Dec 2019 by lazlo_vii
Post:
I have 6 Atlas (4 CPU) tasks running on two systems right now. In all 6, athena.py stopped running simultaneously. My systems went from a load average of 12+ to just 1+ for several minutes. I stopped boinc-client, autofs, and squid. I brought up squid, autofs, and boinc-client and saw no change. According to boinc-manager all tasks were running, but they had not written checkpoints. After a few minutes of watching athena.py try to restart itself several times, everything returned to normal. I don't know if these tasks will fail or turn out to be invalid.
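
For anyone who wants to keep an eye out for the same thing, a rough sketch of what I was watching (athena.py is just how the payload shows up in my process list):

watch -n5 'pgrep -fa athena.py; cat /proc/loadavg'    # list the athena.py processes and the load average every 5 seconds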

I think The Grinch has invaded CERN.

EDIT: I should add that these are NOT vbox tasks, these are Atlas Native.
10) Message boards : Number crunching : My idle thoughts on my idle cores...is this correct? (Message 40896)
Posted 11 Dec 2019 by lazlo_vii
Post:
You mention upgrading your processors. How old were your motherboards and what speed of RAM is supported? You may see a slight performance upgrade depending on your memory speeds. Have you taken a look at that?


I didn't run LHC@Home on my old hardware.
11) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40895)
Posted 11 Dec 2019 by lazlo_vii
Post:
Users and groups can share names across multiple systems even if they have different UIDs and GIDs locally. Perhaps that is part of the issue?

EDIT: Now that I think about it a bit more, isn't "boinc" a system group? If so you shouldn't be able to access its data. See "man pam", "man shadow", and "man login.defs" for more info.
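
A quick way to check is to compare the account's IDs against the system ranges in login.defs; a sketch (the exact range names and values vary a bit by distro):

getent passwd boinc                             # shows the UID and GID of the boinc account
grep -E 'SYS_UID_(MIN|MAX)' /etc/login.defs     # system accounts normally fall inside this range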
12) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40836)
Posted 7 Dec 2019 by lazlo_vii
Post:
The local BOINC client will simply ignore xml tags that are not defined for app_config.xml.
Among those ignored tags are:
<maintain>18</maintain>
<priority>1</priority>
Duh. So invent them.



It is an open source project. If you would like to invent them you are able to do so. It is much less work to learn and use new management tools, though. Like I said before, it is easy to use LXD once you learn it.

I do not intend to be rude, but I have to say that you are in the top 1 percent of contributors and I know that must be work of its own. Why not work a little harder for a short period of time so you will not have to work so hard for years to come? Why must the software developers appease your desire not to work when they also do not desire to work?
13) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40829)
Posted 7 Dec 2019 by lazlo_vii
Post:
The cernvm-fs client, yes; not the server one.

The cache is set to the default: the config shows CVMFS_QUOTA_LIMIT=4000 and CVMFS_CACHE_BASE=/var/lib/cvmfs, with 37 MB used in total including the setup files. This looks fine to me, and I check it with

cvmfs_config showconfig

which prints all of the parameter lines. None of those lines mention the filesystems that got mounted during operation.

I found that /tmp on the local host is the folder with the high storage use; the rootfs folders there are used by Atlas, and the issue is that these folders keep growing and do not get wiped after the tasks complete. My thought is that the filesystems are either re-used or they fail to get removed.

A host with 250 GB running only BOINC has a hard limit of 100 GB, but normal operation with a mix of projects sits at 20-30 GB, and that includes a few Atlas tasks running. When the host runs for several weeks or months without a restart, it looks like it will eventually suffer from a full disk. That results in a boinc-client crash once the system disk fills up, even though the BOINC limit itself is fine. The cvmfs filesystems are not counted as BOINC data, so they grow until the system is full. The system does not clean up these rootfs folders, perhaps because it never gets the right information to do so.



If I understand what CVMFS does, it just mounts a remote filesystem locally, like an advanced version of NFS exporting / from the host. Because it is a remote filesystem, it isn't writing anything in /cvmfs to your local disk. If the files are written anywhere other than /cvmfs they would take up space on the local system. In that case, if you are running a distro that uses systemd, check:

man systemd-tmpfiles
man tmpfiles.d


As per https://askubuntu.com/questions/1086034/which-process-cleans-tmp-under-systemd-on-18-04lts-answered-here-no-duplicat
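
Before pinning it on any one component, it might also be worth just measuring where the space goes; a sketch using the default paths (adjust if your cache or BOINC data directory lives somewhere else):

sudo du -sh /var/lib/cvmfs /var/lib/boinc-client/slots /tmp 2>/dev/null    # size of the usual suspects
df -h /                                                                    # overall root filesystem usage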
14) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40828)
Posted 7 Dec 2019 by lazlo_vii
Post:
I thought nT was in production, but it's limited to a fearful 10 WUs per rig. Because of the RAM-hungry ATLAS WUs I have to run ST to fill out my threads. I think BOINC runs best with fewer projects, but I'm stuck running three now.


Since you have your computers hidden in your profile I do not know if this will apply to you but:

On computers with lots of cores it might be worth setting up additional BOINC client instances.


It's really easy to do this on a Linux system that can run LXD. You can limit the RAM and/or CPU threads for each container manually or by using profiles. Even though the following information is a few years old it is still good. It is written by the lead developer of LXD:

https://stgraber.org/2016/03/11/lxd-2-0-blog-post-series-012/
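
As a concrete sketch of the idea (the container name, image, and limits are just examples; the image alias assumes the standard ubuntu: remote):

lxc launch ubuntu:18.04 boinc2 -c limits.cpu=4 -c limits.memory=8GiB    # a second BOINC host capped at 4 threads and 8 GiB
lxc exec boinc2 -- apt-get update
lxc exec boinc2 -- apt-get install -y boinc-client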

I hope that is useful to you.
15) Message boards : Cafe LHC : Riding on SixTrack (Message 40804)
Posted 6 Dec 2019 by lazlo_vii
Post:
I have done two further revisions of my first work. Both are in 1440p instead of 1080p and should scale better on 4K displays. In addition to making slight changes to the colors used in the rendering, I added a radial gradient to the second image so that it is not as bright at the edges. It should work better with dimmer desktop themes if you use it as a background.

Larger, new colors, bright:

https://highlander.motleyrebellion.com/index.php/s/M4j3Li8Eb2t7gFJ

Larger, new colors, darker:

https://highlander.motleyrebellion.com/index.php/s/yCSfN39LK235C2R
16) Message boards : Number crunching : My idle thoughts on my idle cores...is this correct? (Message 40788)
Posted 5 Dec 2019 by lazlo_vii
Post:
"I know something about most things, I know most things a few some things, and I know everything about nothing." I don't remember where that saying came from, but it has stuck with me for many years. If you have any thoughts or knowledge that could enlighten me please post them.

My two main systems have now been upgraded to Ryzen 3700X CPUs and both are running LHC@Home with 12 of 16 possible threads. I set it this way so that I don't have to stop Boinc when I want to play a game on my desktop, run a VM on my server, or use the idle parts of my CPUs for whatever other reason. I know that there are two somewhat conflicting management schemes at play. The first is on the hardware level, where the BIOS and CPU are juggling threads between cores for thermal management reasons (you can see this in action just by watching htop for a minute). The second is the Linux kernel of the host system trying to keep all of the processes that use the same data close together to avoid L2 cache misses. This is important because the kernels in the virtual environments of the different work units will only see the virtual CPUs and not the fact that they are being bounced around the physical CPU for thermal management reasons.

Because of the first management scheme, I can never have truly idle cores without using a program to set the CPU affinity of Boinc. Because of the second management scheme, if I did use a CPU affinity utility the host kernel would have a much easier time avoiding cache misses, and it would also reduce the chances that my own activities on the system will interfere with Boinc. This would come at the cost of decreased thermal efficiency. Since I have set the BIOS on these two systems to use AMD's Eco Mode, that is a risk I think has been mitigated. So I am thinking of installing and using a utility to manage the CPU affinity of Boinc.

I know that the kernel sees threads strictly by their PID number. Also the PID number changes every time a new work unit is started or when a new thread is spawned in a VM. If I don't want to manage the constantly changing PIDs then I think the best solution would be to find a utility that allows me to set CPU affinity by the name of the user that owns the process on the host system (boinc). This could be done by taking advantage of cgroups.

If all of the above is correct, then what I need to do is find the right utility and learn how AMD numbers the physical and logical cores in the CPU. After that I can deduce (by trial and error) which affinity settings give the most consistent performance and thermal results. If all goes well I could learn something new and have slightly more productive systems. If it goes badly I could generate a lot of failed work units.
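
If I go ahead with it, the simplest sketch I can think of is to loop over everything owned by the boinc user (the core list here is hypothetical until I work out AMD's numbering):

# pin every process owned by "boinc", and all of its threads, to cores 0-11
for pid in $(pgrep -u boinc); do
    sudo taskset -a -c -p 0-11 "$pid"
done

A systemd drop-in that sets CPUAffinity= on the boinc-client service would be the more permanent version, since it survives restarts.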

So, do you think it would be worth the effort?
17) Message boards : Cafe LHC : Riding on SixTrack (Message 40759)
Posted 2 Dec 2019 by lazlo_vii
Post:
I wanted to share a bit of LHC@Home inspired artwork. It's what it might be like if you could ride a particle on a beam:

https://highlander.motleyrebellion.com/index.php/s/gJowmi4akaxNmWn
18) Message boards : Number crunching : Ryzen 7 3700X numbers. (Message 40757)
Posted 2 Dec 2019 by lazlo_vii
Post:
I have two nearly identical computers. They have the same models of CPU, RAM, and motherboard, with the same BIOS settings (Eco Mode is turned on, which lowers the TDP of each CPU from 65W to 45W) and the same OS (Ubuntu 18.04). They are both set to run BOINC 24 hours a day, 7 days a week. Where they differ is in what software I have installed, how I use them, and how BOINC and LHC are configured.

Host 1 (laz-ubtop) is my desktop system. Almost all of the software installed on it comes from the Ubuntu repositories, with the following exceptions: Steam and the games I play through Steam, a few games I bought through The Humble Store, and Singularity. I had to update Singularity by compiling the source code because some Atlas tasks were failing; I think the disk images were made with a newer version than the one in the Ubuntu repository. I have this system set to accept any and all tasks from all LHC projects. I have it set to use 12 "cores", and until a few minutes ago it was set to use up to 8 "cores" per task. I just changed that to 4 "cores" because running tasks kept getting suspended so the Atlas tasks could start.

Host 2 (rebelhostess-v2) is my home server. On it I run my NextCloud install, my squid proxy server (only used for LHC), NFS shares, hostapd for my wifi, and the occasional VM if I want to explore what's happening with other distros or BSD. I also have this system set to use 12 "cores", but it only does Atlas native for LHC. It also has an LXD container with GPU pass-through running Einstein@Home.

According to boincstats.com, Host 2 is outperforming Host 1 by about 10K points a day:

https://www.boincstats.com/stats/-1/host/list/0/0/bb6dd2820ecefbe946a788a41fe15d11

Are the Atlas tasks really worth so much more scientifically? If so, should I go pure Atlas on my desktop as well? I don't really care about the points. I just want to help with the science. If the point values are not a reflection of the science value that's OK, too. I am just ignorant about how it all works, and I know that questions can lead to answers and answers are the only real cure for ignorance.

EDIT: OK, so I just realized where my error was. Boincstats isn't breaking down the numbers by project and is including my Einstein@Home points with my LHC numbers.
19) Message boards : Theory Application : Estimated Remaining Time Well Past Scheduled Due Date (Message 40630)
Posted 24 Nov 2019 by lazlo_vii
Post:
Right now I have 3 Theory Native 1.01 tasks that have been running for more than a day. They are in slots 1, 2, and 8.

Slot 1 properties from Boinc Manager:

Application: Theory Native 1.01 (native_theory)
Name: TheoryN_2279-770870-156
State: Running
Received: Thu 21 Nov 2019 12:25:33 PM CST
Report deadline: Sun 01 Dec 2019 12:25:31 PM CST
Estimated computation size: 3,600 GFLOPs
CPU time: 1d 06:52:19
CPU time since checkpoint: 1d 06:52:19
Elapsed time: 1d 05:28:37
Estimated time remaining: 00:00:00
Fraction done: 99.999%
Virtual memory size: 528.60 MB
Working set size: 50.60 MB
Directory: slots/1
Process ID: 3199
Progress rate: 3.240% per hour
Executable: wrapper_2019_03_02_x86_64-linux


Tail from runRivet.log:

$ sudo tail -f /var/lib/boinc-client/slots/1/cernvm/shared/runRivet.log
Updating display...
Display update finished (127 histograms, 72000 events).
Updating display...
Display update finished (127 histograms, 72000 events).
Updating display...
Display update finished (127 histograms, 72000 events).
Updating display...
Display update finished (127 histograms, 72000 events).
Updating display...
Display update finished (127 histograms, 72000 events).
Updating display...
Display update finished (127 histograms, 72000 events).


Slot 2 properties from Boinc Manager:

Application: Theory Native 1.01 (native_theory)
Name: TheoryN_2279-750936-155
State: Running
Received: Thu 21 Nov 2019 12:25:33 PM CST
Report deadline: Sun 01 Dec 2019 12:25:31 PM CST
Estimated computation size: 3,600 GFLOPs
CPU time: 1d 06:27:03
CPU time since checkpoint: 1d 06:27:03
Elapsed time: 1d 05:31:57
Estimated time remaining: 00:00:00
Fraction done: 100.000%
Virtual memory size: 600.85 MB
Working set size: 63.33 MB
Directory: slots/2
Process ID: 3201
Progress rate: 3.240% per hour
Executable: wrapper_2019_03_02_x86_64-linux


Tail from runRivet.log:

$ sudo tail -f /var/lib/boinc-client/slots/2/cernvm/shared/runRivet.log
3.8625e+14 pb +- ( 1.00728e+14 pb = 26.0785 % ) 772720000 ( 772720138 -> 99.9 % )
integration time:  ( 1d 6h 4m 1s elapsed / 851d 1h 41m 25s left ) [14:51:05]   
3.8624e+14 pb +- ( 1.00725e+14 pb = 26.0785 % ) 772740000 ( 772740138 -> 99.9 % )
integration time:  ( 1d 6h 4m 5s elapsed / 851d 2h 21m 49s left ) [14:51:09]   
3.8623e+14 pb +- ( 1.00723e+14 pb = 26.0785 % ) 772760000 ( 772760138 -> 99.9 % )
integration time:  ( 1d 6h 4m 7s elapsed / 851d 2h 50m 55s left ) [14:51:11]   
3.8622e+14 pb +- ( 1.0072e+14 pb = 26.0785 % ) 772780000 ( 772780138 -> 99.9 % )
integration time:  ( 1d 6h 4m 11s elapsed / 851d 3h 32m 34s left ) [14:51:15]   
3.8621e+14 pb +- ( 1.00718e+14 pb = 26.0785 % ) 772800000 ( 772800138 -> 99.9 % )
integration time:  ( 1d 6h 4m 15s elapsed / 851d 4h 14m left ) [14:51:19]   
3.862e+14 pb +- ( 1.00715e+14 pb = 26.0785 % ) 772820000 ( 772820138 -> 99.9 % )
integration time:  ( 1d 6h 4m 18s elapsed / 851d 4h 55m 53s left ) [14:51:22]


Slot 8 properties from Boinc Manager:

Application: Theory Native 1.01 (native_theory)
Name: TheoryN_2279-750240-149
State: Running
Received: Thu 21 Nov 2019 12:25:33 PM CST
Report deadline: Sun 01 Dec 2019 12:25:32 PM CST
Estimated computation size: 3,600 GFLOPs
CPU time: 1d 06:29:32
CPU time since checkpoint: 1d 06:29:32
Elapsed time: 1d 05:34:16
Estimated time remaining: 00:00:00
Fraction done: 100.000%
Virtual memory size: 600.85 MB
Working set size: 63.72 MB
Directory: slots/8
Process ID: 3200
Progress rate: 3.240% per hour
Executable: wrapper_2019_03_02_x86_64-linux


Tail from runRivet.log:

$ sudo tail -f /var/lib/boinc-client/slots/8/cernvm/shared/runRivet.log
4.63146e+15 pb +- ( 1.72895e+15 pb = 37.3305 % ) 773880000 ( 773880400 -> 99.9 % )
integration time:  ( 1d 6h 6m 16s elapsed / 1747d 9h 55m 55s left ) [14:53:11]   
4.63134e+15 pb +- ( 1.7289e+15 pb = 37.3305 % ) 773900000 ( 773900400 -> 99.9 % )
integration time:  ( 1d 6h 6m 20s elapsed / 1747d 11h 23m 54s left ) [14:53:15]   
4.63122e+15 pb +- ( 1.72886e+15 pb = 37.3305 % ) 773920000 ( 773920400 -> 99.9 % )
integration time:  ( 1d 6h 6m 24s elapsed / 1747d 12h 45m 50s left ) [14:53:18]   
4.63111e+15 pb +- ( 1.72881e+15 pb = 37.3305 % ) 773940000 ( 773940400 -> 99.9 % )
integration time:  ( 1d 6h 6m 27s elapsed / 1747d 14h 11m 2s left ) [14:53:22]   
4.63099e+15 pb +- ( 1.72877e+15 pb = 37.3305 % ) 773960000 ( 773960400 -> 99.9 % )
integration time:  ( 1d 6h 6m 31s elapsed / 1747d 15h 35m 17s left ) [14:53:26]   
Updating display...
Display update finished (0 histograms, 0 events).
4.63087e+15 pb +- ( 1.72872e+15 pb = 37.3305 % ) 773980000 ( 773980400 -> 99.9 % )
integration time:  ( 1d 6h 6m 34s elapsed / 1747d 17h 1s left ) [14:53:30]   
4.63075e+15 pb +- ( 1.72868e+15 pb = 37.3305 % ) 774000000 ( 774000400 -> 99.9 % )
integration time:  ( 1d 6h 6m 37s elapsed / 1747d 18h 7s left ) [14:53:32]   
4.63063e+15 pb +- ( 1.72864e+15 pb = 37.3305 % ) 774020000 ( 774020400 -> 99.9 % )
integration time:  ( 1d 6h 6m 39s elapsed / 1747d 18h 52m 6s left ) [14:53:35]


It seems that slot 1 has stalled but slots 2 and 8 want to run for a few more years. Is it OK to abort?
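
If the answer is yes, I assume aborting from the command line would be something like this, using the project URL and the task name from the properties above:

boinccmd --task https://lhcathome.cern.ch/lhcathome/ TheoryN_2279-750936-155 abort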
20) Questions and Answers : Unix/Linux : Ubuntu 18.04: cvmfs does not auto mount (Message 40565)
Posted 21 Nov 2019 by lazlo_vii
Post:
... mount the filesystems through fstab is ... better...

Sorry to be direct. It's a very bad idea to do this via fstab.
Did you try the hints on this page?
https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html#configure-autofs
https://cvmfs.readthedocs.io/en/stable/cpt-quickstart.html#troubleshooting


In addition ensure the service is enabled:
systemctl enable autofs.service

Some systems require a reboot to activate the automounter.



Why did you config geant4, na61, boss...?
At least 1 required repo is not in your list:
atlas-nightlies.cern.ch
It's usually configured automatically by the ATLAS scripts but this requires a working automounter...


Keep your configuration lean and use this list in /etc/cvmfs/default.local:
CVMFS_REPOSITORIES="atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch,cernvm-prod.cern.ch,sft.cern.ch,alice.cern.ch"


The autofs.service was enabled. It just isn't working with cvmfs. Why did I add the mount points that I did? Because I pulled the list from another guide on this forum. It was either that or wait for a job to download and fail, then check the log of the task on this site to see what needed to be mounted. That would have wasted my time and CERN's.

I would love to have a "lean config", but I don't want to chase down the cause of a problem that shouldn't be happening at all. Especially since the docs and troubleshooting info are spread out all over this site instead of being maintained in a central location and curated to prune obsolete information. Maybe tomorrow I'll try to get cvmfs to write to syslog. That would be a nice thing to have if it actually logs anything useful.
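
If I do, the starting point will probably be the client's own logging knobs in /etc/cvmfs/default.local (these are documented cvmfs client parameters, but treat the exact values as a sketch):

CVMFS_SYSLOG_LEVEL=1                  # syslog verbosity: 1=debug, 2=info, 3=notice
CVMFS_DEBUGLOG=/tmp/cvmfs-debug.log   # very verbose file log; only worth enabling while chasing a problem

followed by a sudo cvmfs_config reload.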

