1) Message boards : Theory Application : Theory native tasks crash on wsl2. (Message 47962)
Posted 3 Apr 2023 by AndreyOR
Post:
Have you enabled systemd on WSL2? I've noticed this problem when systemd is enabled, but I don't know why it happens or of a solution other than disabling systemd. I posted a question about it but never got any responses. It's probably a problem unique to WSL2.
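(For anyone checking their setup: systemd on WSL2 is toggled via the [boot] section of /etc/wsl.conf inside the distro (the file may not exist yet), followed by wsl --shutdown from Windows for it to take effect. A sketch:)

# /etc/wsl.conf
[boot]
systemd=true   # set to false, or remove the line, to disable systemd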
2) Message boards : Number crunching : Recommended CVMFS Configuration for Native Apps - Comments and Questions (Message 47588)
Posted 14 Dec 2022 by AndreyOR
Post:
Has anyone seen a scenario where running cvmfs_config probe returns success, but native ATLAS fails due to that same probe failing when run by the wrapper at task startup? What could be some possible reasons?
For context, WSL2 now supports systemd. I enabled it to see how things go, hoping to be able to run ATLAS multi-core on WSL2. It didn't go well for ATLAS: according to the error log the probe failed, yet it succeeds when I run it by itself from the command prompt.
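(For anyone comparing, these are the kinds of checks I mean. cvmfs_config probe and cvmfs_config chksetup are standard CVMFS commands; running the probe as the boinc user is my guess at getting closer to the wrapper's context, assuming the client runs under that account:)

cvmfs_config probe                 # succeeds when run from my own shell
sudo cvmfs_config chksetup         # general sanity check of the CVMFS setup
sudo -u boinc cvmfs_config probe   # closer to what the wrapper does, assuming the client runs as user boinc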
3) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47516)
Posted 10 Nov 2022 by AndreyOR
Post:
It turns out MadGraph tasks are more common than I realized; I found a few more in my Valid list. That last one was odd in that it prevented anything else from running; others haven't done that. They do use more than one CPU thread, as their CPU time is greater than their run time, and they're long runners, taking over 24 hrs. to complete.
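(For anyone wanting to check this on their own host: comparing CPU time to elapsed time per task can be done with boinccmd. A sketch, assuming a stock Linux client; the exact field labels can vary by client version:)

boinccmd --get_tasks | grep -Ei 'name|cpu time|elapsed'   # CPU time > elapsed time implies more than one thread in use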
4) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 47500)
Posted 5 Nov 2022 by AndreyOR
Post:
Tried it with Squid 5.2 from the Ubuntu 22.04 repository and it worked; ATLAS uploads went through OK. I wonder what was different about prior Squid versions that they didn't need this flag.
5) Message boards : ATLAS application : How do I set up CVMFS Docker + BOINC Docker? (Message 47499)
Posted 5 Nov 2022 by AndreyOR
Post:
I don't know anything about Docker, but here are a couple of things you could try, based on issues I had when first trying to get ATLAS to run on Ubuntu in WSL2.

Try installing Apptainer (formerly Singularity) directly instead of using the version pre-packaged with CVMFS. https://apptainer.org/docs/
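(A quick way to confirm which Apptainer binary the system actually picks up, in case the CVMFS-bundled one shadows the standalone install; just a sketch:)

which apptainer       # should point at the standalone install, e.g. /usr/bin/apptainer
apptainer --version   # confirm it runs on its own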

Try running ATLAS single-core. I still can't run it multi-core in WSL2. Single-core takes the longest per task, but it's an efficient use of resources: a significant amount of time is spent in set-up and wrap-up, which are single-core phases, so with multi-core the other cores sit idle during them.
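(If single-core helps, it can be forced via app_config.xml; a sketch along the lines of what's discussed in the app_config threads here, assuming the native_mt plan class:)

<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>1</avg_ncpus>
  </app_version>
</app_config>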
6) Questions and Answers : Unix/Linux : Setting up cvmfs (necessary for Cern experiments)(Linux)(mac also) Also if you have a boinc Ubuntu VM or docker VM (Have at least 8gb of dynamic ram available & swap if possible Set in the VM) (Message 47498)
Posted 5 Nov 2022 by AndreyOR
Post:
using the LocutusOfBorg

Mate, that is your problem: the Borg are untrustworthy, and the Borg Queen herself controls Locutus. :-)

I also had issues going from Ubuntu 20.04 to 22.04.
7) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47497)
Posted 5 Nov 2022 by AndreyOR
Post:
It's annoying that the task finished before I had a chance to figure out how to make other tasks run alongside it. I was going to try an app_config modification. I did find that you can manually get other tasks running by momentarily suspending/resuming the MadGraph one, but once those tasks finished, new ones wouldn't start on their own.

My complaint was that other tasks wouldn't run alongside MadGraph. The 2-core limit not being respected was a possible reason I had in mind before I knew how and what to look for. I have no resource restrictions in my BOINC settings. I did notice that MadGraph takes a lot of RAM, ~7.3 GB for that one; I believe that's the most I've ever seen for a BOINC task of any project I've run. I have 12 GB allocated to WSL, so that shouldn't have prevented other tasks from starting. I believe BOINC will suspend tasks with the message "waiting for memory" if it detects there's not enough.

Is it likely a BOINC issue or a MadGraph issue? Does a 2-process limit necessarily mean a 2-core limit, or could those 2 processes be using all available cores? I think I may have seen evidence for the latter.
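(One way I could have checked: ask ps for the thread count and CPU usage of the two MadGraph worker processes. A sketch; the PIDs are placeholders:)

ps -o pid,nlwp,pcpu,comm -p 8335,8360   # nlwp = number of threads, pcpu = %CPU; PIDs are hypothetical examples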

I wish this task had run a couple more days so I'd have had more time to figure things out. I didn't expect it to be done this soon; otherwise I could've restarted it. I wonder if there's a way to get another one?
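(For the record, the app_config modification I had in mind was along these lines, assuming native Theory's plan class is native_theory, so that BOINC budgets 2 CPUs per task instead of 1:)

<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>native_theory</plan_class>
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>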
8) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47486)
Posted 4 Nov 2022 by AndreyOR
Post:
Yes, BOINC is only running 1 Theory task even though it should be running 8.

The output is:
runc(6647)─┬─job(6771)───runRivet.sh(6912)─┬─rivetvm.exe(10046)
           │                               ├─rungen.sh(10045)───python(10405)───python(10409)─┬─ajob1(8326)───madevent_mintMC(8335)
           │                               │                                                  ├─ajob1(8351)───madevent_mintMC(8360)
           │                               │                                                  ├─{python}(10476)
           │                               │                                                  ├─{python}(23250)
           │                               │                                                  ├─{python}(23251)
           │                               │                                                  ├─{python}(25059)
           │                               │                                                  ├─{python}(25060)
           │                               │                                                  ├─{python}(24020)
           │                               │                                                  └─{python}(24021)
           │                               └─sleep(8418)
           ├─{runc}(6676)
           ├─{runc}(6678)
           ├─{runc}(6679)
           ├─{runc}(6687)
           ├─{runc}(6692)
           ├─{runc}(6706)
           ├─{runc}(6715)
           └─{runc}(6721)

Are the 2 madevent_mintMC entries at the far right, near the top of the tree, what you're looking for? They were a bit truncated in the first output.

I haven't had stability issues or noticeable slowdowns from allowing WSL to use all cores, except when RAM gets filled up, e.g. when running a bunch of concurrent ATLAS tasks. I pretty much only use the i7 PC for BOINC.
9) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47484)
Posted 4 Nov 2022 by AndreyOR
Post:
The computer is: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10816268
The task is (still running): https://lhcathome.cern.ch/lhcathome/result.php?resultid=368455572
The output is:
init(1)─┬─init(10)─┬─automount(97)─┬─{automount}(98)
        │          │               ├─{automount}(99)
        │          │               └─{automount}(102)
        │          ├─boinc(135)─┬─wrapper_2019_03(360)─┬─cranky-0.0.32(376)───runc(6647)─┬─job(6771)───runRivet.sh(6912)─┬─rivetvm.exe(10046)
        │          │            │                      │                                 │                               ├─rungen.sh(10045)───python(10405)───python(10409)─┬─ajob1(2185)───madevent_mintMC(21+
        │          │            │                      │                                 │                               │                                                  ├─ajob1(2544)───madevent_mintMC(25+
        │          │            │                      │                                 │                               │                                                  ├─{python}(10476)
        │          │            │                      │                                 │                               │                                                  ├─{python}(23250)
        │          │            │                      │                                 │                               │                                                  ├─{python}(23251)
        │          │            │                      │                                 │                               │                                                  ├─{python}(25059)
        │          │            │                      │                                 │                               │                                                  ├─{python}(25060)
        │          │            │                      │                                 │                               │                                                  ├─{python}(24020)
        │          │            │                      │                                 │                               │                                                  └─{python}(24021)
        │          │            │                      │                                 │                               └─sleep(2746)
        │          │            │                      │                                 ├─{runc}(6676)
        │          │            │                      │                                 ├─{runc}(6678)
        │          │            │                      │                                 ├─{runc}(6679)
        │          │            │                      │                                 ├─{runc}(6687)
        │          │            │                      │                                 ├─{runc}(6692)
        │          │            │                      │                                 ├─{runc}(6706)
        │          │            │                      │                                 ├─{runc}(6715)
        │          │            │                      │                                 └─{runc}(6721)
        │          │            │                      └─{wrapper_2019_03}(370)
        │          │            └─{boinc}(359)
        │          ├─cvmfs2(938)
        │          ├─cvmfs2(941)
        │          ├─cvmfs2(945)─┬─{cvmfs2}(950)
        │          │             ├─{cvmfs2}(951)
        │          │             ├─{cvmfs2}(952)
        │          │             ├─{cvmfs2}(953)
        │          │             ├─{cvmfs2}(954)
        │          │             ├─{cvmfs2}(955)
        │          │             ├─{cvmfs2}(956)
        │          │             ├─{cvmfs2}(957)
        │          │             ├─{cvmfs2}(958)
        │          │             ├─{cvmfs2}(959)
        │          │             ├─{cvmfs2}(960)
        │          │             ├─{cvmfs2}(961)
        │          │             ├─{cvmfs2}(962)
        │          │             ├─{cvmfs2}(963)
        │          │             ├─{cvmfs2}(964)
        │          │             ├─{cvmfs2}(965)
        │          │             ├─{cvmfs2}(966)
        │          │             ├─{cvmfs2}(967)
        │          │             ├─{cvmfs2}(968)
        │          │             └─{cvmfs2}(971)
        │          ├─cvmfs2(949)
        │          ├─cvmfs2(1120)─┬─{cvmfs2}(1125)
        │          │              ├─{cvmfs2}(1126)
        │          │              ├─{cvmfs2}(1127)
        │          │              ├─{cvmfs2}(1128)
        │          │              ├─{cvmfs2}(1129)
        │          │              ├─{cvmfs2}(1130)
        │          │              ├─{cvmfs2}(1131)
        │          │              ├─{cvmfs2}(1132)
        │          │              ├─{cvmfs2}(1133)
        │          │              ├─{cvmfs2}(1134)
        │          │              ├─{cvmfs2}(10200)
        │          │              ├─{cvmfs2}(10248)
        │          │              ├─{cvmfs2}(10311)
        │          │              ├─{cvmfs2}(10313)
        │          │              ├─{cvmfs2}(10315)
        │          │              ├─{cvmfs2}(10407)
        │          │              ├─{cvmfs2}(10408)
        │          │              ├─{cvmfs2}(10412)
        │          │              ├─{cvmfs2}(10422)
        │          │              └─{cvmfs2}(10427)
        │          ├─cvmfs2(1124)
        │          ├─cvmfs2(2477)─┬─{cvmfs2}(2482)
        │          │              ├─{cvmfs2}(2483)
        │          │              ├─{cvmfs2}(2484)
        │          │              ├─{cvmfs2}(2485)
        │          │              ├─{cvmfs2}(2486)
        │          │              ├─{cvmfs2}(2487)
        │          │              ├─{cvmfs2}(2488)
        │          │              ├─{cvmfs2}(2489)
        │          │              ├─{cvmfs2}(2490)
        │          │              ├─{cvmfs2}(2491)
        │          │              ├─{cvmfs2}(2492)
        │          │              ├─{cvmfs2}(2494)
        │          │              ├─{cvmfs2}(2495)
        │          │              ├─{cvmfs2}(2496)
        │          │              ├─{cvmfs2}(2497)
        │          │              ├─{cvmfs2}(2498)
        │          │              ├─{cvmfs2}(2499)
        │          │              ├─{cvmfs2}(2500)
        │          │              ├─{cvmfs2}(2501)
        │          │              └─{cvmfs2}(6094)
        │          ├─cvmfs2(2481)
        │          ├─cvmfs2(3828)─┬─{cvmfs2}(3833)
        │          │              ├─{cvmfs2}(3834)
        │          │              ├─{cvmfs2}(3835)
        │          │              ├─{cvmfs2}(3836)
        │          │              ├─{cvmfs2}(3837)
        │          │              ├─{cvmfs2}(3838)
        │          │              ├─{cvmfs2}(3839)
        │          │              ├─{cvmfs2}(3840)
        │          │              ├─{cvmfs2}(3841)
        │          │              ├─{cvmfs2}(3842)
        │          │              ├─{cvmfs2}(3843)
        │          │              ├─{cvmfs2}(3844)
        │          │              ├─{cvmfs2}(3845)
        │          │              ├─{cvmfs2}(3846)
        │          │              ├─{cvmfs2}(3847)
        │          │              ├─{cvmfs2}(3848)
        │          │              ├─{cvmfs2}(3849)
        │          │              ├─{cvmfs2}(3850)
        │          │              ├─{cvmfs2}(3851)
        │          │              └─{cvmfs2}(6841)
        │          ├─cvmfs2(3832)
        │          └─squid(77)───squid(81)─┬─pinger(110)
        │                                  ├─{squid}(10183)
        │                                  ├─{squid}(10184)
        │                                  ├─{squid}(10185)
        │                                  ├─{squid}(10186)
        │                                  ├─{squid}(10187)
        │                                  ├─{squid}(10188)
        │                                  ├─{squid}(10189)
        │                                  ├─{squid}(10190)
        │                                  ├─{squid}(10191)
        │                                  ├─{squid}(10192)
        │                                  ├─{squid}(10193)
        │                                  ├─{squid}(10194)
        │                                  ├─{squid}(10195)
        │                                  ├─{squid}(10196)
        │                                  ├─{squid}(10197)
        │                                  └─{squid}(10198)
        ├─init(30523)───init(30524)───bash(30525)───pstree(2755)
        └─{init}(7)
10) Message boards : Theory Application : Extreme Overload caused by a Theory Task (Message 47482)
Posted 4 Nov 2022 by AndreyOR
Post:
I got one of these MadGraph tasks today. I was perplexed as to why BOINC was running just 1 native Theory task (as opposed to 8 concurrently) for no reason I could find; I'd never seen that before.

Indeed the madgraph code default is to use all CPU cores.
This is corrected and the limit now is set to 2 cores max.

It seems like it has not been corrected.

Is there any way to tell how long it'll take to finish? Right now it's at 99.920% with 11h 42m of elapsed time.
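(One thing I can try in the meantime is peeking at the job's log in the slot directory; a sketch, assuming the runRivet.sh job writes a runRivet.log there, with a made-up slot number and the default Linux client path:)

sudo find /var/lib/boinc-client/slots -name runRivet.log    # locate the job log first
sudo tail -n 20 /var/lib/boinc-client/slots/0/runRivet.log  # slot 0 is just an example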
11) Message boards : ATLAS application : ATLAS badges (Message 47052)
Posted 30 Jul 2022 by AndreyOR
Post:
One thing to note about ATLAS badges is that very few people have reached the 5 million points needed for the highest badge: as of now only 119 users, putting them in at least the top 1%. While it's true that there's big variance among those 119, having 5 million as the minimum for the highest badge doesn't seem unreasonable. Even 1 million points (the second-highest badge) puts you in at least the top 5%; only 557 users have at least 1 million.

What I'd like to see is the badge system redone so that the total contribution to LHC@home, i.e. credit for all sub-projects combined, is counted. If I'm not mistaken, ATLAS badging is a holdover from the days when ATLAS@home was its own project, and I don't think it makes sense anymore. David, or another admin, would that be too much of an undertaking?
12) Message boards : ATLAS application : app_config.xml parameters question (Message 46805)
Posted 19 May 2022 by AndreyOR
Post:
It seems like you're using the VBox version of ATLAS, not the native one. According to a post above, though, that argument is valid in the VBox version but not the native one.
13) Message boards : ATLAS application : app_config.xml parameters question (Message 46799)
Posted 19 May 2022 by AndreyOR
Post:
I checked that file, and it seems like the only argument available is --nthreads. I thought that some time ago, when I was trying to figure out how to set up native ATLAS, I saw a --memory_size parameter used in the forums. I added it to my app_config file (I can't remember what made me think I needed it), but it seems it's invalid and just gets ignored. I'll have to delete it then. Was it ever used in the past?
14) Message boards : ATLAS application : app_config.xml parameters question (Message 46798)
Posted 19 May 2022 by AndreyOR
Post:
captainjack, yes, I'd expect your app_config to work, though there seem to be some redundant/unnecessary entries. Since you're only trying to modify the native ATLAS version, you only need the app_version portion of the app_config. In addition, <cmdline>--nthreads 5</cmdline> is redundant, since you're already specifying that you want 5 CPUs via avg_ncpus. I also believe the app section is ignored since it's incomplete. Try the following; it's cleaner and shorter, so if you change things later you're less likely to make an accidental mistake.
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>5</avg_ncpus>
  </app_version>
</app_config>
15) Message boards : ATLAS application : app_config.xml parameters question (Message 46793)
Posted 18 May 2022 by AndreyOR
Post:
Thank you for explaining some more. It seems my suspicion was right that you wouldn't use avg_ncpus and the --nthreads command-line parameter for the same app; it'd be either redundant or detrimental (if different values are used). avg_ncpus is the way to control the thread usage of multithreaded apps. I was curious and managed to find a list of command-line parameters for MilkyWay N-Body Simulation, but couldn't find one for LHC ATLAS (native). Could you provide a link? Thank you.
16) Message boards : ATLAS application : app_config.xml parameters question (Message 46789)
Posted 18 May 2022 by AndreyOR
Post:
I'm familiar with that page and have read it before, but it doesn't clarify things much. I understand the difference between cmdline and avg_ncpus in general. I'm specifically wondering about <cmdline>--nthreads x</cmdline>, not just cmdline in general. --nthreads in cmdline and avg_ncpus seem to specify the same thing and thus look redundant. However, I've seen people post app_config files with both entries, and I couldn't see why. Would you ever use both in the same app_config, and if so, under what circumstances? Also, how does one know what cmdline parameters a given program understands?
17) Message boards : ATLAS application : app_config.xml parameters question (Message 46784)
Posted 18 May 2022 by AndreyOR
Post:
What is the difference between the following 2 parameters in the app_version section of the app_config.xml file, especially as it pertains to multithreaded apps (LHC ATLAS, MilkyWay N-Body Simulation)?
<app_version>
   <avg_ncpus>x</avg_ncpus>
   <cmdline>--nthreads x</cmdline>
</app_version>
18) Message boards : Number crunching : Tasks stuck at 99.99% with run time of 1 day+ (Message 46554)
Posted 29 Mar 2022 by AndreyOR
Post:
Disabling macOS time sync early in the process helped: I stopped getting those kinds of messages, and the last batch of tasks completed successfully. While looking into your suggestion, I noticed that when setting up a VM in VBox there's an option under System/Motherboard labeled "Hardware Clock in UTC Time". It's checked by default, so I unchecked it and turned time sync in macOS back on (which is the default anyway). I'm curious to see if this simpler solution will also work, as changing the BIOS time, updating the Windows registry, and making sure the VMs are set up right is a bit more involved.
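(For reference, the same setting can be flipped from the command line; "Mojave" is a placeholder VM name:)

VBoxManage modifyvm "Mojave" --rtcuseutc off   # equivalent to unchecking "Hardware Clock in UTC Time"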
19) Message boards : Number crunching : Tasks stuck at 99.99% with run time of 1 day+ (Message 46523)
Posted 23 Mar 2022 by AndreyOR
Post:
Thanks for the suggestions. I found a VBox command in the manual to make the VM sync time with the host frequently, but that didn't seem to make a difference. So I disabled time checking/syncing in macOS to see if that helps. If it doesn't, I'll try your suggestions. Will following your suggestions make my PC run on UTC time instead of local time?
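(The command I found was along these lines, using the VBoxService time-sync guest property from the manual; the VM name and interval are examples:)

VBoxManage guestproperty set "Mojave" "/VirtualBox/GuestAdd/VBoxService/--timesync-interval" 10000   # re-sync every 10 s (value in ms)

and to disable time checking inside the macOS guest:

sudo systemsetup -setusingnetworktime off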
20) Message boards : Number crunching : Tasks stuck at 99.99% with run time of 1 day+ (Message 46516)
Posted 22 Mar 2022 by AndreyOR
Post:
greg_be,
It seems like you've had time-discrepancy issues with Rosetta in a VM. I've been dealing with this issue recently on a different project. I recently started running macOS Mojave in VBox to process 32-bit tasks for climateprediction.net, and I've been getting a message in the BOINC event log that reads (the numbers in parentheses vary):
New system time (1647911207) < old system time (1648063920); clearing timeouts

After this, the task progress bars freeze but the time counter keeps going. I can get things moving again by suspending/resuming each task, but so far the tasks error out at the very end, which sucks since these are very long-running tasks that take days to weeks to complete. I've never seen this kind of issue before, but I also don't use VBox much. I rarely run VBox-only apps since I use WSL2 (which uses Hyper-V) to run Linux apps, and Hyper-V and VirtualBox don't really work well together.

