1) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46088)
Posted 18 Jan 2022 by Brummig
Post:
The host isn't ordinarily restarting, but as far as I could tell one of the ATLAS tasks I received that day had a massive memory leak that crippled WSL2.

The current problem is that tasks are failing with "Looping job killed by pilot".
2) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46081)
Posted 17 Jan 2022 by Brummig
Post:
WSL2 and Windows are on two different IP addresses. They're not even on the same subnet. Windows gets its IP address from the DHCP server, whilst WSL2 creates its own IP address.

I have two BOINC managers running, one under Windows and one under WSL2. Mostly they communicate with the appropriate client, but sometimes odd things happen. I think that is caused by the Windows BOINC Manager communicating via the "wrong" adapter. ipconfig on Windows reports two adapters, one for WSL2 and one for Windows, whilst ifconfig on WSL2 reports only the one adapter (so it can't see the Windows client).
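To see the two addresses for yourself, a minimal sketch run from the WSL2 shell might look like the following. It assumes the default WSL2 NAT networking, where /etc/resolv.conf usually lists the Windows host as the nameserver. The function names are made up for illustration and are not part of BOINC or WSL.

```shell
# Illustrative helpers (hypothetical names); run inside the WSL2 distribution.

# First address assigned to the WSL2 virtual machine.
wsl_ip() {
    hostname -I | awk '{print $1}'
}

# Address of the Windows host as seen from WSL2.
# Assumption: default NAT networking, so the first "nameserver" entry in
# resolv.conf points at the Windows side. Pass another file to override.
windows_host_ip() {
    awk '/^nameserver/ {print $2; exit}' "${1:-/etc/resolv.conf}"
}
```

Calling wsl_ip and then windows_host_ip should print two different addresses, neither of which is the one Windows obtained from the DHCP server.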

BTW, if you're wondering why I run the two clients, it's because the WSL2 client can't run GPU tasks.
3) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46059)
Posted 14 Jan 2022 by Brummig
Post:
@AndreyOR: I've found the same. Yesterday I ended up in an even bigger mess. I used my Windows BOINC Manager to check on one of my Pies, but when I switched back to the Windows client I got "Connection Error: Invalid client RPC password". That sometimes happens. Normally it requires a reboot, but mindful that LHC@Home was running under WSL, I tried to sort it out without a reboot. The best I could achieve when selecting "localhost" was having the Windows BOINC Manager report the WSL client activity.
4) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46047)
Posted 12 Jan 2022 by Brummig
Post:
I had a look through those instructions for Singularity, AndreyOR, and unless I've missed something they are for people who want to build Singularity from the source code. What I did (I think) was install Singularity from a pre-compiled package, so all being well it should just work. It may prove necessary to install an earlier version, but the only way to find out is to try it.

In creating the instructions at the top of this thread, I'm trying to create the simplest possible way of being able to run Theory natively. Whilst rolling up your sleeves and building from the source code is sometimes the only way to get something that works, too often in the Linux world I have come across guides that would have you struggling to compile dozens of libraries and packages, when all you actually need to do is install with a single command a package that was built by someone who understood what they were doing.
5) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46042)
Posted 11 Jan 2022 by Brummig
Post:
Thank you, AndreyOR, that's helpful. I would be interested to see what you have done to get it working. If it's helpful, feel free to copy, paste, and edit my post to use as a template for the parts that are correct (it was very tedious and time-consuming getting it formatted clearly). I don't really care if I'm limited to single-threaded tasks; I just want a simple way to contribute something. I have no desire to spend days of time I don't have becoming an expert in somebody else's software.

I have searched the original guide for "default.local" and "cvmfs" to see if any of the many posts in the thread indicate that the instructions for default.local should not be followed, and they do not. How is anyone supposed to know which parts of that thread are currently applicable and which are not? I hope nobody is going to suggest searching the forum. How can you search for something if you don't know what you're looking for? Even if you do stumble across something, how will you know if it is currently valid? Or are contributors just expected to first read every post on the forum?

It would be really helpful if the software, like for other projects, returned an error code if something went awry, rather than granting credit and placing an obscure message in a log file that most crunchers will not look at (let alone understand), given that they have been granted credit. Even if I did take a read through the log, given that credit was granted I would have assumed that having no HITS result (whatever that means) is a good thing.
6) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46039)
Posted 11 Jan 2022 by Brummig
Post:
I'm not insisting on anything. I was simply following a sticky thread, on the assumption that it was still appropriate. If it's out of date, then it needs to be unstuck and new instructions posted. It's not reasonable to expect people to search many threads and posts (including much discussion) to find all the information needed, not knowing whether those posts are outdated.

Since editing my post appears to be impossible, I'm happy to create a new thread (and I'm happy for others to replace it with updated instructions when that becomes necessary). Is there a download URL for the latest default.local, or will people forever be condemned to searching the forum to ensure they have the correct entries in it? I see no point in linking to https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5594, since that could become outdated and replaced with another thread, completely unknown to the person reading it.
7) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46036)
Posted 11 Jan 2022 by Brummig
Post:
As I said in my original post, I took my information, including the settings for default.local, from https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840. So please address your questions to the author of that post, and let me know what the conclusion is.
8) Message boards : ATLAS application : Guide to Getting Quickly Started Running Native ATLAS (Ubuntu 20.04 on WSL2) (Message 46032)
Posted 10 Jan 2022 by Brummig
Post:
I run WSL2, which means I can't run ATLAS tasks in VirtualBox. However, I have been running other BOINC tasks in WSL2 for some time now. So I looked at the sticky thread in this part of the forum for running ATLAS natively (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840), and my heart sank when I saw the very long list of instructions. However, these instructions are for someone who wants to build everything from scratch, including BOINC, so I looked to see if there was a more straightforward way, and there is.

The instructions below are based on information gathered from the instructions for ATLAS natively and the instructions for running Theory natively (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4971). I've assumed you have BOINC installed and running, and that you are familiar with entering commands into a bash shell. I have written up a guide to getting up and running with BOINC on WSL2 on the Universe@Home forum, but at the time of writing that website is down for extensive repairs. These instructions should also work running Ubuntu and other Debian distributions natively, with minor adjustments (no need to keep starting CVMFS!).

    Install and set up the CERN Virtual Machine File System (CVMFS):
      Enter the following commands:
      wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
      sudo dpkg -i cvmfs-release-latest_all.deb
      rm -f cvmfs-release-latest_all.deb
      sudo apt-get update
      sudo apt-get install cvmfs
      sudo cvmfs_config setup
      

      Edit the CVMFS configuration file...
      sudo nano /etc/cvmfs/default.local
      

      ...and add the following lines
      CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
      CVMFS_CACHE_BASE=/scratch/cvmfs
      CVMFS_QUOTA_LIMIT=4096
      CVMFS_HTTP_PROXY=DIRECT
      

      On WSL2 you will need to start CVMFS, now and every time the WSL2 virtual machine starts, using:
      sudo cvmfs_config wsl2_start
      

      You can check all is OK with:
      cvmfs_config probe
      


    Install Singularity:


      Enter the following into the command line:
      wget https://github.com/sylabs/singularity/releases/download/v3.9.1/singularity-ce_3.9.1+6-g38b50cbc5-focal_amd64.deb
      sudo dpkg -i singularity-ce_3.9.1+6-g38b50cbc5-focal_amd64.deb
      rm singularity-ce_3.9.1+6-g38b50cbc5-focal_amd64.deb
      

      Note that the filename for the Singularity package will change over time. You can find its current name by looking at https://github.com/sylabs/singularity/releases, from where you can also, of course, download it using your web browser if you prefer.


    Activate:


    Enjoy!

      You should find tasks crunch to successful completion, and you should find you can shut down both the BOINC client and WSL without blitzing your ATLAS tasks.
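Since the WSL2 start has to be repeated every time the virtual machine comes up, the start and check commands from the CVMFS step can be wrapped in a small function. This is only a sketch: ensure_cvmfs is a made-up name, and it assumes "cvmfs_config probe" returns a non-zero exit status when the repositories are not mounted.

```shell
# Hypothetical helper: start CVMFS on WSL2 only when it is not already up.
ensure_cvmfs() {
    # Assumption: "cvmfs_config probe" returns a non-zero status on failure.
    if cvmfs_config probe >/dev/null 2>&1; then
        echo "CVMFS already running"
    else
        echo "Starting CVMFS"
        sudo cvmfs_config wsl2_start
    fi
}
```

Running ensure_cvmfs at the start of each WSL2 session, before the BOINC client picks up work, should avoid starting CVMFS twice.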


9) Message boards : Number crunching : No Tasks Available (Message 45098)
Posted 2 Jul 2021 by Brummig
Post:
My client has finally stopped telling me no tasks are available (as it has been since May) and is now crunching on SixTrack.
10) Message boards : Number crunching : No Tasks Available (Message 44939)
Posted 12 May 2021 by Brummig
Post:
Meh. OK, thanks, at least I know now. I'll keep prodding, and see if I can bring it back to life.
11) Message boards : Number crunching : No Tasks Available (Message 44937)
Posted 12 May 2021 by Brummig
Post:
By "it isn't dropping many places in the credit ranking" (emphasis added) I mean the credit total compared with others, and across all LHC tasks, not just SixTrack. In the past week, only three people on the entire planet have overtaken someone who is getting no work at all. To me that suggests not many people are successfully running LHC tasks.

I can't see any reason why my settings should suddenly start causing BOINC to report that no tasks are available, or stop BOINC from running LHC tasks.
12) Message boards : Number crunching : No Tasks Available (Message 44935)
Posted 12 May 2021 by Brummig
Post:
Until recently I was getting lots of SixTrack tasks, and before that I was getting lots of Theory tasks. Now all I get is:
    12/05/2021 12:12:57 | LHC@home | No tasks sent
    12/05/2021 12:12:57 | LHC@home | No tasks are available for SixTrack
    12/05/2021 12:12:57 | LHC@home | No tasks are available for CMS Simulation
    12/05/2021 12:12:57 | LHC@home | No tasks are available for Theory Simulation
    12/05/2021 12:12:57 | LHC@home | No tasks are available for ATLAS Simulation


Yet the server reports that lots of tasks are available. However, for a host that is getting no new tasks, it isn't dropping many places in the credit ranking. Any suggestions?

13) Message boards : Theory Application : cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed (Message 43570)
Posted 6 Nov 2020 by Brummig
Post:
The same has just happened to me. 22 hours of doing nothing.
14) Message boards : Theory Application : New Version v300.05 (Message 41412)
Posted 28 Jan 2020 by Brummig
Post:
I've now confirmed Theory doesn't survive an overnight hibernation either, even with "Leave non-GPU tasks in memory while suspended" not selected (I've never had this selected). So that explains the tasks that never complete, but then get completed in a fraction of the time by another host. A task that doesn't complete by the time the host is put into hibernation will restart the following morning, and if it can complete by the end of the working day it should do so. But if it can't complete by the end of the working day, it will just run and run, never completing. I've not yet tried suspending VM tasks before hibernating, but I've never had to do that with Theory tasks in the past.
15) Message boards : Theory Application : New Version v300.05 (Message 41370)
Posted 27 Jan 2020 by Brummig
Post:
Following resume from hibernation over the weekend, this long-runner briefly continued on to something over 57,000 events, and then it reset itself and started again from zero:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=259725584

It's possible Theory tasks don't survive hibernation over a weekend. However, I also caught it last week throwing errors/warnings:

PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.10978) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.34534) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 6.61948) for g to b
The decay Xi(1690)- -> Sigma- Kbar0 2.10871 500 is too inefficient for the particle 816 Xi(1690)- 13312 [601]

0.935 2.078 25 .560 25.718 5 «bs 9
vetoing the decay
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.05218) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 5.63208) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 3.54622) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.06896) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.98784) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.04204) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.83883) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.85025) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 12.6764) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 3.83015) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.55048) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 2.53167) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.04879) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 2.41224) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.92092) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.52194) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.07241) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 4.16827) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.85123) for g to b
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.09399) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 2.30685) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.31057) for g to bbar
PDFVeto warning: Ratio > GtobbbarSudakov:PDFmax (by a factor of 1.38341) for g to b
An event exception of type ThePEG::Exception occurred while generating event number 28880:
Remnant extraction failed in ShowerHandler::cascade() from primary interaction
The event will be discarded.
28900 events processed
29000 events processed
dumping histograms...
16) Message boards : Theory Application : New Version v300.05 (Message 41312)
Posted 20 Jan 2020 by Brummig
Post:
@Crystal Pellet:
Yes, I monitored one for some time. There was no evidence of any progress, and after switching back and forth between displaying different information, it settled on saying that it had processed zero of zero events. I aborted it, and it went to another host that completed it in a fraction of the time my host had been chewing on it. Curiously, whilst that task ran frantically doing nothing, two tasks, after being aborted, reported the run time and CPU time as zero. For example, task 259168280 has a start timestamp of 2020-01-13 15:31:52. I aborted it at 16 Jan 2020, 8:32:03 UTC because it jumped to an extreme estimated completion time, but apparently it did absolutely nothing during the time it was supposedly running (a couple of hours). Task 259230427 was sent 14 Jan 2020, 12:55:32 UTC, and aborted 15 Jan 2020, 8:59:17 UTC. That second task has just this in the stderr output:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
aborted by user</message>
]]>
17) Message boards : Theory Application : New Version v300.05 (Message 41293)
Posted 17 Jan 2020 by Brummig
Post:
Well, you can call it what you like, but the fact remains that those tasks that say they will complete in a reasonable time (a few hours), and stay that way, will complete within a few hours of computer run time. Those that suddenly jump from displaying a few hours to four days will run and run. Some people have let them run and run, only to find they fail, but I just abort them because I don't want to waste time and electricity (at my expense) on them. Typically, when they are resent, the receiving host completes them in a fraction of the time my host spent on them, and I'm not the only person seeing that behaviour. Since this wasn't previously a problem, it strongly suggests a bug has been introduced in the latest Theory tasks.
18) Message boards : Theory Application : New Version v300.05 (Message 41291)
Posted 17 Jan 2020 by Brummig
Post:
No, it's not enough time. One Theory task I have at present says it will take over four days of CPU time. The host it is running on is powered up during work hours, i.e. around eight hours a day, five days a week (using spare CPU cycles is the intention behind BOINC). So four days of CPU time will take 12 working days plus two weekends, i.e. 16 days. The deadline is in ten days. Ten is quite a bit less than 16, and I haven't even taken into account doing CPU-intensive tasks as part of my work. I don't mind if a task genuinely takes four days of CPU time (like CPDN tasks typically do), but the deadline needs to be suitably distant in the future.
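To make the arithmetic explicit, here is a rough sketch. calendar_days is just an illustrative name, and it assumes the task starts on the first morning of a working week and gets the whole of each working day.

```shell
# Hypothetical helper: calendar days needed to finish a task on a host
# that only runs during working hours.
# Usage: calendar_days CPU_HOURS HOURS_PER_DAY [WORKDAYS_PER_WEEK]
calendar_days() {
    cpu_hours=$1
    hours_per_day=$2
    workdays_per_week=${3:-5}
    # Working days needed, rounded up.
    work_days=$(( (cpu_hours + hours_per_day - 1) / hours_per_day ))
    # Weekends crossed: one for every full working week before the last day.
    weekends=$(( (work_days - 1) / workdays_per_week ))
    echo $(( work_days + weekends * (7 - workdays_per_week) ))
}

# Four days of CPU time (96 hours) at eight hours per working day.
calendar_days 96 8   # prints 16
```

With a ten-day deadline, anything that prints more than 10 here cannot finish in time on such a host.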

I did once leave one of these tasks to run beyond the deadline, but even once past the deadline it was still a couple of days away from completing, so I aborted it. Others on this forum have let these long-runner tasks run and run, only to have them fail. That suggests the solution is to fix the bug, rather than to extend the deadline.
19) Message boards : Theory Application : New Version v300.05 (Message 41289)
Posted 17 Jan 2020 by Brummig
Post:
The problem is they typically don't finish by the deadline, but when passed to another host, that host may complete the task quickly, so leaving them to run is a waste of host time and electricity. The tasks that say they will complete in a reasonable time do complete in a reasonable time.
20) Message boards : Theory Application : New Version v300.05 (Message 41287)
Posted 17 Jan 2020 by Brummig
Post:
I've just had two of these, and (once again) both tasks reported they required significantly more CPU time than was available before the deadline. I've aborted them. Is this problem going to be addressed (please)?



