Message boards : Theory Application : Unable to run native Theory in WSL2 Ubuntu 20.04

AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,537,240
RAC: 6
Message 45979 - Posted: 3 Jan 2022, 9:14:41 UTC

I was hoping to use a WSL2 Ubuntu 20.04 setup to run native Theory tasks, but they error out. Here's an example task with log: https://lhcathome.cern.ch/lhcathome/result.php?resultid=338017823
The system has plenty of resources, and cvmfs_config probe checks out OK. Any ideas on why this may be happening? By the way, I'm able to run a number of other linux projects including single core Atlas tasks (but not multicore) on this setup.
ID: 45979
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 223,018,854
RAC: 136,230
Message 45980 - Posted: 3 Jan 2022, 9:52:01 UTC - in response to Message 45979.  

To check your cgroup version, run:
grep cgroup /proc/filesystems

To check your runc version, run:
command -v runc
$(command -v runc) --version


Post the output of all commands here.
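The checks above can be bundled into one small script. This is a minimal sketch assuming a POSIX sh; report_cgroup_support and report_local_runc are hypothetical helper names. It also guards against a missing runc: if runc is not installed, "$(command -v runc)" expands to nothing and the shell then tries to run "--version" as a command.

```shell
#!/bin/sh
# Sketch only: bundles the diagnostic commands from this post.
# The function names are hypothetical, chosen for this example.

# Which cgroup filesystem types does the running kernel know about?
report_cgroup_support() {
    echo "== cgroup filesystems known to the kernel =="
    grep cgroup /proc/filesystems || echo "none listed"
}

# Is there a runc in PATH, and which version is it?
report_local_runc() {
    echo "== local runc =="
    runc_path=$(command -v runc)
    if [ -n "$runc_path" ]; then
        echo "found: $runc_path"
        "$runc_path" --version
    else
        # Without this check, "$(command -v runc) --version" would expand
        # to a bare "--version" and fail with "command not found".
        echo "runc: not found in PATH"
    fi
}

report_cgroup_support
report_local_runc
```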


"I'm able to run a number of other linux projects including single core Atlas tasks (but not multicore) on this setup."

Can't find a recent ATLAS log.
Could you post a link?
ID: 45980
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,189,467
RAC: 104,370
Message 45982 - Posted: 3 Jan 2022, 16:07:42 UTC - in response to Message 45979.  

This Theory folder contains instructions from Laurence for the native app:
Native Theory Application Setup (Linux only)
ID: 45982
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,537,240
RAC: 6
Message 45989 - Posted: 4 Jan 2022, 9:02:22 UTC - in response to Message 45980.  

Output for: grep cgroup /proc/filesystems
nodev cgroup
nodev cgroup2

There's no output for: command -v runc

Output for: $(command -v runc) --version
--version: command not found

I don't know that much about Linux, so I'm not sure what all that means exactly, especially since I get the same outputs on my Hyper-V Ubuntu and have no problem running Theory and multicore ATLAS there.

I haven't run ATLAS in a while, so everything has cleared out, but I just ran a 2-core task. Even though it says Completed and Validated, the log says "No HITS result produced" (https://lhcathome.cern.ch/lhcathome/result.php?resultid=338077983). I also started a single-core task, but it'll take a few hours to finish (https://lhcathome.cern.ch/lhcathome/result.php?resultid=338078147).

Regarding what maeax said: when I try to enable user namespaces following those instructions, I get the following.
Commands: sudo sed -i '$ a\kernel.unprivileged_userns_clone = 1' /etc/sysctl.conf
sudo sysctl -p

Output: sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory

One thing I learned from some research is that WSL2 Ubuntu (and maybe other distributions) uses init.d rather than systemd. Could that be part of the problem? If so, is there still a way to get Theory and multicore ATLAS to work? I like WSL2 for Linux projects because it uses fewer resources than Hyper-V, which I've used before for Theory and multicore ATLAS.
ID: 45989
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 223,018,854
RAC: 136,230
Message 45990 - Posted: 4 Jan 2022, 10:04:44 UTC - in response to Message 45989.  

Major reason:
Output for: grep cgroup /proc/filesystems
nodev cgroup
nodev cgroup2

Your Linux uses cgroup v2.
Theory native (more precisely, its wrapper script "cranky") is developed for cgroup v1, which is completely different.
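Note that /proc/filesystems lists every cgroup type the kernel supports, not the one actually mounted, so the same two lines can appear on a cgroup v1 system too. A minimal sketch of a more direct check, assuming GNU coreutils stat (cgroup_mode is a hypothetical helper name): it reads the filesystem type mounted at /sys/fs/cgroup, which is "cgroup2fs" on a pure cgroup v2 system and "tmpfs" on a cgroup v1 or hybrid setup.

```shell
#!/bin/sh
# Sketch: report which cgroup layout is mounted at /sys/fs/cgroup.
# Assumes GNU coreutils "stat"; "cgroup_mode" is a hypothetical name.
cgroup_mode() {
    fstype=$(stat -f -c %T /sys/fs/cgroup/ 2>/dev/null)
    case "$fstype" in
        cgroup2fs) echo "cgroup v2 (unified)" ;;
        tmpfs)     echo "cgroup v1 or hybrid" ;;
        "")        echo "unknown: /sys/fs/cgroup not found" ;;
        *)         echo "unknown: $fstype" ;;
    esac
}

cgroup_mode
```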


Minor reason:
There's no output for: command -v runc

=> no local runc installation
Theory native calls runc from a CVMFS repository (/cvmfs/grid.cern.ch/vc/containers/runc)
That's version 1.0.0, which may or may not run on recent Linux versions.
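To see which runc builds are actually reachable, a sketch like the following prints the version of a local runc (if any) and of the CVMFS copy quoted above. It assumes CVMFS is mounted and that this path points at the runc executable itself; show_runc is a hypothetical helper name, and the script does not reproduce how Theory native chooses its runc.

```shell
#!/bin/sh
# Sketch: print the version of each runc binary that is reachable.
# The CVMFS path is the one quoted above; it only exists when CVMFS is mounted.
CVMFS_RUNC=/cvmfs/grid.cern.ch/vc/containers/runc

# show_runc LABEL PATH  (hypothetical helper)
show_runc() {
    label=$1
    bin=$2
    # -f excludes directories, -x requires execute permission.
    if [ -n "$bin" ] && [ -f "$bin" ] && [ -x "$bin" ]; then
        echo "$label: $bin"
        "$bin" --version | head -n 1
    else
        echo "$label: not available"
    fi
}

show_runc "local runc" "$(command -v runc)"
show_runc "CVMFS runc" "$CVMFS_RUNC"
```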


In short:
Your Linux version may be too recent to run Theory native.
You may use Theory vbox until an upgrade is available.



Regarding ATLAS

I'm not sure whether the error that caused your first task to fail was a temporary glitch.
The logs are just snippets, but they suggest there may have been an issue deeper in the process.
It's also possible that WSL2 can deal with single-core ATLAS but not with multicore.
This needs to be tested.
ID: 45990
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,537,240
RAC: 6
Message 45994 - Posted: 4 Jan 2022, 12:03:46 UTC - in response to Message 45990.  

So I'm a bit confused then. I get exactly the same outputs for those commands in a Hyper-V Ubuntu 20.04.3 (same version as WSL2) set up on the same PC, yet I have no problem running native Theory and multicore ATLAS there. Here's an example Theory task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=337970091. I'm going to try running MilkyWay multicore app on WSL2 and see if it works; that's the only other multicore app that I'm aware of.

What do you think about the things I mentioned in the last 3 paragraphs of my previous post: being unable to enable user namespaces, and WSL2 Ubuntu using init.d rather than systemd like regular Ubuntu installations?
ID: 45994
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 223,018,854
RAC: 136,230
Message 45995 - Posted: 4 Jan 2022, 14:00:50 UTC - in response to Message 45994.  

Well, it's not exactly the same.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10700856
Ubuntu 20.04.3 LTS [5.10.60.1-microsoft-standard-WSL2|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]


https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10699981
Ubuntu 20.04.3 LTS [5.11.0-43-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]


Besides the different kernel versions, there might be different configurations used to make it run under WSL2.
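The two kernel strings above differ in an easily testable way: the WSL2 kernel release contains "microsoft". A minimal sketch (kernel_kind is a hypothetical helper name) that tells the two setups apart from inside Linux:

```shell
#!/bin/sh
# Sketch: classify the running kernel from its release string.
# WSL2 kernels report releases such as "5.10.60.1-microsoft-standard-WSL2".
kernel_kind() {
    case "$1" in
        *[Mm]icrosoft*) echo "WSL kernel" ;;
        *)              echo "regular kernel" ;;
    esac
}

release=$(uname -r)
echo "$release: $(kernel_kind "$release")"
```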

Your valid tasks on the Hyper-V host show that cgroup's "freezer" has been used to suspend/resume the tasks.
This makes me suspect that this Ubuntu is configured to use cgroup v1; otherwise cranky would have failed.
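Whether a cgroup v1 freezer controller is present can be checked from /proc/self/cgroup: cgroup v1 entries look like "ID:controllers:path", while the cgroup v2 entry has an empty controller field ("0::/path"). A minimal sketch, with has_v1_freezer as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: does this process sit in a cgroup v1 "freezer" hierarchy?
# Reads a /proc/<pid>/cgroup style file (default: /proc/self/cgroup).
has_v1_freezer() {
    file=${1:-/proc/self/cgroup}
    # Field 2 lists the v1 controllers (comma separated); it is empty for v2.
    awk -F: '$2 != "" { print $2 }' "$file" | tr ',' '\n' | grep -qx freezer
}

if has_v1_freezer; then
    echo "cgroup v1 freezer controller found"
else
    echo "no cgroup v1 freezer controller"
fi
```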




Regarding "kernel.unprivileged_userns_clone"
This was a special switch for Debian Stretch years ago.
I'm sure it's not used any more and has been removed.
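Before appending a key to /etc/sysctl.conf, it is worth checking whether the running kernel exposes it at all; the "cannot stat" error earlier in this thread is exactly that check failing. A minimal sketch (show_userns_sysctls is a hypothetical helper name) listing the user-namespace settings present on the current kernel: user.max_user_namespaces is the generic upstream limit, while kernel.unprivileged_userns_clone comes from a distribution kernel patch and is missing on kernels without it, such as the WSL2 kernel here.

```shell
#!/bin/sh
# Sketch: show which user-namespace sysctls this kernel actually exposes.
# "show_userns_sysctls" is a hypothetical helper name.
show_userns_sysctls() {
    for key in kernel/unprivileged_userns_clone user/max_user_namespaces; do
        name=$(echo "$key" | tr / .)   # sysctl-style name, e.g. user.max_user_namespaces
        f=/proc/sys/$key
        if [ -r "$f" ]; then
            echo "$name = $(cat "$f")"
        else
            echo "$name: not present on this kernel"
        fi
    done
}

show_userns_sysctls
```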


"I learned from some research..."

Why not ask your Linux:
ps -Af |grep 'systemd/systemd '
It's using systemd if there's a line starting with:
root          1      0 ...
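A variant of the same check reads the name of PID 1 directly from /proc instead of parsing ps output. A minimal sketch (init_name is a hypothetical helper name); note that inside a container PID 1 may be something else entirely.

```shell
#!/bin/sh
# Sketch: name the init process (PID 1) via /proc/1/comm.
init_name() {
    cat /proc/1/comm 2>/dev/null || echo unknown
}

name=$(init_name)
if [ "$name" = systemd ]; then
    echo "PID 1 is systemd"
else
    echo "PID 1 is '$name', not systemd"
fi
```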




"I'm going to try running MilkyWay multicore app on WSL2 ..."
This will not tell you whether ATLAS or Theory will work, or vice versa.
ID: 45995
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,537,240
RAC: 6
Message 46001 - Posted: 5 Jan 2022, 10:20:52 UTC - in response to Message 45995.  

Yes, I read that WSL2 has a custom kernel, so even if the Ubuntu versions are identical, the kernels are not. I ran the command you suggested and, yes, WSL2 uses init.d (the Hyper-V setup uses systemd).

It seems you're right that the user namespaces setting isn't used anymore. I deleted the entry on my Hyper-V setup, restarted it, and tried a couple of Theory tasks, which finished with no problems. This at least eliminates that as a possible reason why Theory won't work on WSL2. It seems like the instructions in that sticky post need an update.

With MilkyWay, I wanted to at least see whether WSL2 Ubuntu could run multicore projects of any kind, and it can: the tasks finished without problems.

I wonder if there's anything else to try. Maybe installing runc locally instead of using the one provided? When I was first setting everything up, I remember reading your suggestion that installing Singularity locally is more reliable than using the one provided. That suggestion worked for me in both Hyper-V and WSL2.
ID: 46001
AndreyOR

Joined: 8 Dec 19
Posts: 37
Credit: 7,537,240
RAC: 6
Message 46031 - Posted: 10 Jan 2022, 8:16:57 UTC

With computezrmle's troubleshooting help, I was able to find a solution. Basically, one needs to add a line to the WSL2 configuration file (.wslconfig, in the Windows user profile folder %UserProfile%) to enable vsyscall emulation, which solves the memory access violation (exit code 139) problem. See https://docs.microsoft.com/en-us/windows/wsl/wsl-config for details on WSL2 configuration. A simple .wslconfig file that works for native Theory might look like this:

[wsl2]
memory=16GB
processors=8
kernelCommandLine = vsyscall=emulate
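.wslconfig is only read when the WSL2 virtual machine starts, so after editing it, shut WSL down from Windows (wsl --shutdown) and reopen the distribution. A minimal sketch (vsyscall_mode is a hypothetical helper name) to confirm from inside Linux that the option actually reached the kernel command line:

```shell
#!/bin/sh
# Sketch: report the vsyscall= setting on the kernel command line.
# Reads /proc/cmdline by default; an alternative file can be passed.
vsyscall_mode() {
    file=${1:-/proc/cmdline}
    # Split the command line into one token per line and keep the last vsyscall= value.
    mode=$(tr ' ' '\n' < "$file" | sed -n 's/^vsyscall=//p' | tail -n 1)
    echo "${mode:-not set (kernel default)}"
}

echo "vsyscall mode: $(vsyscall_mode)"
```

If this prints "emulate", the setting from .wslconfig is active.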
ID: 46031



©2024 CERN