[systemd-devel] How to disable seccomp in systemd-nspawn?

Steve Dodd steved424 at gmail.com
Sun Aug 16 15:05:07 UTC 2020


On Sun, 16 Aug 2020 at 15:47, Lennart Poettering <lennart at poettering.net>
wrote:

> I think it would be wise to do fallback logic for EPERM too. It's
> the error that nspawn uses since day #1 basically. I am a bit puzzled
> no one noticed this before, afaik glibc test cases at least on Fedora
> (where most glibc upstream devs work) run in nspawn, so how did
> no one notice?
>

That's interesting .. it's possible things don't work quite the way I think
they do, but I will try to find previous examples - I remember borgbackup
was affected on armhf fairly recently, for example.

I suspect trying to convince glibc maintainers to check for EPERM could
turn into a holy war quite quickly :)
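
To make the idea concrete, here is a minimal sketch of what "fallback logic
for EPERM too" could look like from the caller's side. It is an illustration
only, not how glibc does it: call_with_fallback and FALLBACK_ERRNOS are
made-up names, and the two callables stand in for a new syscall wrapper and
its older equivalent.

```python
import errno

# Errors treated as "this syscall is unavailable here".
# ENOSYS is the conventional signal; EPERM is included because, per this
# thread, nspawn's seccomp filter answers blocked syscalls with EPERM.
FALLBACK_ERRNOS = {errno.ENOSYS, errno.EPERM}


def call_with_fallback(preferred, fallback, *args):
    """Try `preferred`; if it fails as 'unavailable', use `fallback`.

    Both arguments are hypothetical callables standing in for a new
    syscall wrapper and the older one it supersedes.
    """
    try:
        return preferred(*args)
    except OSError as exc:
        if exc.errno in FALLBACK_ERRNOS:
            return fallback(*args)
        raise  # any other error is a real failure, pass it on
```

The cost is visible in the sketch: a genuine permission error from the
preferred call also triggers the fallback, because the caller cannot tell
the two kinds of EPERM apart.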


> > A rule of thumb might be to return ENOSYS for anything libseccomp doesn't
> > know about - is it possible to look things up that way around?
>
> libseccomp doesn't allow us to install filters for syscalls it doesn't
> know anyway iirc...
>
> Not sure I follow though? Why would that help?
>

Well, my logic was that if libseccomp didn't know about a syscall when it
was built, then that syscall is "new", and userland can probably live
without it. If we're going to block it anyway (because libseccomp doesn't
know about it, it won't end up in the whitelist, even if systemd/nspawn is
more up-to-date), we might as well return ENOSYS and let userland try a
fallback (e.g. openat instead of openat2). We can still return EPERM for
well-known-but-blocked syscalls, which hopefully indicates to sufficiently
caffeinated users that there's a security filter in place :)
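
Spelled out as a sketch, the rule of thumb is just a three-way decision.
This is not nspawn code: errno_for_denied is a made-up name, and the two
sets are stand-ins for "what libseccomp knew about at build time" and "what
the whitelist contains".

```python
import errno


def errno_for_denied(syscall, known_to_libseccomp, allowed):
    """Return the errno to report for `syscall`, or None if it is allowed.

    known_to_libseccomp and allowed are hypothetical sets of syscall names.
    """
    if syscall not in known_to_libseccomp:
        # Too new for the library, so it cannot be whitelisted anyway:
        # pretend it does not exist and let userland fall back.
        return errno.ENOSYS
    if syscall in allowed:
        return None
    # Well-known but blocked: signal that a filter is in place.
    return errno.EPERM
```

With known = {"openat", "mount"} and allowed = {"openat"}, openat2 gets
ENOSYS, mount gets EPERM, and openat passes.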


> > Another useful thing might be to allow whitelisting by syscall number -
> > again don't know if seccomp allows this. Would allow easier workarounds
> > in cases like this without having to go off and backport libseccomp...
>
> syscall numbers are highly arch dep, we currently don't support that
> because you cannot reasonably express this in unit files, as they'd
> become very much arch dependent then.
>
> That said, I'd be happy to review/merge a patch that adds a syntax
> where you could spell out SystemCallFilter=x86-64:345 for example,
> i.e. specify arch plus syscall nr. But it's still ugly, since it would
> result in different filters on different archs.
>

Yeah, I'm not suggesting anyone should deploy that in a published unit
file. But for individual admins/users to "bodge" a system in an override
file it might be handy. It's fractionally less messy to my mind than
manually backporting system libraries!
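
For what it's worth, parsing such an entry would be the easy part. The
sketch below only splits the "arch:number" form from the example above. The
syntax is a proposal in this thread, not something systemd accepts, and
parse_arch_syscall is a made-up name.

```python
def parse_arch_syscall(entry):
    """Split a hypothetical 'arch:number' filter entry, e.g. 'x86-64:345'.

    Returns (arch, number). Raises ValueError for anything else, including
    plain syscall names, which would keep using the existing syntax.
    """
    arch, sep, number = entry.partition(":")
    if not sep or not arch or not (number.isascii() and number.isdigit()):
        raise ValueError("expected ARCH:NUMBER, got %r" % entry)
    return arch, int(number)
```

Mapping the arch string to a real architecture, and checking that it
matches the machine, is deliberately left out.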

> > Third thing on my wishlist might be a log entry for denied syscalls
> > somewhere ..
>
> Hmm, this would make a ton of sense. We currently have a "log" seccomp
> action, but it will just log and allow anyway. We'd need another
> action that would log and refuse. Please file an RFE, or even better
> prep a PR for this!
>

Looking at the kernel seccomp doc, I'm not actually sure it's possible,
from code at least:

https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html

But there is /proc/sys/kernel/seccomp/actions_logged, which might do the
trick!
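
A quick way to see what that file currently allows, as a sketch: it is a
space-separated list of action names, and "errno" is the one that matters
for syscalls refused with an error code. The function names here are made
up. Note that this only reports the sysctl setting. As far as I can tell,
the filter itself also has to ask for logging, which the sketch does not
check.

```python
from pathlib import Path

ACTIONS_LOGGED = "/proc/sys/kernel/seccomp/actions_logged"


def logged_actions(path=ACTIONS_LOGGED):
    """Return the seccomp action names the kernel is allowed to log."""
    return Path(path).read_text().split()


def errno_denials_loggable(path=ACTIONS_LOGGED):
    """True if the 'errno' action is in the list of loggable actions."""
    return "errno" in logged_actions(path)
```

The path parameter is only there so the functions can be pointed at a test
file instead of /proc.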

S.