[systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

Topi Miettinen toiwoton at gmail.com
Mon Oct 26 16:31:47 UTC 2020


On 26.10.2020 16.52, Catalin Marinas wrote:
> On Sat, Oct 24, 2020 at 02:01:30PM +0300, Topi Miettinen wrote:
>> On 23.10.2020 12.02, Catalin Marinas wrote:
>>> On Thu, Oct 22, 2020 at 01:02:18PM -0700, Kees Cook wrote:
>>>> Regardless, it makes sense to me to have the kernel load the executable
>>>> itself with BTI enabled by default. I prefer gaining Catalin's suggested
>>>> patch[2]. :)
>>> [...]
>>>> [2] https://lore.kernel.org/linux-arm-kernel/20201022093104.GB1229@gaia/
>>>
>>> I think I first heard the idea at Mark R ;).
>>>
>>> It still needs glibc changes to avoid the mprotect(), or at least ignore
>>> the error. Since this is an ABI change and we don't know which kernels
>>> would have it backported, maybe better to still issue the mprotect() but
>>> ignore the failure.
>>
>> What about the kernel adding an auxiliary vector as a flag to indicate that BTI
>> is supported and recommended by the kernel? Then dynamic loader could use
>> that to detect that a) the main executable is BTI protected and there's no
>> need to mprotect() it and b) PROT_BTI flag should be added to all PROT_EXEC
>> pages.
> 
> We could add a bit to AT_FLAGS, it's always been 0 for Linux.

Great!

>> In the absence of the vector, the dynamic loader might choose to skip doing
>> PROT_BTI at all (since the main executable isn't protected anyway either, or
>> maybe even the kernel is up-to-date but it knows that it's not recommended
>> for some reason, or maybe the kernel is so ancient that it doesn't know
>> about BTI). Optionally it could still read the flag from ELF later (for
>> compatibility with old kernels) and then do the mprotect() dance, which may
>> trip seccomp filters, possibly fatally.
> 
> I think the safest is for the dynamic loader to issue an mprotect() and
> ignore the EPERM error. Not all user deployments have this seccomp
> filter, so they can still benefit, and users can't tell whether the
> kernel change has been backported.

But the seccomp filter can be set to kill the process, so that's
definitely not the safest way. I think the safest approach is this: when
the AT_FLAGS bit is set, ld.so makes no mprotect() calls at all. Instead,
it adds PROT_BTI to the protection bits passed to mmap() when mapping the
segments, so mprotect() is never needed. Even where no seccomp filter is
installed, skipping the redundant mprotect() calls has no downside.
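To make the proposed ld.so logic concrete, here is a minimal C sketch of the decision. It assumes a hypothetical AT_FLAGS bit (HYPOTHETICAL_AT_FLAGS_BTI; no name or value has been agreed), that a real loader would obtain at_flags via getauxval(AT_FLAGS), and that elf_wants_bti reflects the object's BTI property note. The function names are illustrative only, not glibc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10 /* arm64 value from <asm/mman.h>; fallback so the sketch builds elsewhere */
#endif

/* Hypothetical: the thread proposes an AT_FLAGS bit, but no name or value. */
#define HYPOTHETICAL_AT_FLAGS_BTI (1UL << 0)

enum bti_strategy {
    BTI_SKIP,         /* do not use BTI for this object */
    BTI_AT_MMAP,      /* add PROT_BTI when mapping; no mprotect() needed */
    BTI_VIA_MPROTECT  /* legacy: mprotect() after mapping, may trip seccomp */
};

/*
 * at_flags:      in a real loader, the value of getauxval(AT_FLAGS).
 * elf_wants_bti: the object is marked as BTI-compatible.
 * allow_legacy:  whether to fall back to the mprotect() dance when the
 *                kernel does not advertise the flag.
 */
static enum bti_strategy choose_bti_strategy(unsigned long at_flags,
                                             bool elf_wants_bti,
                                             bool allow_legacy)
{
    if (!elf_wants_bti)
        return BTI_SKIP;
    if (at_flags & HYPOTHETICAL_AT_FLAGS_BTI)
        return BTI_AT_MMAP;
    return allow_legacy ? BTI_VIA_MPROTECT : BTI_SKIP;
}

/* Protection bits for one segment's mmap() call under the chosen strategy. */
static int segment_mmap_prot(int base_prot, enum bti_strategy s)
{
    if (s == BTI_AT_MMAP && (base_prot & PROT_EXEC))
        return base_prot | PROT_BTI;
    return base_prot;
}
```

With the flag present, executable segments get PROT_BTI directly from mmap(), so no mprotect(PROT_EXEC | PROT_BTI) call exists for a filter to catch; non-executable segments are left unchanged.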

I'd expect the backported kernel change to include both the aux vector
flag and the use of PROT_BTI for the main executable. Then the logic
would work with backported kernels as well.

If there's no aux vector flag, all bets are off. The kernel could be old
and unpatched, possibly so old that it doesn't know about PROT_BTI at
all. In the future, new technologies might also replace BTI, and the
kernel might want a previous-generation ld.so not to try BTI; the
absence of the flag could signal that as well. The dynamic loader could
still attempt to mprotect() the pages, but that could be fatal: it only
gets the chance to ignore the error if there's no seccomp filter, or at
least none set to kill. Perhaps the pain is only temporary, since new or
patched kernels should eventually replace the old versions.
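The legacy fallback (Catalin's suggestion of issuing mprotect() and ignoring EPERM) can be sketched in C as below. The function name, result enum and stub functions are hypothetical; the mprotect call is injected through a pointer with mprotect(2)'s signature so the real mprotect can be passed in, or a stub can simulate a seccomp filter. The comment records the limitation discussed here: ignoring EPERM only helps when the filter returns an error rather than killing the process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10 /* arm64 value from <asm/mman.h>; fallback so the sketch builds elsewhere */
#endif

/* Same signature as mprotect(2), so the real call can be passed in. */
typedef int (*mprotect_fn)(void *addr, size_t len, int prot);

enum bti_result { BTI_ENABLED, BTI_REFUSED_IGNORED, BTI_ERROR };

/*
 * Legacy path: request PROT_BTI on an already-mapped executable segment.
 * EPERM (e.g. from a filter that returns an errno) is tolerated. A filter
 * whose action kills the process never lets this function return, which
 * is why ignoring the error is not a complete fix.
 */
static enum bti_result try_enable_bti(mprotect_fn mp, void *addr,
                                      size_t len, int prot)
{
    if (mp(addr, len, prot | PROT_BTI) == 0)
        return BTI_ENABLED;
    return errno == EPERM ? BTI_REFUSED_IGNORED : BTI_ERROR;
}

/* Hypothetical test doubles standing in for the kernel. */
static int stub_allow(void *addr, size_t len, int prot)
{
    (void)addr; (void)len; (void)prot;
    return 0;
}

static int stub_seccomp_eperm(void *addr, size_t len, int prot)
{
    (void)addr; (void)len; (void)prot;
    errno = EPERM; /* what an errno-returning seccomp filter would produce */
    return -1;
}

static int stub_einval(void *addr, size_t len, int prot)
{
    (void)addr; (void)len; (void)prot;
    errno = EINVAL; /* e.g. a kernel that does not accept PROT_BTI */
    return -1;
}
```

Note that the EINVAL case is kept distinct: a kernel too old to know PROT_BTI is a different situation from a filter refusing the call.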

> Now, if the dynamic loader silently ignores the mprotect() failure on
> the main executable, is there much value in exposing a flag in the aux
> vectors? It saves a few (one?) mprotect() calls but I don't think it
> matters much. Anyway, I don't mind the flag.

Saving a few system calls is indeed not the point. The original problem
(service failures) was that MemoryDenyWriteExecute= (MDWX) and PROT_BTI
could not be used together.
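To show why the two collide: systemd documents MemoryDenyWriteExecute= as blocking mmap() with both PROT_WRITE and PROT_EXEC set, and mprotect()/pkey_mprotect() with PROT_EXEC set. The C sketch below models only those two rules (a simplification with hypothetical function names, not systemd's seccomp code) to show that mapping with PROT_BTI up front passes, while the later mprotect() that adds PROT_BTI to an executable segment is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10 /* arm64 value from <asm/mman.h>; fallback so the sketch builds elsewhere */
#endif

/* Simplified model of the mmap() rule: writable and executable together. */
static bool mdwx_denies_mmap(int prot)
{
    return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}

/* Simplified model of the mprotect() rule: any request containing PROT_EXEC. */
static bool mdwx_denies_mprotect(int prot)
{
    return (prot & PROT_EXEC) != 0;
}
```

Because glibc's mprotect() has to repeat PROT_EXEC alongside PROT_BTI, the second rule rejects it even though nothing becomes writable.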

-Topi

