[systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
Jeremy Linton
jeremy.linton at arm.com
Mon Oct 26 22:39:42 UTC 2020
Hi,
On 10/26/20 12:52 PM, Dave Martin wrote:
> On Mon, Oct 26, 2020 at 04:57:55PM +0000, Szabolcs Nagy via Libc-alpha wrote:
>> The 10/26/2020 16:24, Dave Martin via Libc-alpha wrote:
>>> Unrolling this discussion a bit, this problem comes from a few sources:
>>>
>>> 1) systemd is trying to implement a policy that doesn't fit SECCOMP
>>> syscall filtering very well.
>>>
>>> 2) The program is trying to do something not expressible through the
>>> syscall interface: really the intent is to set PROT_BTI on the page,
>>> with no intent to set PROT_EXEC on any page that didn't already have it
>>> set.
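To illustrate point 2, here is a minimal userspace model of what a syscall filter gets to see. It assumes a rule roughly in the spirit of systemd's MemoryDenyWriteExecute that rejects any mprotect() carrying PROT_EXEC. It only evaluates a real `struct seccomp_data` in plain C and installs no BPF program. The helper names `mprotect_call` and `mdwx_rule_allows` are hypothetical, and the PROT_BTI fallback value is the arm64 one, defined here so the model also builds on other hosts.

```c
#include <linux/seccomp.h>  /* struct seccomp_data: what a filter sees */
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/syscall.h>    /* SYS_mprotect */

#ifndef PROT_BTI
#define PROT_BTI 0x10       /* arm64 value from <asm/mman.h>; fallback for other hosts */
#endif

/* Hypothetical helper: build the data a filter would see for
 * mprotect(addr, len, prot). */
static struct seccomp_data mprotect_call(unsigned long addr, unsigned long len, int prot)
{
    struct seccomp_data sd = { .nr = SYS_mprotect };
    sd.args[0] = addr;
    sd.args[1] = len;
    sd.args[2] = (unsigned int)prot;
    return sd;
}

/* Hypothetical model of an argument-only rule: reject any mprotect()
 * whose prot argument contains PROT_EXEC. It cannot ask whether the
 * pages are already executable. (A real filter would also check sd.arch.) */
static bool mdwx_rule_allows(struct seccomp_data sd)
{
    if (sd.nr != SYS_mprotect)
        return true;
    return (sd.args[2] & PROT_EXEC) == 0;
}
```

The glibc-style request `PROT_READ | PROT_EXEC | PROT_BTI` is rejected, while `PROT_READ | PROT_BTI` passes only because it drops PROT_EXEC, which on a real mprotect() would leave the page non-executable. No argument combination both passes such a rule and keeps the page executable.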
>>>
>>>
>>> This limitation of mprotect() was known when I originally added PROT_BTI,
>>> but at that time we weren't aware of a clear use case that would fail.
>>>
>>>
>>> Would it now help to add something like:
>>>
>>> int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
>>> {
>>>     int ret = -EINVAL;
>>>     mmap_write_lock(current->mm);
>>>     if (all vmas in [addr .. addr + len) have
>>>             their mprotect flags set to old_flags) {
>>>
>>>         ret = mprotect(addr, len, new_flags);
>>>     }
>>>
>>>     mmap_write_unlock(current->mm);
>>>     return ret;
>>> }
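A small userspace model of the mchangeprot() semantics sketched above may make the trade-off concrete. It uses a toy per-page protection table in place of real VMAs. All names (`model_mchangeprot`, `filter_allows_mchangeprot`, `page_prot`, `NPAGES`) are hypothetical, and nothing here is a kernel or libc API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10          /* arm64 value; fallback so the model builds elsewhere */
#endif

#define NPAGES ((size_t)8)     /* hypothetical toy address space */

/* Toy model: one protection word per page, standing in for the VMAs. */
static int page_prot[NPAGES];

/* Hypothetical model of mchangeprot(): apply new_flags only if every page
 * in [first, first + count) currently has exactly old_flags.
 * Returns 0 on success or -EINVAL, mirroring the sketch. */
static int model_mchangeprot(size_t first, size_t count, int old_flags, int new_flags)
{
    if (first > NPAGES || count > NPAGES - first)
        return -EINVAL;
    for (size_t i = first; i < first + count; i++)
        if (page_prot[i] != old_flags)
            return -EINVAL;
    for (size_t i = first; i < first + count; i++)
        page_prot[i] = new_flags;
    return 0;
}

/* What a filter could check, since both flag sets are syscall arguments:
 * PROT_EXEC may be kept, but never newly added. */
static bool filter_allows_mchangeprot(int old_flags, int new_flags)
{
    return (old_flags & PROT_EXEC) || !(new_flags & PROT_EXEC);
}
```

Once BTI has been added, a later call that still passes `old_flags = PROT_READ | PROT_EXEC` fails with -EINVAL, because the exact match no longer holds.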
>>
>> If more prot flags are introduced, then the exact
>> match for old_flags may be restrictive, and currently
>> there is no way to query these flags to figure out
>> how to toggle one prot flag in a future-proof way,
>> so I don't think this solves the issue completely.
>
> Ack -- I illustrated this model because it makes the seccomp filter's
> job easy, but it does have limitations.
>
>> I think we might need a new API, given that aarch64
>> now has PROT_BTI and PROT_MTE while existing code
>> expects RWX only, but I don't know what API is best.
>
> An alternative option would be a call that sets / clears chosen
> flags and leaves others unchanged.
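A set/clear call of the kind described above can be modelled in a few lines, again as a userspace sketch over a toy per-page table. The function names are hypothetical, and the filter check shown is only the "never newly set PROT_EXEC" part, not a full write-xor-execute policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* arm64 value; fallback so the model builds elsewhere */
#endif

/* Hypothetical set/clear call: flags in `set` are turned on, flags in
 * `clear` are turned off, and every other flag on each page is kept. */
static void model_mprotect_setclear(int *prot, size_t npages, int set, int clear)
{
    for (size_t i = 0; i < npages; i++)
        prot[i] = (prot[i] | set) & ~clear;
}

/* A filter can allow set = PROT_BTI without knowing the pages' current
 * flags, because this call never adds PROT_EXEC unless asked to. */
static bool filter_allows_setclear(int set, int clear)
{
    (void)clear;        /* clearing flags never grants execute permission */
    return (set & PROT_EXEC) == 0;
}
```

Setting PROT_BTI this way leaves PROT_EXEC untouched on pages that already have it; on a page that was never executable the BTI bit is simply added alongside the existing flags.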
I tend to favor a set/clear API, but the same effect could also be had by
creating a new PROT_BTI_IF_X flag, which enables BTI only for areas already
marked PROT_EXEC. That would pass straight through the seccomp filters too,
and is actually closer to what glibc wants to do anyway.
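The PROT_BTI_IF_X idea can be sketched the same way. The flag value below is made up purely for the model, and the semantics assumed (add BTI only on already-executable pages, change nothing else) are one reading of the proposal, not a defined kernel interface. The point is that the request itself carries no PROT_EXEC bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10            /* arm64 value; fallback so the model builds elsewhere */
#endif

/* Hypothetical flag from the proposal; the value is invented for this model. */
#define PROT_BTI_IF_X 0x40000000

/* Model of an mprotect() that understands PROT_BTI_IF_X: BTI is added
 * only on pages that are already executable; nothing else changes. */
static void model_mprotect_bti_if_x(int *prot, size_t npages)
{
    for (size_t i = 0; i < npages; i++)
        if (prot[i] & PROT_EXEC)
            prot[i] |= PROT_BTI;
}

/* Argument-only rule that rejects PROT_EXEC, as a filter would apply it. */
static bool exec_rule_allows(int prot_arg)
{
    return (prot_arg & PROT_EXEC) == 0;
}
```

Because only the already-executable page gains BTI and the request never names PROT_EXEC, an argument-only rule has nothing to object to.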
>
> The trouble with that is that the MDWX policy then becomes hard to
> implement again.
>
>
> But policies might be best set via another route, such as a prctl,
> rather than being implemented completely in a seccomp filter.
>
> Cheers
> ---Dave
>