[systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

Jeremy Linton jeremy.linton at arm.com
Mon Oct 26 22:39:42 UTC 2020


Hi,

On 10/26/20 12:52 PM, Dave Martin wrote:
> On Mon, Oct 26, 2020 at 04:57:55PM +0000, Szabolcs Nagy via Libc-alpha wrote:
>> The 10/26/2020 16:24, Dave Martin via Libc-alpha wrote:
>>> Unrolling this discussion a bit, this problem comes from a few sources:
>>>
>>> 1) systemd is trying to implement a policy that doesn't fit SECCOMP
>>> syscall filtering very well.
>>>
>>> 2) The program is trying to do something not expressible through the
>>> syscall interface: really the intent is to set PROT_BTI on the page,
>>> with no intent to set PROT_EXEC on any page that didn't already have it
>>> set.
>>>
>>>
>>> This limitation of mprotect() was known when I originally added PROT_BTI,
>>> but at that time we weren't aware of a clear use case that would fail.
>>>
>>>
>>> Would it now help to add something like:
>>>
>>> int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
>>> {
>>> 	int ret = -EINVAL;
>>> 	mmap_write_lock(current->mm);
>>> 	if (all vmas in [addr .. addr + len) have
>>> 			their mprotect flags set to old_flags) {
>>>
>>> 		ret = mprotect(addr, len, new_flags);
>>> 	}
>>> 	
>>> 	mmap_write_unlock(current->mm);
>>> 	return ret;
>>> }
>>
>> if more prot flags are introduced then the exact
>> match for old_flags may be restrictive and currently
>> there is no way to query these flags to figure out
>> how to toggle one prot flag in a future proof way,
>> so i don't think this solves the issue completely.
> 
> Ack -- I illustrated this model because it makes the seccomp filter's
> job easy, but it does have limitations.
> 
>> i think we might need a new api, given that aarch64
>> now has PROT_BTI and PROT_MTE while existing code
>> expects RWX only, but i don't know what api is best.
> 
> An alternative option would be a call that sets / clears chosen
> flags and leaves others unchanged.

I tend to favor a set/clear API, but the same effect could be had by
creating a new PROT_BTI_IF_X flag that enables BTI only for areas already
mapped PROT_EXEC. Such a request passes the seccomp filters too, since it
never asks for PROT_EXEC itself, and it is actually closer to what glibc
wants to do anyway.
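
To make the comparison concrete, here is a rough userspace model of the two
ideas. It is only a sketch of the flag arithmetic, not kernel code and not a
proposed ABI: PROT_BTI_IF_X does not exist anywhere and its value below is
arbitrary, and the function names (filter_allows, prot_set_clear,
prot_bti_if_x) are made up for illustration. filter_allows() assumes the
systemd filter simply rejects any mprotect() request carrying PROT_EXEC,
which is what triggers the failure discussed in this thread.

```c
/* Userspace model only: no syscalls are made here. */
#include <stdbool.h>
#include <sys/mman.h>            /* PROT_READ, PROT_WRITE, PROT_EXEC */

#ifndef PROT_BTI
#define PROT_BTI 0x10            /* arm64 value, defined for other hosts */
#endif

/* Hypothetical flag; no kernel defines it and the value is arbitrary. */
#define PROT_BTI_IF_X 0x40000000

/*
 * What an MDWX-style seccomp filter can decide on: only the requested
 * argument, never the current protection of the mapping.
 */
bool filter_allows(int requested)
{
	return (requested & PROT_EXEC) == 0;
}

/* Set/clear semantics: change only the named bits, keep the rest. */
int prot_set_clear(int current, int set, int clear)
{
	return (current | set) & ~clear;
}

/* PROT_BTI_IF_X semantics: add BTI only where PROT_EXEC is already set. */
int prot_bti_if_x(int current)
{
	return (current & PROT_EXEC) ? (current | PROT_BTI) : current;
}
```

In this model, today's request of PROT_READ | PROT_EXEC | PROT_BTI is
rejected by filter_allows(), while a request of PROT_BTI_IF_X alone is
accepted and prot_bti_if_x() leaves a non-executable mapping unchanged.
With prot_set_clear(), a request that sets only PROT_WRITE on a mapping
that is currently PROT_READ | PROT_EXEC also looks harmless to the filter,
yet the result is writable and executable, which the filter cannot detect
from the arguments alone.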


> 
> The trouble with that is that the MDWX policy then becomes hard to
> implement again.
> 
> 
> But policies might be best set via another route, such as a prctl,
> rather than being implemented completely in a seccomp filter.
> 
> Cheers
> ---Dave
> 


