[PATCH] kernel: Expose SYS_kcmp by default

Michel Dänzer michel at daenzer.net
Mon Feb 8 13:49:51 UTC 2021


On 2021-02-08 2:34 p.m., Daniel Vetter wrote:
> On Mon, Feb 8, 2021 at 12:49 PM Michel Dänzer <michel at daenzer.net> wrote:
>>
>> On 2021-02-05 9:53 p.m., Daniel Vetter wrote:
>>> On Fri, Feb 5, 2021 at 7:37 PM Kees Cook <keescook at chromium.org> wrote:
>>>>
>>>> On Fri, Feb 05, 2021 at 04:37:52PM +0000, Chris Wilson wrote:
>>>>> Userspace has discovered the functionality offered by SYS_kcmp and has
>>>>> started to depend upon it. In particular, Mesa uses SYS_kcmp for
>>>>> os_same_file_description() in order to identify when two fd (e.g. device
>>>>> or dmabuf) point to the same struct file. Since they depend on it for
>>>>> core functionality, lift SYS_kcmp out of the non-default
>>>>> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Kees Cook <keescook at chromium.org>
>>>>> Cc: Andy Lutomirski <luto at amacapital.net>
>>>>> Cc: Will Drewry <wad at chromium.org>
>>>>> Cc: Andrew Morton <akpm at linux-foundation.org>
>>>>> Cc: Dave Airlie <airlied at gmail.com>
>>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>>> Cc: Lucas Stach <l.stach at pengutronix.de>
>>>>> ---
>>>>>    init/Kconfig                                  | 11 +++++++++++
>>>>>    kernel/Makefile                               |  2 +-
>>>>>    tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
>>>>>    3 files changed, 13 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/init/Kconfig b/init/Kconfig
>>>>> index b77c60f8b963..f62fca13ac5b 100644
>>>>> --- a/init/Kconfig
>>>>> +++ b/init/Kconfig
>>>>> @@ -1194,6 +1194,7 @@ endif # NAMESPACES
>>>>>    config CHECKPOINT_RESTORE
>>>>>         bool "Checkpoint/restore support"
>>>>>         select PROC_CHILDREN
>>>>> +     select KCMP
>>>>>         default n
>>>>>         help
>>>>>           Enables additional kernel features in a sake of checkpoint/restore.
>>>>> @@ -1737,6 +1738,16 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
>>>>>    config ARCH_HAS_MEMBARRIER_SYNC_CORE
>>>>>         bool
>>>>>
>>>>> +config KCMP
>>>>> +     bool "Enable kcmp() system call" if EXPERT
>>>>> +     default y
>>>>
>>>> I would expect this to be not default-y, especially if
>>>> CHECKPOINT_RESTORE does a "select" on it.
>>>>
>>>> This is a really powerful syscall, but it is bounded by ptrace access
>>>> controls, and uses pointer address obfuscation, so it may be okay to
>>>> expose this. As it is, at least Ubuntu already has
>>>> CONFIG_CHECKPOINT_RESTORE, so really, there's probably not much
>>>> difference on exposure.
>>>>
>>>> So, if you drop the "default y", I'm fine with this.
>>>
>>> It was maybe stupid, but our userspace started relying on fd
>>> comparison through sys_kcmp. So for better or worse, if you want to
>>> run the mesa3d gl/vk stacks, you need this.
>>
>> That's overstating things somewhat. The vast majority of applications
>> will work fine regardless (as they did before Mesa started using this
>> functionality). Only some special ones will run into issues, because the
>> user-space drivers incorrectly assume two file descriptors reference
>> different descriptions.
>>
>>
>>> Was maybe not the brightest idea, but since enough distros had this
>>> enabled by default,
>>
>> Right, that (and the above) is why I considered it fair game to use.
>> What should I have done instead? (TBH I was surprised that this
>> functionality isn't generally available)
> 
> Yeah that one is fine, but I thought we've discussed (irc or
> something) more uses for de-duping dma-buf and stuff like that. But
> quick grep says that hasn't landed yet, so I got a bit confused (or
> just dreamt). Looking at this again I'm kinda surprised the drmfd
> de-duping blows up on normal linux distros, but I guess it can all
> happen.

One example: GEM handle name-spaces are per file description. If 
user-space incorrectly assumes two DRM fds are independent, when they 
actually reference the same file description, closing a GEM handle with 
one file descriptor will make it unusable with the other file descriptor 
as well.
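
For illustration, here is a minimal sketch of the kind of check being
discussed: asking the kernel via kcmp(KCMP_FILE) whether two fds in the
current process share one file description. The helper name
same_file_description() and its tri-state return convention are
assumptions made for this example, not Mesa's actual
os_same_file_description() code. On a kernel built without kcmp() the
syscall fails with ENOSYS, and the caller cannot tell the two cases apart.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/kcmp.h>     /* KCMP_FILE */

/*
 * Hypothetical helper (illustrative only): returns 1 if fd1 and fd2
 * refer to the same open file description, 0 if they refer to
 * different ones, and -1 if the kernel cannot answer (for example
 * ENOSYS when kcmp() is not built in; errno holds the reason).
 */
int same_file_description(int fd1, int fd2)
{
    /* The same descriptor number trivially shares its description. */
    if (fd1 == fd2)
        return 1;

    pid_t pid = getpid();
    long ret = syscall(SYS_kcmp, pid, pid, KCMP_FILE,
                       (unsigned long)fd1, (unsigned long)fd2);
    if (ret < 0)
        return -1;

    /* kcmp() returns 0 when both fds point to the same struct file. */
    return ret == 0;
}

/* Small demonstration: a dup()ed fd versus an independently opened one. */
void demo(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = dup(a);                         /* shares a's file description */
    int c = open("/dev/null", O_RDONLY);    /* new file description */
    assert(a >= 0 && b >= 0 && c >= 0);

    printf("a vs dup(a):     %d\n", same_file_description(a, b));
    printf("a vs second open: %d\n", same_file_description(a, c));

    close(a);
    close(b);
    close(c);
}
```

With kcmp() available, the dup()ed pair should be reported as the same
description and the independently opened pair as different. A -1 result
is the situation described above, where user-space has to guess and may
wrongly treat two fds as independent.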


>>> Ofc we can leave the default n, but the select if CONFIG_DRM is
>>> unfortunately needed I think.
>>
>> Per above, not sure this is really true.
> 
> We seem to be going boom on linux distros now, maybe userspace got
> more creative in abusing stuff?

I don't know what you're referring to. I've only seen maybe two or three 
reports from people who didn't enable CHECKPOINT_RESTORE in their 
self-built kernels.


> The entire thing is small enough that imo we don't really have to care,
> e.g. we also unconditionally select dma-buf, despite that on most
> systems there's only 1 gpu, and you're never going to end up with a
> buffer sharing case that needs any of that code (aside from the
> "here's an fd" part).
> 
> But I guess we can limit to just KCMP_FILE like you suggest in another
> reply. Just feels a bit like overkill.

Making KCMP_FILE gated by DRM makes as little sense to me as by 
CHECKPOINT_RESTORE.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

