[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

Fri Jan 14 17:40:58 UTC 2022

Am 2022-01-14 um 12:26 p.m. schrieb Christian König:
> Am 14.01.22 um 17:44 schrieb Daniel Vetter:
>> Top post because I tried to catch up on the entire discussion here.
>>
>> So fundamentally I'm not opposed to just close this fork() hole once and
>> for all. The thing that worries me from a upstream/platform pov is
>> really
>> only if we don't do it consistently across all drivers.
>>
>> So maybe as an idea:
>> - Do the original patch, but not just for ttm but all gem rendernode
>>    drivers at least (or maybe even all gem drivers, no idea), with the
>>    below discussion cleaned up as justification.
>
> I know of at least one use case which this will break.
>
> A couple of years back we had a discussion on the Mesa mailing list
> because (IIRC) Marek introduced a background thread to push command
> submissions to the kernel.
>
> That broke because some compositor used to initialize OpenGL and then
> do a fork(). This indeed worked previously (no GPUVM at that time),
> but with the addition of the backround thread obviously broke.
>
> The conclusion back then was that the compositor is broken and needs
> fixing, but it still essentially means that there could be people out
> there with really old userspace where this setting would just break
> the desktop.
>
> I'm not really against that change either, but at least in theory we
> could make fork() work perfectly fine even with VMs and background
> threads.

You may regret this if you ever try to build a shared virtual address
space between GPU and CPU. Then you have two processes (parent and
child) sharing the same render context and GPU VM address space. But the
CPU address spaces are different. You can't maintain consistent shared
virtual address spaces for both processes when the GPU address space is
shared between them.

Regards,
  Felix

>
> Regards,
> Christian.
>
>> - Get acks from userspace driver folks who know real-world gl/vk
>> usage and
>>    khr specs in-depth enough to be able to tell us how much we'll regret
>>    this.
>>
>> - Merge it. Or come up with something new. Or maybe stick to the
>> Nack, but
>>    then maybe it would be good to document that somewhere in-tree?
>>
>> This entire can of worms just feels way too tricky to have it be handled
>> inconsistently across drivers. And trying to fix these kind of low-level
>> things across drivers once divergences exists is just really painful
>> (e.g.
>> trying to make all dma-buf mmap VM_SPECIAL or the herculeian task
>> Christian is doing to get our dma_resv rules consistent across drivers).
>>
>> Cheers, Daniel
>>
>> On Fri, Jan 07, 2022 at 12:47:45PM -0500, Felix Kuehling wrote:
>>> Am 2022-01-07 um 3:56 a.m. schrieb Christian König:
>>>> Am 06.01.22 um 17:51 schrieb Felix Kuehling:
>>>>> Am 2022-01-06 um 11:48 a.m. schrieb Christian König:
>>>>>> Am 06.01.22 um 17:45 schrieb Felix Kuehling:
>>>>>>> Am 2022-01-06 um 4:05 a.m. schrieb Christian König:
>>>>>>> [SNIP]
>>>>>>> Also, why does your ACK or NAK depend on this at all. If it's the
>>>>>>> right
>>>>>>> thing to do, it's the right thing to do regardless of who benefits
>>>>>>> from
>>>>>>> it. In addition, how can a child process that doesn't even use
>>>>>>> the GPU
>>>>>>> be in violation of any GPU-driver related specifications.
>>>>>> The argument is that the application is broken and needs to be fixed
>>>>>> instead of worked around inside the kernel.
>>>>> I still don't get how they the application is broken. Like I said,
>>>>> the
>>>>> child process is not using the GPU. How can the application be
>>>>> fixed in
>>>>> this case?
>>>> Sounds like I'm still missing some important puzzle pieces for the
>>>> full picture to figure out why this doesn't work.
>>>>
>>>>> Are you saying, any application that forks and doesn't immediately
>>>>> call
>>>>> exec is broken?
>>>> More or less yes. We essentially have three possible cases here:
>>>>
>>>> 1. Application is already using (for example) OpenGL or OpenCL to do
>>>> some rendering on the GPU and then calls fork() and expects to use
>>>> OpenGL both in the parent and the child at the same time.
>>>>      As far as I know that's illegal from the Khronos specification
>>>> point of view and working around inside the kernel for something not
>>>> allowed in the first place is seen as futile effort.
>>>>
>>>> 2. Application opened the file descriptor, forks and then initializes
>>>> OpenGL/Vulkan/OpenCL.
>>>>      That's what some compositors are doing to drop into the backround
>>>> and is explicitly legal.
>>>>
>>>> 3. Application calls fork() and then doesn't use the GPU in the child.
>>>> Either by not using it or calling exec.
>>>>      That should be legal and not cause any problems in the first
>>>> place.
>>>>
>>>> But from your description I still don't get why we are running into
>>>> problems here.
>>>>
>>>> I was assuming that you have case #1 because we previously had some
>>>> examples of this with this python library, but from your description
>>>> it seems to be case #3.
>>> Correct. #3 has at least one issue we previously worked around in the
>>> Thunk: The inherited VMAs prevent BOs from being freed in the parent
>>> process. This manifests as an apparent memory leak. Therefore the Thunk
>>> calls madvise(..., MADV_DONTFORK) on all its VRAM allocation. The BOs
>>> that are causing problems with CRIU are GTT BOs that weren't covered by
>>> this previous workaround.
>>>
>>> The new issue with CRIU is, that CRIU saves and restores all the VMAs.
>>> When trying to restore the inherited VMAs in the child process, the
>>> mmap
>>> call fails because the child process' render node FD is no longer
>>> inherited from the parent, but is instead created by its own "open"
>>> system call. The mmap call in the child fails for at least two reasons:
>>>
>>>    * The child process' render node FD doesn't have permission to
>>> map the
>>>      parent process' BO any more
>>>    * The parent BO doesn't get restored in the child process, so its
>>> mmap
>>>      offset doesn't get updated to the new mmap offset of the restored
>>>      parent BO by the amdgpu CRIU plugin
>>>
>>> We could maybe add a whole bunch of complexity in CRIU and our CRIU
>>> plugin to fix this. But it's pointless because like you said, actually
>>> doing anything with the BO in the child process is illegal anyway
>>> (scenario #1 above). The easiest solution seems to be, to just not
>>> inherit the VMA in the first place.
>>>
>>> Regards,
>>>    Felix
>>>
>>>
>>>>> Or does an application that forks need to be aware that some other
>>>>> part
>>>>> of the application used the GPU and explicitly free any GPU
>>>>> resources?
>>>> Others might fill that information in, but I think that was the plan
>>>> for newer APIs like Vulkan.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Thanks,
>>>>>     Felix
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Regards,
>>>>>>>      Felix
>>>>>>>
>>>>>>>
>>>>>>>> Let's talk about this on Mondays call. Thanks for giving the whole
>>>>>>>> context.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>       Felix
>>>>>>>>>
>