[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

Felix Kuehling felix.kuehling at amd.com
Thu Jan 6 16:51:30 UTC 2022


On 2022-01-06 at 11:48 a.m., Christian König wrote:
> On 06.01.22 at 17:45, Felix Kuehling wrote:
>> On 2022-01-06 at 4:05 a.m., Christian König wrote:
>>> On 05.01.22 at 17:16, Felix Kuehling wrote:
>>>> [SNIP]
>>>>>> But KFD doesn't know anything about the inherited BOs
>>>>>> from the parent process.
>>>>> Ok, why is that? When the KFD is reinitializing its context, why
>>>>> shouldn't it clean up those VMAs?
>>>> That cleanup has to be initiated by user mode. Basically closing the
>>>> old KFD and DRM file descriptors, cleaning up all the user mode VM
>>>> state, unmapping all the VMAs, etc. Then it reopens KFD and the render
>>>> nodes and starts from scratch.
>>>>
>>>> User mode will do this automatically when it tries to reinitialize
>>>> ROCm. However, in this case the child process doesn't do that (e.g. a
>>>> Python application using the multiprocessing package). The child
>>>> process does not use ROCm. But you're left with all the dangling VMAs
>>>> in the child process indefinitely.
>>> Oh, not that one again. I'm unfortunately pretty sure that this is a
>>> clear NAK then.
>>>
>>> This Python multiprocessing package is violating various
>>> specifications by doing this fork() and we already had multiple
>>> discussions about that.
>> Well, it's in widespread use. We can't just throw up our hands and say
>> they're buggy and not supported.
>
> Because that's not my NAK, but rather from upstream.
>
>> Also, why does your ACK or NAK depend on this at all? If it's the right
>> thing to do, it's the right thing to do regardless of who benefits from
>> it. In addition, how can a child process that doesn't even use the GPU
>> be in violation of any GPU-driver related specifications?
>
> The argument is that the application is broken and needs to be fixed
> instead of worked around inside the kernel.

I still don't get how the application is broken. Like I said, the
child process is not using the GPU. How can the application be fixed in
this case?

Are you saying, any application that forks and doesn't immediately call
exec is broken?

Or does an application that forks need to be aware that some other part
of the application used the GPU and explicitly free any GPU resources?
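
To make the scenario concrete, here is a minimal user-space sketch of a
process that forks without calling exec. It is only an illustration, not
the patch: an ordinary temporary file stands in for the DRM render node,
the helper name child_sees_mapping is made up for this example, and it
assumes Linux with /proc and Python 3.8 or newer. MADV_DONTFORK is the
user-space way to set VM_DONTCOPY on a VMA, which is assumed here to be
roughly the effect the patch is after for GEM object mappings.

```python
import mmap
import os
import tempfile


def child_sees_mapping(dontfork: bool) -> bool:
    """Fork without exec and report whether the child inherited the VMA."""
    # Stand-in for a mapped buffer object: a shared, file-backed mapping.
    with tempfile.NamedTemporaryFile(prefix="vma-demo-") as backing:
        backing.truncate(mmap.PAGESIZE)
        region = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        if dontfork:
            # Ask the kernel not to copy this VMA into child processes.
            region.madvise(mmap.MADV_DONTFORK)
        marker = os.path.basename(backing.name)

        pid = os.fork()
        if pid == 0:
            # Child: it never uses the mapping, it only inspects its own
            # mapping table. It must not touch `region` here, because the
            # memory may not be mapped in this process.
            code = 2
            try:
                with open("/proc/self/maps") as maps:
                    code = 0 if marker in maps.read() else 1
            finally:
                os._exit(code)

        # Parent: collect the child's answer, then release the mapping.
        _, status = os.waitpid(pid, 0)
        region.close()
        return os.WEXITSTATUS(status) == 0
```

Calling child_sees_mapping(False) models today's behaviour, where the
child ends up holding a VMA it never asked for. Calling
child_sees_mapping(True) models the case where the mapping is simply
absent in the child, without the child having to know about it.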

Thanks,
  Felix


>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>>> Let's talk about this on Monday's call. Thanks for giving the whole
>>> context.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>>     Felix
>>>>
>

