[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
christian.koenig at amd.com
Thu Jan 6 16:48:24 UTC 2022
On 06.01.22 at 17:45, Felix Kuehling wrote:
> On 2022-01-06 at 4:05 a.m., Christian König wrote:
>> On 05.01.22 at 17:16, Felix Kuehling wrote:
>>>>> But KFD doesn't know anything about the inherited BOs
>>>>> from the parent process.
>>>> Ok, why is that? When the KFD is reinitializing its context, why
>>>> shouldn't it clean up those VMAs?
>>> That cleanup has to be initiated by user mode. Basically closing the old
>>> KFD and DRM file descriptors, cleaning up all the user mode VM state,
>>> unmapping all the VMAs, etc. Then it reopens KFD and the render nodes
>>> and starts from scratch.
>>> User mode will do this automatically when it tries to reinitialize ROCm.
>>> However, in this case the child process doesn't do that (e.g. a Python
>>> application using the multiprocessing package). The child process does
>>> not use ROCm. But you're left with all the dangling VMAs in the child
>>> process indefinitely.
>> Oh, not that one again. I'm unfortunately pretty sure that this is a
>> clear NAK then.
>> This Python multiprocessing package is violating various
>> specifications by doing this fork() and we already had multiple
>> discussions about that.
> Well, it's in widespread use. We can't just throw up our hands and say
> they're buggy and not supported.
Because that's not my NAK, but rather one from upstream.
> Also, why does your ACK or NAK depend on this at all? If it's the right
> thing to do, it's the right thing to do regardless of who benefits from
> it. In addition, how can a child process that doesn't even use the GPU
> be in violation of any GPU-driver related specifications?
The argument is that the application is broken and needs to be fixed
instead of worked around inside the kernel.
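
For reference, the fork() behaviour under discussion can be reproduced
from user space without any GPU involved. The following is only a minimal
sketch, assuming Linux and Python 3.8 or newer. It uses
madvise(MADV_DONTFORK) as the user-space analogue of a mapping that is not
copied into the child; it is not a claim about how the patch is
implemented. child_inherits_mapping and MAGIC are made-up names for this
illustration.

```python
import mmap
import os

MAGIC = b"GEM!"  # arbitrary marker, only used by this sketch


def child_inherits_mapping(dontfork: bool) -> bool:
    """Return True if a fork()ed child can still read the parent's mapping."""
    # Anonymous mapping standing in for a mapped buffer object.
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    buf[:len(MAGIC)] = MAGIC
    if dontfork:
        # Ask the kernel not to copy this VMA into children on fork().
        buf.madvise(mmap.MADV_DONTFORK)

    pid = os.fork()
    if pid == 0:
        # Child: touch the mapping. If the VMA was not copied, this access
        # faults and the child is killed; otherwise the marker is visible.
        ok = buf[:len(MAGIC)] == MAGIC
        os._exit(0 if ok else 1)

    # Parent: a clean exit with status 0 means the child saw the marker.
    _, status = os.waitpid(pid, 0)
    buf.close()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("plain mapping inherited:", child_inherits_mapping(False))
    print("MADV_DONTFORK mapping inherited:", child_inherits_mapping(True))
```

In the default case the child keeps the mapping whether it uses it or not,
which is the situation described for the multiprocessing children above.
With MADV_DONTFORK the child is expected to fault on access, because the
VMA is absent from its address space.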
>> Let's talk about this on Monday's call. Thanks for giving the whole