[PATCH] drm/ttm: Don't inherit GEM object VMAs in child process
felix.kuehling at amd.com
Tue Jan 4 18:08:09 UTC 2022
Am 2021-12-23 um 2:05 a.m. schrieb Christian König:
> Am 22.12.21 um 21:53 schrieb Daniel Vetter:
>> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:
>> Still sounds funky. I think minimally we should have an ack from CRIU
>> developers that this is officially the right way to solve this
>> problem. I
>> really don't want to have random one-off hacks that don't work across
>> board, for a problem where we (drm subsystem) really shouldn't be the
>> one with this problem. Where "this problem" means that the mmap space is
>> per file description, and not per underlying inode or real device or
>> whatever. That part sounds like a CRIU problem, and I expect CRIU folks
>> want a consistent solution across the board for this. Hence please
>> grab an
>> ack from them.
> Unfortunately it's a KFD design problem. AMD used a single device
> node, then mmaped different objects from the same offset to different
> processes and expected it to work the rest of the fs subsystem without
This may be true for mmaps in the KFD device, but not for mmaps in the
DRM render nodes.
> So yes, this is indeed because the mmap space is per file descriptor
> for the use case here.
No. This is a different problem.
The problem has to do with the way that DRM manages mmap permissions. In
order to be able to mmap an offset in the render node, there needs to be
a BO that was created in the same render node. If you fork a process, it
inherits the VMA. But KFD doesn't know anything about the inherited BOs
from the parent process. Therefore those BOs don't get checkpointed and
restored in the child process. When the CRIU checkpoint is restored, our
CRIU plugin never creates a BO corresponding to the VMA in the child
process' render node FD. We've also lost the relationship between the
parent and child-process' render node FDs. After "fork" the render node
FD points to the same struct file in parent and child. After restoring
the CRIU checkpoint, they are separate struct files, created by separate
"open" system calls. Therefore the mmap call that restores the VMA fails
in the child process.
At least for KFD, there is no point inheriting BOs from a child process,
because the GPU has no way of accessing the BOs in the child process.
The child process has no GPU address space, no user mode queues, no way
to do anything with the GPU before it completely reinitializes its KFD
We can workaround this issue in user mode with madvise(...,
MADV_DONTFORK). In fact we've already done this for some BOs to avoid a
memory leak in the parent process while a child process exists. But it's
slightly racy because there is a short time window where VMA exists
without the VM_DONTCOPY flag. A fork during that time window could still
create a child process with an inherited VMA.
Therefore a safer solution is to set the vm_flags in the VMA in the
driver when the VMA is first created.
> And thanks for pointing this out, this indeed makes the whole change
> extremely questionable.
>> Cheers, Daniel
More information about the amd-gfx