[PATCH RFC 05/24] Revert "drm: Nerf the preclose callback for modern drivers"

Thu May 24 09:24:02 UTC 2018

On Thu, May 24, 2018 at 2:46 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 24.05.2018 um 03:38 schrieb Qiang Yu:
>
> [SNIP]
>
> Adding fence is done already, and I did wait it before unmap. But then
> I see when
> the buffer is shared between processes, the "perfect wait" is just
> wait the fence
> from this process's task, so it's better to also distinguish fences.
> If so, I just think
> why we don't just wait tasks from this process in the preclose before
> unmap/free
> buffer in the drm_release()?
>
> Well it depends on your VM management. When userspace expects that the VM
> space the BO used is reusable immediately than the TTM callback won't work.
>
> On the other hand you can just grab the list of fences on a BO and filter
> out the ones from your current process and wait for those. See
> amdgpu_sync_resv() as an example how to do that.
>
> In current lima implementation, user space driver is responsible not
> unmap/free
> buffer before task is complete. And VM map/unmap is not differed.
>
>
> Well it's up to you how to design userspace, but in the past doing it like
> that turned out to be a rather bad design decision.
>
> Keep in mind that the kernel driver must guarantee that a shaders can never
> access freed up memory.
>
> Otherwise taking over the system from an unprivileged processes becomes just
> a typing exercise when you manage to access freed memory which is now used
> for a page table.

Right, I know this has to be avoided.

>
> Because of this we have a separate tracking in amdgpu so that we not only
> know who is using which BO, who is using which VM.

amdgpu's VM implementation seems too complicated for this simple mali GPU,
but I may investigate it more to see if I can make it better.

>
> This works simple and fine except the case that user press Ctrl+C to
> terminate
> the application which will force to close drm fd.
>
>
> I'm not sure if that actually works as fine as you think.
>
> For an example of what we had to add to prevent security breaches, take a
> look at amdgpu_gem_object_close().
>
> I'd more prefer to wait buffer fence before vm unmap and filter like
> amdgpu_sync_resv() compared to implement refcount in kernel task.
> But these two ways are all not as simple as preclose.
>
>
> Well, I would rather say you should either delay VM unmap operations until
> all users of the VM are done with their work using the ttm_bo_destroy
> callback.
>
> Or you block in the gem_close_object callback until all tasks using the BO
> are done with it.
>
> So I still don't understand why you don't want to get preclose back even
> have to introduce other complicated mechanism to cover the case free/unmap
> buffer before this process's task is done?
>
>
> We intentionally removed the preclose callback to prevent certain use cases,
> bringing it back to allow your use case looks rather fishy to me.

Seems other drivers do either the deffer or wait way to adopt the drop
of preclose. I can do the same as you suggested, but just not understand why
we make our life harder. Can I know what's the case you want to prevent?

>
> BTW: What exactly is the issue with using the postclose callback?

The issue is, when Ctrl+C to terminate an application, if no wait or deffer
unmap, buffer just gets unmapped before task is done, so kernel driver
gets MMU fault and HW reset to recover the GPU.

Regards,
Qiang

>
> Regards,
> Christian.
>
>
> Regards,
> Qiang
>
>