[PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

Tue Aug 7 18:30:43 UTC 2018

Quoting Christian König (2018-08-07 19:18:55)
> Am 07.08.2018 um 20:14 schrieb Chris Wilson:
> > Quoting Christian König (2018-08-07 18:57:16)
> >> Am 07.08.2018 um 18:08 schrieb Chris Wilson:
> >>> amdgpu only uses shared-fences internally, but dmabuf importers rely on
> >>> implicit write hazard tracking via the reservation_object.fence_excl.
> >>> For example, the importer use the write hazard for timing a page flip to
> >>> only occur after the exporter has finished flushing its write into the
> >>> surface. As such, on exporting a dmabuf, we must either flush all
> >>> outstanding fences (for we do not know which are writes and should have
> >>> been exclusive) or alternatively create a new exclusive fence that is
> >>> the composite of all the existing shared fences, and so will only be
> >>> signaled when all earlier fences are signaled (ensuring that we can not
> >>> be signaled before the completion of any earlier write).
> >>>
> >>> v2: reservation_object is already locked by amdgpu_bo_reserve()
> >>>
> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
> >>> Testcase: igt/amd_prime/amd-to-i915
> >>> References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)")
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: Alex Deucher <alexander.deucher at amd.com>
> >>> Cc: "Christian König" <christian.koenig at amd.com>
> >>> ---
> >>>
> >>> This time, hopefully proofread and references complete.
> >>> -Chris
> >>>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ++++++++++++++++++++---
> >>>    1 file changed, 60 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> index 1c5d97f4b4dd..dff2b01a3d89 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> @@ -37,6 +37,7 @@
> >>>    #include "amdgpu_display.h"
> >>>    #include <drm/amdgpu_drm.h>
> >>>    #include <linux/dma-buf.h>
> >>> +#include <linux/dma-fence-array.h>
> >>>    
> >>>    static const struct dma_buf_ops amdgpu_dmabuf_ops;
> >>>    
> >>> @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
> >>>        return ERR_PTR(ret);
> >>>    }
> >>>    
> >>> +static int
> >>> +__reservation_object_make_exclusive(struct reservation_object *obj)
> >>> +{
> >>> +     struct reservation_object_list *fobj;
> >>> +     struct dma_fence_array *array;
> >>> +     struct dma_fence **fences;
> >>> +     unsigned int count, i;
> >>> +
> >>> +     fobj = reservation_object_get_list(obj);
> >>> +     if (!fobj)
> >>> +             return 0;
> >>> +
> >>> +     count = !!rcu_access_pointer(obj->fence_excl);
> >>> +     count += fobj->shared_count;
> >>> +
> >>> +     fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> >>> +     if (!fences)
> >>> +             return -ENOMEM;
> >>> +
> >>> +     for (i = 0; i < fobj->shared_count; i++) {
> >>> +             struct dma_fence *f =
> >>> +                     rcu_dereference_protected(fobj->shared[i],
> >>> +                                               reservation_object_held(obj));
> >>> +
> >>> +             fences[i] = dma_fence_get(f);
> >>> +     }
> >>> +
> >>> +     if (rcu_access_pointer(obj->fence_excl)) {
> >>> +             struct dma_fence *f =
> >>> +                     rcu_dereference_protected(obj->fence_excl,
> >>> +                                               reservation_object_held(obj));
> >>> +
> >>> +             fences[i] = dma_fence_get(f);
> >>> +     }
> >>> +
> >>> +     array = dma_fence_array_create(count, fences,
> >>> +                                    dma_fence_context_alloc(1), 0,
> >>> +                                    false);
> >>> +     if (!array)
> >>> +             goto err_fences_put;
> >>> +
> >>> +     reservation_object_add_excl_fence(obj, &array->base);
> >>> +     return 0;
> >>> +
> >>> +err_fences_put:
> >>> +     for (i = 0; i < count; i++)
> >>> +             dma_fence_put(fences[i]);
> >>> +     kfree(fences);
> >>> +     return -ENOMEM;
> >>> +}
> >>> +
> >> This can be simplified a lot. See amdgpu_pasid_free_delayed() for an
> >> example:
> > {
> >       if (!reservation_object_get_list(obj))
> >               return 0;
> >
> >       r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
> >       if (r)
> >               return r;
> >
> >       array = dma_fence_array_create(count, fences,
> >                                      dma_fence_context_alloc(1), 0,
> >                                      false);
> >       if (!array)
> >               goto err_fences_put;
> >
> >       reservation_object_add_excl_fence(obj, &array->base);
> >       return 0;
> >
> > err:
> >       ...
> > }
> >
> > My starting point was going to be use get_fences_rcu, but get_fences_rcu
> > can hardly be called simple for where the lock is required to be held
> > start to finish ;)
> 
> What are you talking about? get_fences_rcu doesn't require any locking 
> at all.
> 
> You only need to the locking to make sure that between creating the 
> fence array and calling reservation_object_add_excl_fence() no other 
> fence is added.

Exactly. That's what need to be absolutely clear from the context.
I didn't say anything about the locking requirements for get_fences_rcu
just the opposite.
-Chris