[PATCH 2/2] dma-buf: cleanup shared fence removal
Daniel Vetter
daniel at ffwll.ch
Thu Jun 27 17:10:53 UTC 2019
On Thu, Jun 27, 2019 at 03:48:06PM +0000, Koenig, Christian wrote:
> Am 27.06.19 um 17:34 schrieb Daniel Vetter:
> > On Thu, Jun 27, 2019 at 3:19 PM Christian König
> > <ckoenig.leichtzumerken at gmail.com> wrote:
> >> Am 27.06.19 um 12:43 schrieb Daniel Vetter:
> >>> On Thu, Jun 27, 2019 at 12:18 PM Christian König
> >>> <ckoenig.leichtzumerken at gmail.com> wrote:
> >>>> While freeing up memory it is easier to remove a fence from a reservation
> >>>> object instead of signaling it and installing a new one in all other objects.
> >>>>
> >>>> Clean this up by adding the removal function to the core reservation object
> >>>> code instead of messing with the internal in amdgpu.
> >>>>
> >>>> No functional change.
> >>>>
> >>>> Signed-off-by: Christian König <christian.koenig at amd.com>
> >>>> ---
> >>>> drivers/dma-buf/reservation.c | 62 +++++++++++++++++++
> >>>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +-------------
> >>>> include/linux/reservation.h | 3 +-
> >>>> 3 files changed, 65 insertions(+), 45 deletions(-)
> >>>>
> >>>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> >>>> index ef710effedfa..e43a316a005d 100644
> >>>> --- a/drivers/dma-buf/reservation.c
> >>>> +++ b/drivers/dma-buf/reservation.c
> >>>> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
> >>>> }
> >>>> EXPORT_SYMBOL(reservation_object_add_shared_fence);
> >>>>
> >>>> +/**
> >>>> + * reservation_object_remove_shared_fence - remove shared fences
> >>>> + * @obj: the reservation object
> >>>> + * @context: the context of the fences to remove
> >>>> + *
> >>>> + * Remove shared fences based on their fence context.
> >>>> + */
> >>> This needs a serious explanation of "why?". Removing fences without
> >>> guaranteeing they're all signaled is a good way to create havoc. Your
> >>> commit message has a few words about what you're doing here, but it
> >>> still doesn't explain why this is safe and when exactly it should be
> >>> used.
> >> Yeah, I'm not very keen about this either.
> >>
> >> The key point is the memory is not accessible by the hardware any more
> >> because it is freed and removed from the page tables.
> >>
> >> So further access is prevented and in this special case it is actually
> >> valid to do this even if the operation represented by the fence is still
> >> ongoing.
> > Hm, why do you have to remove these fences then? Can't you just let
> > them signal and get collected as usual? As soon as you share buffers
> > these fences can get anywhere, so you need to correctly unwind them no
> > matter what.
> >
> > You're kinda still just describing what you're doing, not why.
>
> It is simply more efficient to remove the fence from one reservation
> object than to add a new fence to all other reservation objects in the
> same process.
Ok, you still talk in riddles and don't explain what the goal of this
entire dance is, so I went and read the code. Assuming I didn't misread
too much:
1. you create a fake fence on a per-process timeline.
2. you attach this liberally to all the bo you're creating on that
process
3. the fence never signals on its own, but it has a very magic
->enable_signaling callback which is the only thing that makes this fence
switch to signalled in a finite time. Before that it's stuck forever. So
quite a bit a schrödinger fence: It's not a real fence (because it fails
to signal in bounded time) except when you start to look at it.
4. Looking at the fence triggers eviction, at that point we replace this
magic eviction fence with the next set, reacquire buffers and then unblock
the kfd process once everything is in shape again.
This is soooooooooooooooooo magic that I really don't think we should
encourage people without clue to maybe use this and totally break all
fences guarantees.
If you do want to make sure an optimized version within
reservation_object.c, then it should be code which replaces fences iff:
- they're the same context
- later in the ordering within that context
- of the same type (i.e. safe vs shared)
That would actually be safe thing to do.
Also, the above is what I expected when asking "why do you need this", not
"we replace fences, its more efficient" I kinda got that from the code :-)
Plus reading this now with (at least the believe of) understanding what
you're doing, replacing the fences and reattaching the next one of these
magic fences doesn't feel like it's actually making anything faster. Just
more obscure ...
Looking at reservation_object_add_shared_fence it seems to dtrt already.
-Daniel
> >> How should we word this? Something like:
> >>
> >> * Remove shared fences based on their fence context. Removing a fence
> >> before it is signaled is only valid if hardware access is prevented by
> >> some other means like IOMMU or similar virtual memory protection.
> > The part that freaks me out is whether we break the fence chaing
> > anywhere by removing fences. But I guess if you're only removing the
> > shared fences that shoudl be fine ...
> >
> >>> Also I'm not sure (depending upon what you use this for) this is
> >>> actually correct outside of amdgpu and it's ignorance of the exclusive
> >>> fence.
> >> No, that is completely unrelated. But I thought that I clean this up
> >> before I start to tackle the exclusive fence issue.
> > ... except amdgpu has internally a very strange idea of shared fences
> > with auto-exclusive promotion. And I think removing exclusive fences
> > will break the fence dependencies of other (earlier) dma drivers by
> > other operations. Or is that why you filter on the context,
> > essentially amd's way of saying "remove all shared fences for this
> > thing, keep only the exclusive ones".
> >
> > I guess I need to read what this actually does in the code, since I
> > still have no idea why you need to do this.
> > -Daniel
> >
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>> +int reservation_object_remove_shared_fence(struct reservation_object *obj,
> >>>> + uint64_t context)
> >>>> +{
> >>>> + struct reservation_object_list *old, *new;
> >>>> + unsigned int i, j, k;
> >>>> +
> >>>> + reservation_object_assert_held(obj);
> >>>> +
> >>>> + old = reservation_object_get_list(obj);
> >>>> + if (!old)
> >>>> + return 0;
> >>>> +
> >>>> + new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
> >>>> + GFP_KERNEL);
> >>>> + if (!new)
> >>>> + return -ENOMEM;
> >>>> +
> >>>> + /* Go through all the shared fences in the resevation object and sort
> >>>> + * the interesting ones to the end of the new list.
> >>>> + */
> >>>> + for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> >>>> + struct dma_fence *f;
> >>>> +
> >>>> + f = rcu_dereference_protected(old->shared[i],
> >>>> + reservation_object_held(obj));
> >>>> +
> >>>> + if (f->context == context)
> >>>> + RCU_INIT_POINTER(new->shared[--j], f);
> >>>> + else
> >>>> + RCU_INIT_POINTER(new->shared[k++], f);
> >>>> + }
> >>>> + new->shared_max = old->shared_max;
> >>>> + new->shared_count = k;
> >>>> +
> >>>> + /* Install the new fence list, seqcount provides the barriers */
> >>>> + preempt_disable();
> >>>> + write_seqcount_begin(&obj->seq);
> >>>> + RCU_INIT_POINTER(obj->fence, new);
> >>>> + write_seqcount_end(&obj->seq);
> >>>> + preempt_enable();
> >>>> +
> >>>> + /* Drop the references to the removed fences */
> >>>> + for (i = j, k = 0; i < old->shared_count; ++i) {
> >>>> + struct dma_fence *f;
> >>>> +
> >>>> + f = rcu_dereference_protected(new->shared[i],
> >>>> + reservation_object_held(obj));
> >>>> + dma_fence_put(f);
> >>>> + }
> >>>> + kfree_rcu(old, rcu);
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +EXPORT_SYMBOL(reservation_object_remove_shared_fence);
> >>>> +
> >>>> /**
> >>>> * reservation_object_add_excl_fence - Add an exclusive fence.
> >>>> * @obj: the reservation object
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> index 10abae398e51..9b25ad3d003e 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> >>>> @@ -224,50 +224,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
> >>>> if (!ef)
> >>>> return -EINVAL;
> >>>>
> >>>> - old = reservation_object_get_list(resv);
> >>>> - if (!old)
> >>>> - return 0;
> >>>> -
> >>>> - new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
> >>>> - GFP_KERNEL);
> >>>> - if (!new)
> >>>> - return -ENOMEM;
> >>>> -
> >>>> - /* Go through all the shared fences in the resevation object and sort
> >>>> - * the interesting ones to the end of the list.
> >>>> - */
> >>>> - for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
> >>>> - struct dma_fence *f;
> >>>> -
> >>>> - f = rcu_dereference_protected(old->shared[i],
> >>>> - reservation_object_held(resv));
> >>>> -
> >>>> - if (f->context == ef->base.context)
> >>>> - RCU_INIT_POINTER(new->shared[--j], f);
> >>>> - else
> >>>> - RCU_INIT_POINTER(new->shared[k++], f);
> >>>> - }
> >>>> - new->shared_max = old->shared_max;
> >>>> - new->shared_count = k;
> >>>> -
> >>>> - /* Install the new fence list, seqcount provides the barriers */
> >>>> - preempt_disable();
> >>>> - write_seqcount_begin(&resv->seq);
> >>>> - RCU_INIT_POINTER(resv->fence, new);
> >>>> - write_seqcount_end(&resv->seq);
> >>>> - preempt_enable();
> >>>> -
> >>>> - /* Drop the references to the removed fences or move them to ef_list */
> >>>> - for (i = j, k = 0; i < old->shared_count; ++i) {
> >>>> - struct dma_fence *f;
> >>>> -
> >>>> - f = rcu_dereference_protected(new->shared[i],
> >>>> - reservation_object_held(resv));
> >>>> - dma_fence_put(f);
> >>>> - }
> >>>> - kfree_rcu(old, rcu);
> >>>> -
> >>>> - return 0;
> >>>> + return reservation_object_remove_shared_fence(resv, ef->base.context);
> >>>> }
> >>>>
> >>>> static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain,
> >>>> diff --git a/include/linux/reservation.h b/include/linux/reservation.h
> >>>> index f47e8196d039..1c833a56b678 100644
> >>>> --- a/include/linux/reservation.h
> >>>> +++ b/include/linux/reservation.h
> >>>> @@ -229,7 +229,8 @@ int reservation_object_reserve_shared(struct reservation_object *obj,
> >>>> unsigned int num_fences);
> >>>> void reservation_object_add_shared_fence(struct reservation_object *obj,
> >>>> struct dma_fence *fence);
> >>>> -
> >>>> +int reservation_object_remove_shared_fence(struct reservation_object *obj,
> >>>> + uint64_t context);
> >>>> void reservation_object_add_excl_fence(struct reservation_object *obj,
> >>>> struct dma_fence *fence);
> >>>>
> >>>> --
> >>>> 2.17.1
> >>>>
> >>>> _______________________________________________
> >>>> dri-devel mailing list
> >>>> dri-devel at lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the amd-gfx
mailing list