[PATCH 08/10] dma-buf/resv: replace shared fence with new fences container

Christian König ckoenig.leichtzumerken at gmail.com
Thu Aug 22 08:37:20 UTC 2019


Am 21.08.19 um 19:35 schrieb Chris Wilson:
> Quoting Chris Wilson (2019-08-21 16:24:22)
>> Quoting Christian König (2019-08-21 13:31:45)
>>> @@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>>>                  busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
>>>   
>>>          /* Translate shared fences to READ set of engines */
>>> -       list = rcu_dereference(obj->base.resv->fence);
>>> -       if (list) {
>>> -               unsigned int shared_count = list->shared_count, i;
>>> -
>>> -               for (i = 0; i < shared_count; ++i) {
>>> -                       struct dma_fence *fence =
>>> -                               rcu_dereference(list->shared[i]);
>>> -
>>> -                       args->busy |= busy_check_reader(fence);
>>> -               }
>>> -       }
>>> +       readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
>>> +       dma_fence_array_for_each(fence, cursor, readers)
>>> +               args->busy |= busy_check_reader(fence);
>>> +       dma_fence_put(readers);
>> That's underwhelming; the full-mb shows up in scaling tests (I'll test
>> the impact of this series later). Something like,
> To put some numbers to it, adding the full-mb adds 5ns to a single
> thread on Kabylake and 20ns under contention.

The question is whether that's the use case we want to optimize for.

Querying a buffer for busyness is something we do extremely rarely on 
amdgpu; e.g., IIRC we even grab the full reservation lock for this.

But adding new fences comes with every command submission.

What could maybe work is the "do { } while (fence has changed);" loop you 
suggested earlier in this mail thread, but I need to double check if 
that would really work or clash with recycling dma_fence_arrays().
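
Roughly the shape I have in mind, purely as a sketch: the readers.fence 
member name and the cursor type are assumptions on my side, and whether 
the array can be walked safely without holding a reference is exactly 
the open question:

        struct dma_fence *readers;
        struct dma_fence *fence;
        unsigned int cursor;
        u32 busy;

        rcu_read_lock();
        do {
                /* Assumed member name; re-read the container each pass */
                readers = rcu_dereference(obj->base.resv->readers.fence);

                busy = busy_check_writer(
                        rcu_dereference(obj->base.resv->fence_excl));

                /* Translate shared fences to READ set of engines */
                dma_fence_array_for_each(fence, cursor, readers)
                        busy |= busy_check_reader(fence);

                /*
                 * Retry if the container was replaced. Note that this
                 * pointer compare would NOT catch a dma_fence_array
                 * recycled in place with new contents.
                 */
        } while (readers != rcu_access_pointer(obj->base.resv->readers.fence));
        rcu_read_unlock();

        args->busy = busy;

If the recycling really does reuse the same array object for new fences, 
the pointer compare alone isn't enough and we would need something like 
a generation counter on the container instead.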

Christian.

> -Chris


