[PATCH 2/2] dma-buf: try to replace a signaled fence in reservation_object_add_shared_inplace
Christian König
christian.koenig at amd.com
Wed Nov 15 17:34:07 UTC 2017
Am 15.11.2017 um 17:55 schrieb Chris Wilson:
> Quoting Chris Wilson (2017-11-14 14:34:05)
>> Quoting Christian König (2017-11-14 14:24:44)
>>> Am 06.11.2017 um 17:22 schrieb Chris Wilson:
>>>> Quoting Christian König (2017-10-30 14:59:04)
>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>>> dma_fence_put(old_fence);
>>>>> return;
>>>>> }
>>>>> +
>>>>> + if (!signaled && dma_fence_is_signaled(old_fence)) {
>>>>> + signaled = old_fence;
>>>>> + signaled_idx = i;
>>>>> + }
>>>> How much do we care about only keeping one fence per-ctx here? You could
>>>> rearrange this to break on old_fence->context == fence->context ||
>>>> dma_fence_is_signaled(old_fence) then everyone can use the final block.
>>> Yeah, that is what David Zhou suggested as well.
>>>
>>> I've rejected this approach for now cause I think we still have cases
>>> where we rely on one fence per ctx (but I'm not 100% sure).
>>>
>>> I changed patch #1 in this series as you suggest and going to send that
>>> out once more in a minute.
>>>
>>> Can we get this upstream as is for now? I won't have much more time
>>> working on this.
>> Sure, we are only discussing how we might make it look tidier, pure
>> micro-optimisation with the caveat of losing the one-fence-per-ctx
>> guarantee.
> Ah, one thing to note is that extra checking pushed one of our corner
> case tests over its time limit.
>
> If we can completely forgo the one-fence-per-ctx here, what works really
> well in testing is
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 5319ac478918..5755e95fab1b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
> struct reservation_object_list *fobj,
> struct dma_fence *fence)
> {
> - struct dma_fence *replace = NULL;
> - u32 ctx = fence->context;
> - u32 i;
> -
> dma_fence_get(fence);
>
> preempt_disable();
> write_seqcount_begin(&obj->seq);
>
> - for (i = 0; i < fobj->shared_count; ++i) {
> - struct dma_fence *check;
> -
> - check = rcu_dereference_protected(fobj->shared[i],
> - reservation_object_held(obj));
> -
> - if (check->context == ctx || dma_fence_is_signaled(check)) {
> - replace = old_fence;
> - break;
> - }
> - }
> -
> /*
> * memory barrier is added by write_seqcount_begin,
> * fobj->shared_count is protected by this lock too
> */
> - RCU_INIT_POINTER(fobj->shared[i], fence);
> - if (!replace)
> - fobj->shared_count++;
> + RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
>
> write_seqcount_end(&obj->seq);
> preempt_enable();
> -
> - dma_fence_put(replace);
> }
>
> static void
>
> i.e. don't check when not replacing the shared[], on creating the new
> buffer we then discard all the old fences.
>
> It should work for amdgpu as well since you do a ht to coalesce
> redundant fences before queuing.
That won't work for all cases. This way the reservation object would
keep growing without a chance to ever shrink.
Christian.
> -Chris
More information about the amd-gfx
mailing list