[RFC PATCH v1] dma-fence-array: Deal with sub-fences that are signaled late

Christian König christian.koenig at amd.com
Thu Aug 13 06:52:09 UTC 2020


Am 13.08.20 um 01:55 schrieb Jordan Crouse:
> This is an RFC because I'm still trying to grok the correct behavior.
>
> Consider a dma_fence_array created two two fence and signal_on_any is true.
> A reference to dma_fence_array is taken for each waiting fence.

Ok, that sounds like you seem to mix a couple of things up here.

A dma_fence_array takes the reference to the fences it contains on 
creation. There is only one reference to the dma_fence_array even if it 
contains N unsignaled fences.

What we do is to grab a reference to the array in 
dma_fence_array_enable_signaling(), but this is because we are 
registering the callback here.

> When the client calls dma_fence_wait() only one of the fences is signaled.
> The client returns successfully from the wait and puts it's reference to
> the array fence but the array fence still remains because of the remaining
> un-signaled fence.

If signaling was enabled then this is correct, because otherwise we 
would crash when the other callbacks are called.

> Now consider that the unsignaled fence is signaled while the timeline is being
> destroyed much later. The timeline destroy calls dma_fence_signal_locked(). The
> following sequence occurs:
>
> 1) dma_fence_array_cb_func is called
>
> 2) array->num_pending is 0 (because it was set to 1 due to signal_on_any) so the
> callback function calls dma_fence_put() instead of triggering the irq work
>
> 3) The array fence is released which in turn puts the lingering fence which is
> then released
>
> 4) deadlock with the timeline

Why do we have a deadlock here? That doesn't seems to add up.

Christian.

>
> I think that we can fix this with the attached patch. Once the fence is
> signaled signaling it again in the irq worker shouldn't hurt anything. The only
> gotcha might be how the error is propagated - I wasn't quite sure the intent of
> clearing it only after getting to the irq worker.
>
> Signed-off-by: Jordan Crouse <jcrouse at codeaurora.org>
> ---
>
>   drivers/dma-buf/dma-fence-array.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index d3fbd950be94..b8829b024255 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -46,8 +46,6 @@ static void irq_dma_fence_array_work(struct irq_work *wrk)
>   {
>   	struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
>   
> -	dma_fence_array_clear_pending_error(array);
> -
>   	dma_fence_signal(&array->base);
>   	dma_fence_put(&array->base);
>   }
> @@ -61,10 +59,10 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
>   
>   	dma_fence_array_set_pending_error(array, f->error);
>   
> -	if (atomic_dec_and_test(&array->num_pending))
> -		irq_work_queue(&array->work);
> -	else
> -		dma_fence_put(&array->base);
> +	if (!atomic_dec_and_test(&array->num_pending))
> +		dma_fence_array_set_pending_error(array, f->error);
> +
> +	irq_work_queue(&array->work);
>   }
>   
>   static bool dma_fence_array_enable_signaling(struct dma_fence *fence)



More information about the dri-devel mailing list