[RFC PATCH v1] dma-fence-array: Deal with sub-fences that are signaled late

Thu Aug 13 06:49:24 UTC 2020

Quoting Jordan Crouse (2020-08-13 00:55:44)
> This is an RFC because I'm still trying to grok the correct behavior.
> 
> Consider a dma_fence_array created with two fences and signal_on_any set to true.
> A reference to the dma_fence_array is taken for each fence being waited on.
> 
> When the client calls dma_fence_wait(), only one of the fences is signaled.
> The client returns successfully from the wait and puts its reference to
> the array fence, but the array fence stays alive because the remaining
> unsignaled fence still holds a reference to it.
> 
> Now suppose the unsignaled fence is signaled much later, while the timeline is
> being destroyed. The timeline destroy path calls dma_fence_signal_locked(), and
> the following sequence occurs:
> 
> 1) dma_fence_array_cb_func is called
> 
> 2) array->num_pending is already 0 (it was initialized to 1 because of
> signal_on_any, and the first signaled fence decremented it), so the callback
> calls dma_fence_put() instead of queuing the irq work
> 
> 3) That put drops the last reference, so the array fence is released, which in
> turn puts the lingering fence, which is then released as well
> 
> 4) deadlock with the timeline

It's the same recursive locking problem we previously resolved in sw_sync.c by
removing the locking from timeline_fence_release().
-Chris