[PATCH] dma-buf/sw_sync: Avoid recursive lock during fence signal.

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 14 16:25:59 UTC 2020


Quoting Bas Nieuwenhuizen (2020-07-14 16:41:02)
> Calltree:
>   timeline_fence_release
>   drm_sched_entity_wakeup
>   dma_fence_signal_locked
>   sync_timeline_signal
>   sw_sync_ioctl
> 
> Releasing the reference to the fence in the fence signal callback
> seems reasonable to me, so this patch avoids the locking issue in
> sw_sync.
> 
> d3862e44daa7 ("dma-buf/sw-sync: Fix locking around sync_timeline lists")
> fixed the recursive locking issue but caused a use-after-free. Later
> d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
> fixed the use-after-free but reintroduced the recursive locking issue.
> 
> In this attempt we avoid the use-after-free still because the release
> function still always locks, and outside of the locking region in the
> signal function we have properly refcounted references.
> 
> We furthermore avoid the recursive lock by making sure that either:
> 
> 1) We have a properly refcounted reference, preventing the signal from
>    triggering the release function inside the locked region.
> 2) The refcount was already zero, and hence nobody will be able to trigger
>    the release function from the signal function.
> 
> Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
> Cc: Sumit Semwal <sumit.semwal at linaro.org>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Gustavo Padovan <gustavo at padovan.org>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: <stable at vger.kernel.org>
> Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> ---
>  drivers/dma-buf/sw_sync.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 348b3a9170fa..30a482f75d56 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -192,9 +192,12 @@ static const struct dma_fence_ops timeline_fence_ops = {
>  static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>  {
>         struct sync_pt *pt, *next;
> +       struct list_head ref_list;
>  
>         trace_sync_timeline(obj);
>  
> +       INIT_LIST_HEAD(&ref_list);
> +
>         spin_lock_irq(&obj->lock);
>  
>         obj->value += inc;
> @@ -206,18 +209,27 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>                 list_del_init(&pt->link);
>                 rb_erase(&pt->node, &obj->pt_tree);
>  
> -               /*
> -                * A signal callback may release the last reference to this
> -                * fence, causing it to be freed. That operation has to be
> -                * last to avoid a use after free inside this loop, and must
> -                * be after we remove the fence from the timeline in order to
> -                * prevent deadlocking on timeline->lock inside
> -                * timeline_fence_release().
> -                */
> +               /* We need to take a reference to avoid a release during
> +                * signalling (which can cause a recursive lock of obj->lock).
> +                * If refcount was already zero, another thread is already taking
> +                * care of destructing the fence, so the signal cannot release
> +                * it again and we hence will not have the recursive lock. */

/*
 * Block commentary style:
 * https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting
 */
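For reference, the quoted comment reflowed into that preferred multi-line style
(wording lightly smoothed) would read:

```c
/*
 * We need to take a reference to avoid a release during signalling
 * (which can cause a recursive lock of obj->lock). If the refcount
 * was already zero, another thread is already destroying the fence,
 * so the signal cannot release it again and we will not recurse on
 * the lock.
 */
```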

> +               if (dma_fence_get_rcu(&pt->base))
> +                       list_add_tail(&pt->link, &ref_list);

Ok.

> +
>                 dma_fence_signal_locked(&pt->base);
>         }
>  
>         spin_unlock_irq(&obj->lock);
> +
> +       list_for_each_entry_safe(pt, next, &ref_list, link) {
> +               /* This needs to be cleared before release, otherwise the
> +                * timeline_fence_release function gets confused about also
> +                * removing the fence from the pt_tree. */
> +               list_del_init(&pt->link);
> +
> +               dma_fence_put(&pt->base);
> +       }

How serious is the problem of one fence callback freeing another pt?

Following the pattern here

	spin_lock(&obj->lock);
	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
		if (!timeline_fence_signaled(&pt->base))
			break;

		if (!dma_fence_get_rcu(&pt->base))
			continue; /* too late! */

		rb_erase(&pt->node, &obj->pt_tree);
		list_move_tail(&pt->link, &ref_list);
	}
	spin_unlock(&obj->lock);

	list_for_each_entry_safe(pt, next, &ref_list, link) {
		list_del_init(&pt->link);
		dma_fence_signal(&pt->base);
		dma_fence_put(&pt->base);
	}

Marginal extra cost for signaling along the debug sw_timeline for total
peace of mind.
-Chris