[PATCH] dma-buf/sw_sync: Avoid recursive lock during fence signal.

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 14 19:43:08 UTC 2020


Quoting Chris Wilson (2020-07-14 19:30:39)
> Quoting Bas Nieuwenhuizen (2020-07-14 19:17:21)
> > On Tue, Jul 14, 2020 at 6:26 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > >
> > > Quoting Bas Nieuwenhuizen (2020-07-14 16:41:02)
> > > > Calltree:
> > > >   timeline_fence_release
> > > >   drm_sched_entity_wakeup
> > > >   dma_fence_signal_locked
> > > >   sync_timeline_signal
> > > >   sw_sync_ioctl
> > > >
> > > > Releasing the reference to the fence in the fence signal callback
> > > > seems reasonable to me, so this patch avoids the locking issue in
> > > > sw_sync.
> > > >
> > > > d3862e44daa7 ("dma-buf/sw-sync: Fix locking around sync_timeline lists")
> > > > fixed the recursive locking issue but caused an use-after-free. Later
> > > > d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
> > > > fixed the use-after-free but reintroduced the recursive locking issue.
> > > >
> > > > In this attempt we avoid the use-after-free still because the release
> > > > function still always locks, and outside of the locking region in the
> > > > signal function we have properly refcounted references.
> > > >
> > > > We furthermore also avoid the recurive lock by making sure that either:
> > > >
> > > > 1) We have a properly refcounted reference, preventing the signal from
> > > >    triggering the release function inside the locked region.
> > > > 2) The refcount was already zero, and hence nobody will be able to trigger
> > > >    the release function from the signal function.
> > > >
> > > > Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
> > > > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > > > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Gustavo Padovan <gustavo at padovan.org>
> > > > Cc: Christian König <christian.koenig at amd.com>
> > > > Cc: <stable at vger.kernel.org>
> > > > Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> > > > ---
> > > >  drivers/dma-buf/sw_sync.c | 28 ++++++++++++++++++++--------
> > > >  1 file changed, 20 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > > > index 348b3a9170fa..30a482f75d56 100644
> > > > --- a/drivers/dma-buf/sw_sync.c
> > > > +++ b/drivers/dma-buf/sw_sync.c
> > > > @@ -192,9 +192,12 @@ static const struct dma_fence_ops timeline_fence_ops = {
> > > >  static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> > > >  {
> > > >         struct sync_pt *pt, *next;
> > > > +       struct list_head ref_list;
> > > >
> > > >         trace_sync_timeline(obj);
> > > >
> > > > +       INIT_LIST_HEAD(&ref_list);
> > > > +
> > > >         spin_lock_irq(&obj->lock);
> > > >
> > > >         obj->value += inc;
> > > > @@ -206,18 +209,27 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> > > >                 list_del_init(&pt->link);
> > > >                 rb_erase(&pt->node, &obj->pt_tree);
> > > >
> > > > -               /*
> > > > -                * A signal callback may release the last reference to this
> > > > -                * fence, causing it to be freed. That operation has to be
> > > > -                * last to avoid a use after free inside this loop, and must
> > > > -                * be after we remove the fence from the timeline in order to
> > > > -                * prevent deadlocking on timeline->lock inside
> > > > -                * timeline_fence_release().
> > > > -                */
> > > > +               /* We need to take a reference to avoid a release during
> > > > +                * signalling (which can cause a recursive lock of obj->lock).
> > > > +                * If refcount was already zero, another thread is already taking
> > > > +                * care of destructing the fence, so the signal cannot release
> > > > +                * it again and we hence will not have the recursive lock. */
> > >
> > > /*
> > >  * Block commentary style:
> > >  * https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting
> > >  */
> > >
> > > > +               if (dma_fence_get_rcu(&pt->base))
> > > > +                       list_add_tail(&pt->link, &ref_list);
> > >
> > > Ok.
> > >
> > > > +
> > > >                 dma_fence_signal_locked(&pt->base);
> > > >         }
> > > >
> > > >         spin_unlock_irq(&obj->lock);
> > > > +
> > > > +       list_for_each_entry_safe(pt, next, &ref_list, link) {
> > > > +               /* This needs to be cleared before release, otherwise the
> > > > +                * timeline_fence_release function gets confused about also
> > > > +                * removing the fence from the pt_tree. */
> > > > +               list_del_init(&pt->link);
> > > > +
> > > > +               dma_fence_put(&pt->base);
> > > > +       }
> > >
> > > How serious is the problem of one fence callback freeing another pt?
> > >
> > > Following the pattern here
> > >
> > >         spin_lock(&obj->lock);
> > >         list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> > >                 if (!timeline_fence_signaled(&pt->base))
> > >                         break;
> > >
> > >                 if (!dma_fence_get_rcu(&pt->base))
> > >                         continue; /* too late! */
> > >
> > >                 rb_erase(&pt->node, &obj->pt_tree);
> > >                 list_move_tail(&pt->link, &ref_list);
> > >         }
> > >         spin_unlock(&obj->lock);
> > >
> > >         list_for_each_entry_safe(pt, next, &ref_list, link) {
> > >                 list_del_init(&pt->link);
> > >                 dma_fence_signal(&pt->base);
> > 
> > Question is what the scope should be. Using this method we only take a
> > reference and avoid releasing the fences we're going to signal.
> > However the lock is on the timeline and dma_fence_signal already takes
> > the lock, so if the signal ends up releasing any of the other fences
> > in the timeline (e.g. the fences that are later than the new value),
> > then we are still going to have the recursive locking issue.
> > 
> > One solution to that would be splitting the lock into 2 locks: 1
> > protecting pt_list + pt_tree and 1 protecting value & serving as the
> > lock for the fences. Then we can only take the list lock in the
> > release function and dma_fence_signal takes the value lock. However,
> > then you still have issues like if a callback for fence A (when called
> > it holds the lock of A's timeline) adds a callback to fence B of the
> > same timeline (requires the fence lock) we still have a recursion
> > locking issue.
> > 
> > I'm not sure it is reasonable for the last use case to work, because I
> > only see it working with per-fence locks, but per-fence locks are
> > deadlock prone due to the order of taking links not really being
> > fixed.
> 
> With the explicit reference you introduce for the signaler, is it not
> then safe to revert d3c6dd1fb30d i.e. conditionally take the spinlock
> inside release? We just have to remember to acquire the ref before
> removing the pt from the list.

No, still only thinking about recursing down the same set of syncpt.
-Chris


More information about the dri-devel mailing list