[PATCH] nouveau: fix client work fence deletion race

Karol Herbst kherbst at redhat.com
Thu Jun 15 10:45:30 UTC 2023


On Thu, Jun 15, 2023 at 4:47 AM Dave Airlie <airlied at gmail.com> wrote:
>
> From: Dave Airlie <airlied at redhat.com>
>
> This seems to have existed for ever but is now more apparant after
> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd (drm/ttm: use per BO cleanup workers)
>
> My analysis:
> two threads are running,
> one in the irq signalling the fence, in dma_fence_signal_timestamp_locked,
> it has done the DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the callbacks.
>
> second thread in nouveau_cli_work_ready, where it sees the fence is signalled, so then puts the
> fence, cleanups the object and frees the work item, which contains the callback.
>
> thread one goes again and tries to call the callback and causes the use-after-free.
>
> Proposed fix:
> lock the fence signalled check in nouveau_cli_work_ready, so either the callbacks are done
> or the memory is freed.
>
> Signed-off-by: Dave Airlie <airlied at redhat.com>

Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from
deferred client work handler")

Is I think the most reasonable commit to tag to ensure backports.

> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> -       if (!dma_fence_is_signaled(fence))
> -               return false;
> -       dma_fence_put(fence);
> -       return true;
> +       unsigned long flags;
> +       bool ret = true;
> +       spin_lock_irqsave(fence->lock, flags);
> +       if (!dma_fence_is_signaled_locked(fence))
> +               ret = false;
> +       spin_unlock_irqrestore(fence->lock, flags);

I agree with the code being always broken as
`dma_fence_is_signaled_locked` specifies: "This function requires
&dma_fence.lock to be held."

> +
> +       if (ret == true)
> +               dma_fence_put(fence);
> +       return ret;
>  }
>
>  static void
> --
> 2.40.1
>

regardless of the patch fixing the issue users have seen:

Reviewed-by: Karol Herbst <kherbst at redhat.com>



More information about the dri-devel mailing list