[PATCH] nouveau: fix client work fence deletion race
Karol Herbst
kherbst at redhat.com
Thu Jun 15 10:45:30 UTC 2023
On Thu, Jun 15, 2023 at 4:47 AM Dave Airlie <airlied at gmail.com> wrote:
>
> From: Dave Airlie <airlied at redhat.com>
>
> This seems to have existed for ever but is now more apparant after
> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd (drm/ttm: use per BO cleanup workers)
>
> My analysis:
> two threads are running,
> one in the irq signalling the fence, in dma_fence_signal_timestamp_locked,
> it has done the DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the callbacks.
>
> second thread in nouveau_cli_work_ready, where it sees the fence is signalled, so then puts the
> fence, cleanups the object and frees the work item, which contains the callback.
>
> thread one goes again and tries to call the callback and causes the use-after-free.
>
> Proposed fix:
> lock the fence signalled check in nouveau_cli_work_ready, so either the callbacks are done
> or the memory is freed.
>
> Signed-off-by: Dave Airlie <airlied at redhat.com>
Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from
deferred client work handler")
Is I think the most reasonable commit to tag to ensure backports.
> ---
> drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
> static inline bool
> nouveau_cli_work_ready(struct dma_fence *fence)
> {
> - if (!dma_fence_is_signaled(fence))
> - return false;
> - dma_fence_put(fence);
> - return true;
> + unsigned long flags;
> + bool ret = true;
> + spin_lock_irqsave(fence->lock, flags);
> + if (!dma_fence_is_signaled_locked(fence))
> + ret = false;
> + spin_unlock_irqrestore(fence->lock, flags);
I agree with the code being always broken as
`dma_fence_is_signaled_locked` specifies: "This function requires
&dma_fence.lock to be held."
> +
> + if (ret == true)
> + dma_fence_put(fence);
> + return ret;
> }
>
> static void
> --
> 2.40.1
>
regardless of the patch fixing the issue users have seen:
Reviewed-by: Karol Herbst <kherbst at redhat.com>
More information about the dri-devel
mailing list