[PATCH 1/1] drm/xe/xe_sync: avoid race during ufence signaling
Matthew Brost
matthew.brost at intel.com
Wed Aug 20 00:40:04 UTC 2025
On Tue, Aug 19, 2025 at 05:35:07PM -0700, Matthew Brost wrote:
> On Tue, Aug 19, 2025 at 08:44:04PM +0200, Zbigniew Kempczyński wrote:
> > During a vm-bind ioctl the fence for an operation may be signaled
> > while the call is still executing. If the vm-bind syncs on a
> > user-fence this creates a race, because signaling happens in a
> > worker: control may return from the vm-bind ioctl and a subsequent
> > vm-bind operation on the same vma (an unmap) can run against a
> > still-unsignaled user-fence. This ultimately fails with -EBUSY,
> > because even though the vma operations have completed the fence
> > still exists, while userspace was already unblocked by the
> > copy_to_user() call.
> >
> > Instead of releasing user-fences in the workqueue for already
> > signaled ops, put them synchronously in the same vm-bind ioctl call.
> >
>
> I'm not really following this explanation. I think the actual problem
> is in user_fence_worker(): the copy to user is done before
> WRITE_ONCE(ufence->signalled, 1). If these were re-ordered, I think
> that would fix this problem.
>
Also, as a follow-on optimization, you could keep it as you have here,
but this patch has the same ordering issue: kick_ufence_sync() also does
the copy_to_user() before WRITE_ONCE(ufence->signalled, 1). That should
be fixed for consistency, even though it would still be functional here
because vm->lock protects against the misordering.
Matt
> Matt
>
> > Fixes: 977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry")
> > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5536
> > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Matthew Auld <matthew.auld at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_sync.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> > index f87276df18f2..8becc3755649 100644
> > --- a/drivers/gpu/drm/xe/xe_sync.c
> > +++ b/drivers/gpu/drm/xe/xe_sync.c
> > @@ -103,6 +103,15 @@ static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
> > dma_fence_put(fence);
> > }
> >
> > +static void kick_ufence_sync(struct xe_user_fence *ufence, struct dma_fence *fence)
> > +{
> > + if (copy_to_user(ufence->addr, &ufence->value, sizeof(ufence->value)))
> > + XE_WARN_ON("Copy to user failed");
> > + WRITE_ONCE(ufence->signalled, 1);
> > + user_fence_put(ufence);
> > + dma_fence_put(fence);
> > +}
> > +
> > static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
> > {
> > struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
> > @@ -244,7 +253,7 @@ void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence)
> > err = dma_fence_add_callback(fence, &sync->ufence->cb,
> > user_fence_cb);
> > if (err == -ENOENT) {
> > - kick_ufence(sync->ufence, fence);
> > + kick_ufence_sync(sync->ufence, fence);
> > } else if (err) {
> > XE_WARN_ON("failed to add user fence");
> > user_fence_put(sync->ufence);
> > --
> > 2.43.0
> >