[PATCH 1/1] drm/xe/xe_sync: avoid race during ufence signaling
Zbigniew Kempczyński
zbigniew.kempczynski at intel.com
Wed Aug 20 08:38:14 UTC 2025
On Tue, Aug 19, 2025 at 05:40:04PM -0700, Matthew Brost wrote:
> On Tue, Aug 19, 2025 at 05:35:07PM -0700, Matthew Brost wrote:
> > On Tue, Aug 19, 2025 at 08:44:04PM +0200, Zbigniew Kempczyński wrote:
> > > During a vm-bind ioctl the ops-execute fence may already be signaled
> > > while the call is still in progress. If the vm-bind syncs to a
> > > user-fence this creates a race, because the signaling happens in a
> > > worker: the vm-bind ioctl can return, and userspace (unblocked as
> > > soon as the worker's copy_to_user() lands) may issue a subsequent
> > > vm-bind operation on the same vma (an unmap) while the user-fence is
> > > still not marked signaled. That operation then fails with -EBUSY,
> > > because even though the vma operations have completed, the fence
> > > still exists.
> > >
> > > Instead of releasing the user-fences of already-signaled ops in the
> > > workqueue, signal and put them synchronously in the same vm-bind
> > > ioctl call.
> > >
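(For context: the -EBUSY comes from the unbind path checking the vma's
ufence status, which reads the signalled flag that the worker only sets
after its copy_to_user(). A rough sketch of that consumer side, from
memory rather than an exact quote of the driver:

	if (vma->ufence) {
		struct xe_user_fence *f = vma->ufence;

		/* effectively READ_ONCE(ufence->signalled) */
		if (!xe_sync_ufence_get_status(f))
			return -EBUSY;

		vma->ufence = NULL;
		xe_sync_ufence_put(f);
	}

So once userspace sees the fence value land it may legally issue the
unmap, but signalled can still be 0 until the worker catches up.)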
> >
> > I'm not really following this explanation. I think the actual problem
> > is that in user_fence_worker() the copy to user is done before
> > WRITE_ONCE(ufence->signalled, 1). If these were reordered, I think
> > that would fix this problem.
> >
>
> Also, as a follow-on optimization, you could keep it as you have here,
> but this patch has the same ordering issue. That should be fixed for
> consistency, even though it would still be functional because vm->lock
> protects against the misordering.
Unfortunately I can't use this approach: lockdep complained that I
violated the locking rules (copy_to_user() may fault, which must not
happen under the locks held inside the drm_exec loop). To handle this
properly I could probably collect all the user-fences to signal outside
of the drm_exec loop in vm_bind_ioctl_ops_execute(), but I'm not sure
this particular case is worth complicating the code for.
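A rough sketch of that idea, with made-up names (deferred_ufence,
signal_deferred_ufences()) purely for illustration, nothing like this
exists in xe today:

	/* Collected while the drm_exec locks are held... */
	struct deferred_ufence {
		struct list_head link;
		struct xe_user_fence *ufence;
		struct dma_fence *fence;
	};

	/*
	 * ...and flushed after the drm_exec loop has dropped its locks,
	 * where a faulting copy_to_user() no longer upsets lockdep.
	 */
	static void signal_deferred_ufences(struct list_head *list)
	{
		struct deferred_ufence *d, *next;

		list_for_each_entry_safe(d, next, list, link) {
			/* signalled first, per your ordering comment */
			WRITE_ONCE(d->ufence->signalled, 1);
			if (copy_to_user(d->ufence->addr, &d->ufence->value,
					 sizeof(d->ufence->value)))
				XE_WARN_ON("Copy to user failed");
			user_fence_put(d->ufence);
			dma_fence_put(d->fence);
			list_del(&d->link);
			kfree(d);
		}
	}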
Thanks for the review. I've sent a simple reorder patch which should
fix issue 5536.
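For reference, the reorder boils down to something like this in
user_fence_worker() (a sketch of the idea against my recollection of
the function, not the patch itself):

	static void user_fence_worker(struct work_struct *w)
	{
		struct xe_user_fence *ufence =
			container_of(w, struct xe_user_fence, worker);

		/*
		 * Mark the fence signalled before writing the value that
		 * userspace polls on, so a vm-bind issued right after the
		 * value lands can no longer observe signalled == 0.
		 */
		WRITE_ONCE(ufence->signalled, 1);

		if (mmget_not_zero(ufence->mm)) {
			kthread_use_mm(ufence->mm);
			if (copy_to_user(ufence->addr, &ufence->value,
					 sizeof(ufence->value)))
				XE_WARN_ON("Copy to user failed");
			kthread_unuse_mm(ufence->mm);
			mmput(ufence->mm);
		}

		wake_up_all(&ufence->xe->ufence_wq);
		user_fence_put(ufence);
	}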
--
Zbigniew
>
> Matt
>
> > Matt
> >
> > > Fixes: 977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry")
> > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5536
> > > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Cc: Matthew Auld <matthew.auld at intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_sync.c | 11 ++++++++++-
> > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> > > index f87276df18f2..8becc3755649 100644
> > > --- a/drivers/gpu/drm/xe/xe_sync.c
> > > +++ b/drivers/gpu/drm/xe/xe_sync.c
> > > @@ -103,6 +103,15 @@ static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
> > > dma_fence_put(fence);
> > > }
> > >
> > > +static void kick_ufence_sync(struct xe_user_fence *ufence, struct dma_fence *fence)
> > > +{
> > > + if (copy_to_user(ufence->addr, &ufence->value, sizeof(ufence->value)))
> > > + XE_WARN_ON("Copy to user failed");
> > > + WRITE_ONCE(ufence->signalled, 1);
> > > + user_fence_put(ufence);
> > > + dma_fence_put(fence);
> > > +}
> > > +
> > > static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
> > > {
> > > struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
> > > @@ -244,7 +253,7 @@ void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence)
> > > err = dma_fence_add_callback(fence, &sync->ufence->cb,
> > > user_fence_cb);
> > > if (err == -ENOENT) {
> > > - kick_ufence(sync->ufence, fence);
> > > + kick_ufence_sync(sync->ufence, fence);
> > > } else if (err) {
> > > XE_WARN_ON("failed to add user fence");
> > > user_fence_put(sync->ufence);
> > > --
> > > 2.43.0
> > >