[PATCH 1/1] drm/xe/xe_sync: avoid race during ufence signaling
Matthew Brost
matthew.brost at intel.com
Fri Aug 22 03:53:28 UTC 2025
On Wed, Aug 20, 2025 at 10:38:14AM +0200, Zbigniew Kempczyński wrote:
> On Tue, Aug 19, 2025 at 05:40:04PM -0700, Matthew Brost wrote:
> > On Tue, Aug 19, 2025 at 05:35:07PM -0700, Matthew Brost wrote:
> > > On Tue, Aug 19, 2025 at 08:44:04PM +0200, Zbigniew Kempczyński wrote:
> > > > During vm-bind ioctl ops execute fence may be signaled during the call.
> > > > If vm-bind syncs to user-fence it creates a race because signaling
> > > > happens in the worker. This means control may return from vm-bind
> > > > ioctl and consecutive vm-bind operation to same vma (unmap) may happen
> > > > on still not signaled user-fence. This finally ends with -EBUSY error
> > > > because even if vma operations completed fence still exists but
> > > > userspace was unblocked with copy_to_user() call.
> > > >
> > > > Instead of releasing user-fences in workqueue for already signaled
> > > > ops put them synchronously in the same vm-bind ioctl call.
> > > >
> > >
> > > I'm not really following this explaination. I think the actual problem
> > > is in user_fence_worker() the copy to user to done before
> > > WRITE_ONCE(ufence->signalled, 1). If there were re-ordered, I think
> > > that would fix this problem.
> > >
> >
> > Also, as a follow-on optimization, you could keep it as you have here,
> > but this patch has the same ordering issue. That should be fixed for
> > consistency, even though it would still be functional because vm->lock
> > protects against the misordering.
>
> Unfortunately I can't use this, lockdep complained I've violated locking
> rules. To properly handle this I likely could collect all user-fences to
Ah, yes I think copy_to_user can take the mmap lock which is not allowed
under dma-resv.
> signal outside of drm_exec loop in vm_bind_ioctl_ops_execute(). But I
> don't know should I complicate the code for this particular case.
>
I think this is possible - xe_sync_entry_signal doesn't need to be
called under the dma-resv lock so we could do this later in sw pipeline.
Might be worth while in an effort to signal user fences immediately when
possible. We can look at this in a follow up.
Matt
> Thanks for the review, I've sent simple reorder patch which should fix
> the 5536 issue.
>
> --
> Zbigniew
>
> >
> > Matt
> >
> > > Matt
> > >
> > > > Fixes: 977e5b82e090 ("drm/xe: Expose user fence from xe_sync_entry")
> > > > Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5536
> > > > Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > Cc: Matthew Auld <matthew.auld at intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_sync.c | 11 ++++++++++-
> > > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> > > > index f87276df18f2..8becc3755649 100644
> > > > --- a/drivers/gpu/drm/xe/xe_sync.c
> > > > +++ b/drivers/gpu/drm/xe/xe_sync.c
> > > > @@ -103,6 +103,15 @@ static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
> > > > dma_fence_put(fence);
> > > > }
> > > >
> > > > +static void kick_ufence_sync(struct xe_user_fence *ufence, struct dma_fence *fence)
> > > > +{
> > > > + if (copy_to_user(ufence->addr, &ufence->value, sizeof(ufence->value)))
> > > > + XE_WARN_ON("Copy to user failed");
> > > > + WRITE_ONCE(ufence->signalled, 1);
> > > > + user_fence_put(ufence);
> > > > + dma_fence_put(fence);
> > > > +}
> > > > +
> > > > static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
> > > > {
> > > > struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
> > > > @@ -244,7 +253,7 @@ void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence)
> > > > err = dma_fence_add_callback(fence, &sync->ufence->cb,
> > > > user_fence_cb);
> > > > if (err == -ENOENT) {
> > > > - kick_ufence(sync->ufence, fence);
> > > > + kick_ufence_sync(sync->ufence, fence);
> > > > } else if (err) {
> > > > XE_WARN_ON("failed to add user fence");
> > > > user_fence_put(sync->ufence);
> > > > --
> > > > 2.43.0
> > > >
More information about the Intel-xe
mailing list