[PATCH] lib/xe/xe_spin: Add additional MI_ARB_CHECK to preemptable spin init

Cavitt, Jonathan jonathan.cavitt at intel.com
Thu Aug 14 19:43:01 UTC 2025


-----Original Message-----
From: Brost, Matthew <matthew.brost at intel.com> 
Sent: Thursday, August 14, 2025 12:24 PM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>
Cc: igt-dev at lists.freedesktop.org; Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Kempczynski, Zbigniew <zbigniew.kempczynski at intel.com>; Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>
Subject: Re: [PATCH] lib/xe/xe_spin: Add additional MI_ARB_CHECK to preemptable spin init
> 
> On Thu, Aug 14, 2025 at 01:11:39PM -0600, Cavitt, Jonathan wrote:
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost at intel.com> 
> > Sent: Thursday, August 14, 2025 12:05 PM
> > To: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> > Cc: igt-dev at lists.freedesktop.org; Gupta, saurabhg <saurabhg.gupta at intel.com>; Zuo, Alex <alex.zuo at intel.com>; Kempczynski, Zbigniew <zbigniew.kempczynski at intel.com>; Vishwanathapura, Niranjana <niranjana.vishwanathapura at intel.com>
> > Subject: Re: [PATCH] lib/xe/xe_spin: Add additional MI_ARB_CHECK to preemptable spin init
> > > 
> > > On Thu, Aug 14, 2025 at 06:50:10PM +0000, Jonathan Cavitt wrote:
> > > > Add an additional MI_ARB_CHECK to the spin initialization function,
> > > > just before the conditional batch buffer end, when preemption is
> > > > enabled.  The MI_ARB_CHECK forces a preemption in this case, which can
> > > > allow additionally scheduled exec queues to make independent progress
> > > > earlier than if we relied on the automatic timeout of the exec queue.
> > > > 
> > > > Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5783
> > > > Suggested-by: Zbigniew Kempczyński <zbigniew.kempczynski at intel.com>
> > > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
> > > > ---
> > > >  lib/xe/xe_spin.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/lib/xe/xe_spin.c b/lib/xe/xe_spin.c
> > > > index 25f7555e27..b3a937fe80 100644
> > > > --- a/lib/xe/xe_spin.c
> > > > +++ b/lib/xe/xe_spin.c
> > > > @@ -30,7 +30,6 @@
> > > >  
> > > >  enum { START_TS, NOW_TS };
> > > >  
> > > > -
> > > >  uint32_t xe_spin_nsec_to_ticks(int fd, int gt_id, uint64_t nsec)
> > > >  {
> > > >  	uint32_t ticks = xe_nsec_to_ticks(fd, gt_id, nsec);
> > > > @@ -167,6 +166,9 @@ void xe_spin_init(struct xe_spin *spin, struct xe_spin_opts *opts)
> > > >  				   opts->mem_copy->dst->mocs_index;
> > > >  	}
> > > >  
> > > > +	if (opts->preempt)
> > > > +		spin->batch[b++] = MI_ARB_CHECK;
> > > 
> > > But we already have one these at the top of the loop:
> > > 
> > >  72         loop_addr = opts->addr + b * sizeof(uint32_t);
> > >  73
> > >  74         spin->batch[b++] = MI_STORE_DWORD_IMM_GEN4;
> > >  75         spin->batch[b++] = start_addr;
> > >  76         spin->batch[b++] = start_addr >> 32;
> > >  77         spin->batch[b++] = 0xc0ffee;
> > >  78
> > >  79         if (opts->preempt)
> > >  80                 spin->batch[b++] = MI_ARB_CHECK;
> > > 
> > > Line 74 is here the ring loop restarts.
> > 
> > I tried removing the top MI_ARB_CHECK, but that caused the test to start
> > failing again, so it looks like both MI_ARB_CHECK calls are necessary.
> > -Jonathan Cavitt
> 
> It shouldn’t be, though. A couple of instructions later we should jump
> to loop_addr (that is line 73 ring instruction) and then execute the
> MI_ARB_CHECK programmed in the ring at line 80.
> 
> I’m all for fixing the tests, but I want to understand what’s going on
> here.
> 
> Can you reliably reproduce this issue and confirm that it fails before
> this change and works after?

Yes.

Before the change, the issue occurs with mixed frequency (about 10%), and many
runs take around a half-second to complete.  After the change, the test consistently
passes and takes around 10 ms to complete.

The primary cause of the issue seems to be an engine reset that occurs due to
an exec-queue hang.  The additional MI_ARB_CHECK seems to force the second
scheduled exec-queue to make forward progress, which unblocks the first and
prevents the hang.

Perhaps the "real" issue that the MI_BATCH_BUFFER_START loopback causes the
execution call stack to fill up, resulting in the system having to clear up multiple
layers of execution once the second exec queue completes?  I don't think that's
particularly solvable, though.
-Jonathan Cavitt

> 
> Matt
> 
> > 
> > > 
> > > Matt
> > > 
> > > > +
> > > >  	spin->batch[b++] = MI_COND_BATCH_BUFFER_END | MI_DO_COMPARE | 2;
> > > >  	spin->batch[b++] = 0;
> > > >  	spin->batch[b++] = end_addr;
> > > > -- 
> > > > 2.43.0
> > > > 
> > > 
> 


More information about the igt-dev mailing list