[PATCH] drm/xe: Enlarge the invalidation timeout from 150 to 500

Lin, Shuicheng shuicheng.lin at intel.com
Wed Oct 9 19:40:33 UTC 2024


> -----Original Message-----
> From: Cavitt, Jonathan <jonathan.cavitt at intel.com>
> Sent: Wednesday, October 9, 2024 11:25 AM
> To: Lin, Shuicheng <shuicheng.lin at intel.com>; intel-xe at lists.freedesktop.org;
> Auld, Matthew <matthew.auld at intel.com>; Das, Nirmoy
> <nirmoy.das at intel.com>
> Cc: Lin, Shuicheng <shuicheng.lin at intel.com>; Cavitt, Jonathan
> <jonathan.cavitt at intel.com>
> Subject: RE: [PATCH] drm/xe: Enlarge the invalidation timeout from 150 to 500
> 
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Shuicheng
> Lin
> Sent: Wednesday, October 9, 2024 10:22 AM
> To: intel-xe at lists.freedesktop.org; Auld, Matthew <matthew.auld at intel.com>;
> Das, Nirmoy <nirmoy.das at intel.com>
> Cc: Lin, Shuicheng <shuicheng.lin at intel.com>
> Subject: [PATCH] drm/xe: Enlarge the invalidation timeout from 150 to 500
> >
> > There is error message like below during stress test.
> > "[   31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout"
> > And change the timeout value from 150 to 500, could help avoid this
> > error message in the stress test.
> > xe_mmio_wait32() is implemented as wait 10us at beginning, then double
> > the wait value as next wait until the timeout value is reached. So for
> > the normal case, the real wait time is not changed.
> > The larger timeout value should take effect for the bad case only.
> >
> > Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
> 
> This looks good, though maybe the commit message could use some work.
> Something like:
> 
> """
> There are error messages like below that are occurring during stress testing:
> "[   31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout"
> Changing the timeout value from 150 to 500 could help avoid these error
> messages in the stress tests.
> 
> Because xe_mmio_wait32() starts with small wait increments, the wait returns
> early as soon as the register matches the expected value.  So, the larger
> timeout value should have no effect during normal use cases.
> """
> 
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> -Jonathan Cavitt

Thanks, Jonathan. I like your wording; it is much better. I will update the commit message with it.
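
To make the backoff argument from the commit message concrete, here is a minimal userspace sketch of the polling pattern it describes: wait 10 at first, double the wait each round, and clamp the last wait to the deadline. It is only a model under stated assumptions, not the driver code: simulate_wait() and ready_at_us are hypothetical names, the clock is simulated, register reads are treated as free, and the timeout is assumed to be in microseconds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a doubling-backoff poll loop (not the xe driver code).
 * The simulated "register" matches once the clock reaches ready_at_us.
 * Returns 0 on success, -1 on timeout; *elapsed_us gets the time spent waiting.
 */
static int simulate_wait(long ready_at_us, long timeout_us, long *elapsed_us)
{
	long now = 0;   /* simulated clock, in microseconds */
	long wait = 10; /* first wait is 10us, as described in the commit message */

	assert(elapsed_us != NULL);

	for (;;) {
		/* Poll first: a match ends the wait immediately. */
		if (now >= ready_at_us) {
			*elapsed_us = now;
			return 0;
		}

		/* Deadline reached without a match. */
		if (now >= timeout_us)
			break;

		/* Never sleep past the deadline. */
		if (now + wait > timeout_us)
			wait = timeout_us - now;

		now += wait; /* stands in for the real delay */
		wait <<= 1;  /* double the next wait */
	}

	*elapsed_us = now;
	return -1;
}
```

In this model, a register that becomes ready at 25us is seen at 30us with either timeout, so the normal case is unchanged. A register that becomes ready at 300us times out at 150us with the old value and is seen at 310us with the new one, which is the bad case the larger timeout is meant to cover.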

> 
> > ---
> >  drivers/gpu/drm/xe/xe_device.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index cd241a8e1838..60aebf7561ec 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -925,7 +925,7 @@ void xe_device_l2_flush(struct xe_device *xe)
> >  	spin_lock(&gt->global_invl_lock);
> >  	xe_mmio_write32(&gt->mmio, XE2_GLOBAL_INVAL, 0x1);
> >
> > -	if (xe_mmio_wait32(&gt->mmio, XE2_GLOBAL_INVAL, 0x1, 0x0, 150, NULL, true))
> > +	if (xe_mmio_wait32(&gt->mmio, XE2_GLOBAL_INVAL, 0x1, 0x0, 500, NULL, true))
> >  		xe_gt_err_once(gt, "Global invalidation timeout\n");
> >  	spin_unlock(&gt->global_invl_lock);
> >
> > --
> > 2.25.1
> >
> >

