[PATCH v1 1/1] drm/i915/gt: Increase a time to retry RING_HEAD reset

Thu Dec 5 16:26:31 UTC 2024

Hi Nitin,

On Thu, Dec 05, 2024 at 04:03:38PM +0000, Gote, Nitin R wrote:
> > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote:
> > > Issue is seen again where engine resets fails because the engine
> > > resumes from an incorrect RING_HEAD. So, increase a time if at first
> > > the write doesn't succeed and retry again.
> > >
> > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get
> > > sticks")
> > 
> > Is this a real fix or is it more of a fine tuning?
> 
> Here we can say this for more fine tuning as issue seen again and 
> that's why I have added fixes : 6ef0e3ef2662.

I thinkt he Fixes: tag is not necessary, here. You are not really
fixing a bug.

> > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806
> > > Signed-off-by: Nitin Gote <nitin.r.gote at intel.com>
> > 
> > ...
> > 
> > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
> > >  	set_pp_dir(engine);
> > >
> > >  	/* First wake the ring up to an empty/idle ring */
> > > -	for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC);
> > > +	for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC);
> > 
> > Where is this 50 coming from?
> 
> Well, here HEAD is still not 0 even after writing in it. 
> So, it could be the timing issue. I discussed this with Chris and we thought
> It is better to add 50ms instead of 2ms delay here to let HEAD write complete.
> I tested this on trybot for Haswell/Ivybridge platform https://patchwork.freedesktop.org/series/141779/ and 
> I see BAT is successful and shards issues are not related.

Can you please add a comment explaining why you are using 50ms?
You can also mention that you experimented with different values
and determined that 50ms works best based on testing.

Andi