[Intel-gfx] [PATCH] drm/i915/pcode: Wait 10 seconds for pcode to settle

Andi Shyti andi.shyti at linux.intel.com
Mon Jan 16 19:27:20 UTC 2023


Hi Rodrigo,

On Wed, Jan 11, 2023 at 12:06:24PM -0500, Rodrigo Vivi wrote:
> On Wed, Jan 11, 2023 at 04:39:36PM +0100, Andi Shyti wrote:
> > Hi Rodrigo,
> > 
> > On Wed, Jan 11, 2023 at 10:25:56AM -0500, Rodrigo Vivi wrote:
> > > On Wed, Jan 11, 2023 at 11:44:47AM +0100, Andi Shyti wrote:
> > > > From: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
> > > > 
> > > > During module load not all the punit transaction have completed
> > > > and we might end up timing out, as shown by the following
> > > > warning:
> > > > 
> > > >    i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)
> > > > 
> > > > Wait 10 seconds for the punit to settle and complete any
> > > > outstanding transactions upon module load.
> > > 
> > > 10 *SECONDS* ?!
> > 
> > Don't be alarmed :)
> > 
> > It's up to 10 seconds, otherwise we would end up waiting up to 3
> > minutes.
> > 
> > And I've seen a version (and you did as well) where those 3
> > minutes were raised to 6 (for the PVC particular case).
> 
> Oh yeap! and that case is funny! Because the indication from PCODE
> is that 10 seconds is enough, but there are some rare cases where
> it gets stuck and end up taking a very long time. Then they multiplied
> the bad rare case to 3, and no idea why how that become 6.
> 
> But anyway, thanks for refreshing my memory. When I first noticed this
> patch I thought it was in all the platforms, where this wouldn't make
> sense. But on discrete where the pcode needs to bring the memory and
> gt up before we can really use it, it makes sense.
> 
> And then I noticed that your patch is indeed for dgfx only, so it
> would be okay. And 10 seconds is okay.

yes... :)

Thanks for the recap!

> However I also noticed that you do this before the other pcode_init
> check that we were told by pcode folks to use. So, I don't understand
> how your patch is helping now... you wait for 10 seconds and then you
> will wait more 10 seconds on the pcode_ready... why the pcode_ready
> check that we have in case already doesn't cover yours?

The difference is that the first timeout ensures that the punit
is ready during boot time. The need to wait is only in boot,
as we check if it's ready and then start communicating.

The second 10s wait comes after the read/write has actually
happened. I expect in this case to wait just a little because
it's just a test write done in boot to make sure the punit is
really responding and ready for the next writes.

But because skl_pcode_request() is used in other contexts, as
well, I assume that the punit is ready and I don't need to wait
anymore before communicating with it. Thus I wait only after
sending the command: I wait for it to complete and be ready again
for the next command.

In other words each wait makes sure that the punit is ready for
the next command. That's why we need a first 10 seconds wait as
at the very first write we weren't sure 100% the punit was ready.

> And why that return?

Becase the driver is not really failing. Most probably the
hardware is completing the boot routines thus we need to try
probing again later.

I hope I was able to explain myself.

Thanks for the comments and for looking into this,
Andi

> > 
> > Thanks for checking this,
> > Andi
> > 
> > > > 
> > > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7814
> > > > 
> > > > Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
> > > > Co-developed-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pcode.c | 35 ++++++++++++++++++++++++++----
> > > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_pcode.c b/drivers/gpu/drm/i915/intel_pcode.c
> > > > index a234d9b4ed14..3db2ba439bb5 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pcode.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pcode.c
> > > > @@ -204,15 +204,42 @@ int skl_pcode_request(struct intel_uncore *uncore, u32 mbox, u32 request,
> > > >  #undef COND
> > > >  }
> > > >  
> > > > +static int pcode_init_wait(struct intel_uncore *uncore, int timeout_ms)
> > > > +{
> > > > +	if (__intel_wait_for_register_fw(uncore,
> > > > +					 GEN6_PCODE_MAILBOX,
> > > > +					 GEN6_PCODE_READY, 0,
> > > > +					 500, timeout_ms,
> > > > +					 NULL))
> > > > +		return -EPROBE_DEFER;
> > > > +
> > > > +	return skl_pcode_request(uncore,
> > > > +				 DG1_PCODE_STATUS,
> > > > +				 DG1_UNCORE_GET_INIT_STATUS,
> > > > +				 DG1_UNCORE_INIT_STATUS_COMPLETE,
> > > > +				 DG1_UNCORE_INIT_STATUS_COMPLETE, timeout_ms);
> > > > +}
> > > > +
> > > >  int intel_pcode_init(struct intel_uncore *uncore)
> > > >  {
> > > > +	int err;
> > > > +
> > > >  	if (!IS_DGFX(uncore->i915))
> > > >  		return 0;
> > > >  
> > > > -	return skl_pcode_request(uncore, DG1_PCODE_STATUS,
> > > > -				 DG1_UNCORE_GET_INIT_STATUS,
> > > > -				 DG1_UNCORE_INIT_STATUS_COMPLETE,
> > > > -				 DG1_UNCORE_INIT_STATUS_COMPLETE, 180000);
> > > > +	/*
> > > > +	 * Wait 10 seconds so that the punit to settle and complete
> > > > +	 * any outstanding transactions upon module load
> > > > +	 */
> > > > +	err = pcode_init_wait(uncore, 10000);
> > > > +
> > > > +	if (err) {
> > > > +		drm_notice(&uncore->i915->drm,
> > > > +			   "Waiting for HW initialisation...\n");
> > > > +		err = pcode_init_wait(uncore, 180000);
> > > > +	}
> > > > +
> > > > +	return err;
> > > >  }
> > > >  
> > > >  int snb_pcode_read_p(struct intel_uncore *uncore, u32 mbcmd, u32 p1, u32 p2, u32 *val)
> > > > -- 
> > > > 2.34.1
> > > > 


More information about the Intel-gfx mailing list