[PATCH] drm/i915/pcode: Wait 10 seconds for pcode to settle

Gwan-gyeong Mun gwan-gyeong.mun at intel.com
Fri Jan 27 06:50:26 UTC 2023



On 1/11/23 5:36 PM, Andi Shyti wrote:
> On Wed, Jan 11, 2023 at 03:18:38PM +0200, Jani Nikula wrote:
>> On Wed, 11 Jan 2023, Andi Shyti <andi.shyti at linux.intel.com> wrote:
>>> From: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
>>>
>>> During module load not all the punit transaction have completed
>>> and we might end up timing out, as shown by the following
>>> warning:
>>
>> Root cause?

Hi Andi, looking at the log where this problem is reported,

https://gitlab.freedesktop.org/drm/intel/-/issues/7814

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1294/bat-atsm-1/igt@i915_module_load@resize-bar.html#dmesg-warnings17324

<4> [268.276908] i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)

In the code above, the call stack is output, and the return value of 
intel_pcode_init() returns -11 (-EAGAIN).

<3> [268.329058] i915 0000:4d:00.0: [drm] *ERROR* gt0: intel_pcode_init 
failed -11


If you simplify the function call flow, you can see it as below. (The 
call of _wait_for(COND, timeout_base_ms * 1000, 10, 10) in 
skl_pcode_request() is omitted)

-------------------------------------------------------------------------
intel_pcode_init()
  |
  +-> skl_pcode_request(uncore, DG1_PCODE_STATUS,
                        DG1_UNCORE_GET_INIT_STATUS,
                        DG1_UNCORE_INIT_STATUS_COMPLETE,
                        DG1_UNCORE_INIT_STATUS_COMPLETE, 180000);
        |
        +-> skl_pcode_try_request()
              |
              +->  *status = __snb_pcode_rw(uncore, mbox, &request, NULL,
                                            500, 0, true);

-------------------------------------------------------------------------
static int __snb_pcode_rw(struct intel_uncore *uncore, u32 mbox,
		          u32 *val, u32 *val1,
			  int fast_timeout_us, int slow_timeout_ms,
			  bool is_read)
{
...

	if (intel_uncore_read_fw(uncore, GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
		return -EAGAIN;

	intel_uncore_write_fw(uncore, GEN6_PCODE_DATA, *val);
	intel_uncore_write_fw(uncore, GEN6_PCODE_DATA1, val1 ? *val1 : 0);
	intel_uncore_write_fw(uncore,
			      GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);

	if (__intel_wait_for_register_fw(uncore,
					 GEN6_PCODE_MAILBOX,
					 GEN6_PCODE_READY, 0,
					 fast_timeout_us,
					 slow_timeout_ms,
					 &mbox))
		return -ETIMEDOUT;

...
}
-------------------------------------------------------------------------

The case where skl_pcode_request() returns -EAGAIN can be considered as 
the following scenario.
1. 1. When checking the GEN6_PCODE_MAILBOX register status before 
writing the value to the GEN6_PCODE_DATA register in __snb_pcode_rw(), 
it is always BUSY
2. _wait_for(COND, timeout_base_ms * 1000, 10, 10) of 
skl_pcode_request() returns -EAGAIN if the GEN6_PCODE_MAILBOX register 
indicates BUSY even after waiting 500us after writing a value to the 
GEN6_PCODE_DATA register in __snb_pcode_rw()

(Even if skl_pcode_request() gives a timeout of 180 seconds, the time 
used each time __snb_pcode_rw() is called is up to 500us. The rest of 
the time is used for sleep.)

In the situation where the problem is reported, it is not possible to 
confirm exactly which scenario code causes the problem with the current 
log information, and it seems that additional analysis is needed to 
confirm it.
If the hardware takes more than 500us to succeed after PCODE_MAILBOX 
write under certain circumstances, it is thought that the root problem 
causing the problem should be fixed.

br,
G.G.

>>
>>>
>>>     i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)
>>>
>>> Wait 10 seconds for the punit to settle and complete any
>>> outstanding transactions upon module load.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7814
>>>
>>
>> No blank lines between the tag lines please.
> 
> I don't consider "Closes:" to be a tag even if someone is using
> it as such. AFAIK is not mentioned it in any of the kernel docs
> (e.g. Documentation/process/maintainer-tip.rst).
> 
> Not to confuse it with a normal tag, I wanted to put it in
> parenthesis or rephrase it.
> 
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty at intel.com>
>>> Co-developed-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>> Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/intel_pcode.c | 35 ++++++++++++++++++++++++++----
>>>   1 file changed, 31 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_pcode.c b/drivers/gpu/drm/i915/intel_pcode.c
>>> index a234d9b4ed14..3db2ba439bb5 100644
>>> --- a/drivers/gpu/drm/i915/intel_pcode.c
>>> +++ b/drivers/gpu/drm/i915/intel_pcode.c
>>> @@ -204,15 +204,42 @@ int skl_pcode_request(struct intel_uncore *uncore, u32 mbox, u32 request,
>>>   #undef COND
>>>   }
>>>   
>>> +static int pcode_init_wait(struct intel_uncore *uncore, int timeout_ms)
>>> +{
>>> +	if (__intel_wait_for_register_fw(uncore,
>>> +					 GEN6_PCODE_MAILBOX,
>>> +					 GEN6_PCODE_READY, 0,
>>> +					 500, timeout_ms,
>>> +					 NULL))
>>> +		return -EPROBE_DEFER;
>>
>> This is already done within skl_pcode_request -> skl_pcode_try_request
>> -> __snb_pcode_rw path, with waits in skl_pcode_request.
> 
> the idea is to check for PCODE_READY even before checking if
> data are sent/received by pcode. And this is only during boot
> time. While skl_pcode_request is called in other contexts as
> well.
> 
> In other words here I want to start the communication when I
> already know that the punit is ready. Otherwise I would hit an
> -EAGAIN and fail.
> 
>> Is there anyone who still understands what's being waited for, where,
>> for how long, and why in the different code paths? I know I don't, and
>> this isn't helping.
> 
> I think it depends on hardware. There are some documents roaming
> around with some boot time and reset time calculation.
> 
>> There's also no explanation on the -EPROBE_DEFER return in the commit
>> message or comments or anywhere.
> 
> we haven't really failed, right? We just need some time for the
> punit to be ready and try to probe again (remember we are here in
> module probe).
> 
> Thanks for the review,
> Andi
> 
>> Again, root cause?
>>
>> BR,
>> Jani.
>>
>>
>>> +
>>> +	return skl_pcode_request(uncore,
>>> +				 DG1_PCODE_STATUS,
>>> +				 DG1_UNCORE_GET_INIT_STATUS,
>>> +				 DG1_UNCORE_INIT_STATUS_COMPLETE,
>>> +				 DG1_UNCORE_INIT_STATUS_COMPLETE, timeout_ms);
>>> +}
>>> +
>>>   int intel_pcode_init(struct intel_uncore *uncore)
>>>   {
>>> +	int err;
>>> +
>>>   	if (!IS_DGFX(uncore->i915))
>>>   		return 0;
>>>   
>>> -	return skl_pcode_request(uncore, DG1_PCODE_STATUS,
>>> -				 DG1_UNCORE_GET_INIT_STATUS,
>>> -				 DG1_UNCORE_INIT_STATUS_COMPLETE,
>>> -				 DG1_UNCORE_INIT_STATUS_COMPLETE, 180000);
>>> +	/*
>>> +	 * Wait 10 seconds so that the punit to settle and complete
>>> +	 * any outstanding transactions upon module load
>>> +	 */
>>> +	err = pcode_init_wait(uncore, 10000);
>>> +
>>> +	if (err) {
>>> +		drm_notice(&uncore->i915->drm,
>>> +			   "Waiting for HW initialisation...\n");
>>> +		err = pcode_init_wait(uncore, 180000);
>>> +	}
>>> +
>>> +	return err;
>>>   }
>>>   
>>>   int snb_pcode_read_p(struct intel_uncore *uncore, u32 mbcmd, u32 p1, u32 p2, u32 *val)
>>
>> -- 
>> Jani Nikula, Intel Open Source Graphics Center


More information about the dri-devel mailing list