[PATCH v2] drm/xe: Log unreliable MMIO reads during forcewake

Lin, Shuicheng shuicheng.lin at intel.com
Thu Oct 17 16:14:45 UTC 2024


On Thu, Oct 17, 2024 at 08:31 AM Badal Nilawar <badal.nilawar at intel.com> wrote:
> On 17-10-2024 09:10, Shuicheng Lin wrote:
> > In some cases, when the driver attempts to read an MMIO register, the
> > hardware may return 0xFFFFFFFF. The current force wake path code
> > treats this as a valid response, as it only checks the BIT.
> > However, 0xFFFFFFFF should be considered an invalid value, indicating
> > a potential issue. To address this, we should add a log entry to
> > highlight this condition.
> >
> > v2 (Matt Brost):
> >    - set ret value (-EIO) to kick the error to upper layers
> >
> > Suggested-by: Alex Zuo <alex.zuo at intel.com>
> > Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
> > Cc: Matthew Brost <matthew.brost at intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> > Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> > Cc: Matt Roper <matthew.d.roper at intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_force_wake.c | 13 +++++++++----
> >   1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_force_wake.c
> > b/drivers/gpu/drm/xe/xe_force_wake.c
> > index a64c14757c84..49ceb16e82e4 100644
> > --- a/drivers/gpu/drm/xe/xe_force_wake.c
> > +++ b/drivers/gpu/drm/xe/xe_force_wake.c
> > @@ -115,10 +115,15 @@ static int __domain_wait(struct xe_gt *gt, struct
> xe_force_wake_domain *domain,
> >   			     XE_FORCE_WAKE_ACK_TIMEOUT_MS *
> USEC_PER_MSEC,
> >   			     &value, true);
> >   	if (ret)
> > -		xe_gt_notice(gt, "Force wake domain %d failed to ack %s
> (%pe) reg[%#x] = %#x\n",
> > -			     domain->id, str_wake_sleep(wake), ERR_PTR(ret),
> > -			     domain->reg_ack.addr, value);
> > -
> > +		xe_gt_err(gt, "Force wake domain %d failed to ack %s (%pe)
> reg[%#x] = %#x\n",
> > +			  domain->id, str_wake_sleep(wake), ERR_PTR(ret),
> > +			  domain->reg_ack.addr, value);
> > +	if (value == ~0) {
> > +		xe_gt_err(gt,
> > +			  "Force wake domain %d: %s. MMIO unreliable
> (forcewake register returns 0xFFFFFFFF)!\n",
> > +			  domain->id, str_wake_sleep(wake));
> 
> Register value return 0xFFFFFFFF means PCIe accesses are unreliable, timed
> out and device is in bad state. As commented in rev1, accesses are recovering
> after some time, wondering how?
> 
> Acked-by: Badal Nilawar <badal.nilawar at intel.com>

For my test cases, the time out is due to heavy pcie traffic and device could be recovered automatically later.
It seems device is not in bad state, or there is auto reset function inside the device to recover itself.

Shuicheng
> 
> Regards,
> Badal
> 
> > +		ret = -EIO;
> > +	}
> >   	return ret;
> >   }
> >



More information about the Intel-xe mailing list