[PATCH v2] drm/xe: Log unreliable MMIO reads during forcewake
Nilawar, Badal
badal.nilawar at intel.com
Thu Oct 17 15:30:45 UTC 2024
On 17-10-2024 09:10, Shuicheng Lin wrote:
> In some cases, when the driver attempts to read an MMIO register,
> the hardware may return 0xFFFFFFFF. The current force wake path
> treats this as a valid response, because it only checks the ack bit.
> However, 0xFFFFFFFF should be considered an invalid value that
> indicates a potential issue. To address this, log an error when
> this condition is detected.
>
> v2 (Matt Brost):
> - set ret to -EIO to propagate the error to upper layers
>
> Suggested-by: Alex Zuo <alex.zuo at intel.com>
> Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> ---
> drivers/gpu/drm/xe/xe_force_wake.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
> index a64c14757c84..49ceb16e82e4 100644
> --- a/drivers/gpu/drm/xe/xe_force_wake.c
> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
> @@ -115,10 +115,15 @@ static int __domain_wait(struct xe_gt *gt, struct xe_force_wake_domain *domain,
>  			     XE_FORCE_WAKE_ACK_TIMEOUT_MS * USEC_PER_MSEC,
>  			     &value, true);
>  	if (ret)
> -		xe_gt_notice(gt, "Force wake domain %d failed to ack %s (%pe) reg[%#x] = %#x\n",
> -			     domain->id, str_wake_sleep(wake), ERR_PTR(ret),
> -			     domain->reg_ack.addr, value);
> -
> +		xe_gt_err(gt, "Force wake domain %d failed to ack %s (%pe) reg[%#x] = %#x\n",
> +			  domain->id, str_wake_sleep(wake), ERR_PTR(ret),
> +			  domain->reg_ack.addr, value);
> +	if (value == ~0) {
> +		xe_gt_err(gt,
> +			  "Force wake domain %d: %s. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!\n",
> +			  domain->id, str_wake_sleep(wake));
A register read returning 0xFFFFFFFF means the PCIe access was unreliable
or timed out, and the device is in a bad state. As I commented on rev1,
accesses recover after some time. How does that happen?
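For illustration, here is a minimal user-space sketch of why the explicit check is needed. It assumes a hypothetical fw_domain_wait_sketch() helper and a fake register reader standing in for the real xe MMIO helpers; none of these names exist in the driver. A mask-only ack check accepts an all-ones read for a wake request, because (0xFFFFFFFF & mask) == mask, so only the added 0xFFFFFFFF comparison turns that case into -EIO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reader callback standing in for an MMIO read of the
 * forcewake ack register, so the logic can run in user space.
 */
typedef uint32_t (*ack_read_fn)(void *ctx);

/*
 * Hypothetical helper: poll the ack register until (value & mask)
 * matches the requested state, then reject an all-ones read the way
 * the patch does. Returns 0 on success, -ETIMEDOUT if the ack never
 * matched, or -EIO if the last value read was 0xFFFFFFFF.
 */
static int fw_domain_wait_sketch(ack_read_fn read_ack, void *ctx,
				 uint32_t mask, bool wake, int max_polls,
				 uint32_t *out_value)
{
	uint32_t expected = wake ? mask : 0;
	uint32_t value = 0;
	int ret = -ETIMEDOUT;

	for (int i = 0; i < max_polls; i++) {
		value = read_ack(ctx);
		/* Mask-only check: an all-ones read passes it for a wake. */
		if ((value & mask) == expected) {
			ret = 0;
			break;
		}
	}

	/* All-ones usually means the PCIe read failed, not a real ack. */
	if (value == UINT32_MAX)
		ret = -EIO;

	if (out_value)
		*out_value = value;
	return ret;
}

/* Hypothetical fake register: always returns the value stored in ctx. */
static uint32_t fake_read(void *ctx)
{
	return *(const uint32_t *)ctx;
}
```

With fake_read() backed by 0xFFFFFFFF, a wake request returns -EIO instead of 0, while a register holding just the ack bit returns 0 and a register stuck at 0 returns -ETIMEDOUT.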
Acked-by: Badal Nilawar <badal.nilawar at intel.com>
Regards,
Badal
> +		ret = -EIO;
> +	}
>  	return ret;
>  }
>