[Intel-gfx] [PATCH 2/2] drm/i915: Try to detect sudden loss of MMIO access

Lucas De Marchi lucas.demarchi at intel.com
Fri Feb 12 21:59:42 UTC 2021


On Fri, Feb 12, 2021 at 01:19:25PM -0800, Matt Roper wrote:
>In rare circumstances, bugs in PCI programming, a broken BIOS, or failing
>hardware can cause the CPU to lose access to the MMIO BAR on dgfx
>platforms.  This is a pretty catastrophic failure since all register
>reads come back as 0xFFFFFFFF.  Let's check for this special case while
>doing our usual checks for unclaimed registers.  The FPGA_DBG register
>we use for those checks on modern platforms has some unused bits that
>always read back as 0 when things are behaving properly, so we can use
>them as canaries to detect when MMIO itself has suddenly broken and try
>to print a more informative error message in the logs.
>
>v2: Let the detection function still return 'true' if we've lost our
>    MMIO access.  We'll get an extra false-positive message about an
>    unclaimed register access, but we'll continue to honor the
>    'mmio_debug' limit and not spam the log.  (Lucas)
>
>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
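The canary idea in the commit message can be modelled outside the driver. The sketch below is a minimal user-space illustration, not the i915 code: the register value is passed in rather than read over MMIO, and `RM_NOCLAIM_BIT`, `enum mmio_state` and `classify_fpga_dbg()` are hypothetical names. It also assumes, for illustration only, that the unclaimed-access flag sits in bit 31 of FPGA_DBG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FPGA_DBG_RM_NOCLAIM; bit 31 is an assumption. */
#define RM_NOCLAIM_BIT (UINT32_C(1) << 31)

/* Hypothetical outcome of one FPGA_DBG check. */
enum mmio_state {
	MMIO_OK,        /* no unclaimed access flagged */
	MMIO_UNCLAIMED, /* an unclaimed register access was flagged */
	MMIO_LOST,      /* every bit reads as 1: the BAR itself is gone */
};

/*
 * Classify a raw FPGA_DBG value.  On working hardware some bits are
 * unused and always read as 0, so an all-ones value cannot be a
 * legitimate register value; it is what a dead MMIO BAR returns.
 */
static enum mmio_state classify_fpga_dbg(uint32_t dbg)
{
	if (!(dbg & RM_NOCLAIM_BIT))
		return MMIO_OK;
	if (dbg == UINT32_MAX)
		return MMIO_LOST;
	return MMIO_UNCLAIMED;
}
```

An all-ones value has the flag bit set, so it gets past the first test; the order of the two checks mirrors the hunk in the patch, where the lost-BAR test only runs after the unclaimed bit has been seen.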


Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>

Lucas De Marchi
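The v2 note in the commit message (return 'true' even when MMIO is lost) is easier to follow with a toy model of the caller. In the sketch below, everything is hypothetical user-space C: the mmio_debug limit is reduced to a plain `budget` counter, and `check_unclaimed()` and `run_checks()` are illustrative names, not i915 functions. Because the check returns true for an all-ones read, each lost-BAR detection is charged against the budget and checking stops once it is spent; in this model, returning false would leave the failure uncounted and the lost-BAR message would repeat on every check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for FPGA_DBG_RM_NOCLAIM (bit position assumed). */
#define NOCLAIM_BIT (UINT32_C(1) << 31)

/*
 * Toy version of the v2 check: any flagged access returns true,
 * including the "every bit is 1" case that signals a lost MMIO BAR.
 * lost_msgs counts the extra lost-BAR error the patch prints.
 */
static bool check_unclaimed(uint32_t dbg, unsigned int *lost_msgs)
{
	if (!(dbg & NOCLAIM_BIT))
		return false;
	if (dbg == UINT32_MAX)
		(*lost_msgs)++;
	return true;
}

/*
 * Toy caller: keeps checking only while its budget (standing in for
 * the mmio_debug limit) is not used up.  Every 'true' result is charged
 * against the budget, so a dead BAR produces at most 'budget' reports.
 * Returns the number of unclaimed-access reports emitted.
 */
static unsigned int run_checks(const uint32_t *reads, size_t n,
			       unsigned int budget, unsigned int *lost_msgs)
{
	unsigned int reported = 0;

	for (size_t i = 0; i < n && reported < budget; i++) {
		if (check_unclaimed(reads[i], lost_msgs)) {
			printf("unclaimed register access at read %zu\n", i);
			reported++;
		}
	}
	return reported;
}
```

Each all-ones read yields both a lost-BAR message and an unclaimed-access report, which is the "extra false positive" the v2 note accepts in exchange for bounded log output.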

>---
> drivers/gpu/drm/i915/intel_uncore.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>index 5098f95d71b0..661b50191f2b 100644
>--- a/drivers/gpu/drm/i915/intel_uncore.c
>+++ b/drivers/gpu/drm/i915/intel_uncore.c
>@@ -465,6 +465,22 @@ fpga_check_for_unclaimed_mmio(struct intel_uncore *uncore)
> 	if (likely(!(dbg & FPGA_DBG_RM_NOCLAIM)))
> 		return false;
>
>+	/*
>+	 * Bugs in PCI programming (or failing hardware) can occasionally cause
>+	 * us to lose access to the MMIO BAR.  When this happens, register
>+	 * reads will come back with 0xFFFFFFFF for every register and things
>+	 * go bad very quickly.  Let's try to detect that special case and at
>+	 * least try to print a more informative message about what has
>+	 * happened.
>+	 *
>+	 * During normal operation the FPGA_DBG register has several unused
>+	 * bits that will always read back as 0's so we can use them as canaries
>+	 * to recognize when MMIO accesses are just busted.
>+	 */
>+	if (unlikely(dbg == ~0))
>+		drm_err(&uncore->i915->drm,
>+			"Lost access to MMIO BAR; all registers now read back as 0xFFFFFFFF!\n");
>+
> 	__raw_uncore_write32(uncore, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>
> 	return true;
>-- 
>2.25.4
>
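The test `dbg == ~0` in the patch relies on C's usual arithmetic conversions: `~0` is the int -1, which converts to 0xFFFFFFFF when compared against a 32-bit unsigned value. The standalone snippet below, assuming `dbg` is a 32-bit unsigned type (as the name `__raw_uncore_read32()` suggests), checks that, and shows why the same idiom would not match a 32-bit all-ones read held in a 64-bit variable. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Same comparison as the patch, with dbg as a 32-bit unsigned value:
 * ~0 (int -1) converts to UINT32_MAX, so this is true only for 0xFFFFFFFF.
 */
static bool all_ones_u32(uint32_t dbg)
{
	return dbg == ~0;
}

/*
 * The same idiom on a 64-bit variable: ~0 now converts to UINT64_MAX,
 * so a zero-extended 32-bit all-ones read does NOT match.
 */
static bool all_ones_u64(uint64_t dbg)
{
	return dbg == ~0;
}
```

The patch's form is correct as long as `dbg` remains a 32-bit unsigned type; it would silently stop matching if the variable were ever widened.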

