[Intel-gfx] [PATCH 2/2] drm/i915: Try to detect sudden loss of MMIO access
Lucas De Marchi
lucas.demarchi at intel.com
Fri Feb 12 21:59:42 UTC 2021
On Fri, Feb 12, 2021 at 01:19:25PM -0800, Matt Roper wrote:
>In rare circumstances bugs in PCI programming, broken BIOS, or failing
>hardware can cause the CPU to lose access to the MMIO BAR on dgfx
>platforms. This is a pretty catastrophic failure since all register
>reads come back with values of 0xFFFFFFFF. Let's check for this special
>case while doing our usual checks for unclaimed registers; the FPGA_DBG
>register we use for those checks on modern platforms has some unused
>bits that will always read back as 0 when things are behaving properly;
>we can use them as canaries to detect when MMIO itself has suddenly
>broken and try to print a more informative error message in the logs.
>
>v2: Let the detection function still return 'true' if we've lost our
> MMIO access. We'll still get an extra false positive message about
> an unclaimed register access, but we'll still honor the 'mmio_debug'
> limit and not spam the log. (Lucas)
>
>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
Lucas De Marchi
>---
> drivers/gpu/drm/i915/intel_uncore.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>index 5098f95d71b0..661b50191f2b 100644
>--- a/drivers/gpu/drm/i915/intel_uncore.c
>+++ b/drivers/gpu/drm/i915/intel_uncore.c
>@@ -465,6 +465,22 @@ fpga_check_for_unclaimed_mmio(struct intel_uncore *uncore)
> 	if (likely(!(dbg & FPGA_DBG_RM_NOCLAIM)))
> 		return false;
>
>+	/*
>+	 * Bugs in PCI programming (or failing hardware) can occasionally cause
>+	 * us to lose access to the MMIO BAR. When this happens, register
>+	 * reads will come back with 0xFFFFFFFF for every register and things
>+	 * go bad very quickly. Let's try to detect that special case and at
>+	 * least try to print a more informative message about what has
>+	 * happened.
>+	 *
>+	 * During normal operation the FPGA_DBG register has several unused
>+	 * bits that will always read back as 0's so we can use them as canaries
>+	 * to recognize when MMIO accesses are just busted.
>+	 */
>+	if (unlikely(dbg == ~0))
>+		drm_err(&uncore->i915->drm,
>+			"Lost access to MMIO BAR; all registers now read back as 0xFFFFFFFF!\n");
>+
> 	__raw_uncore_write32(uncore, FPGA_DBG, FPGA_DBG_RM_NOCLAIM);
>
> 	return true;
>--
>2.25.4
>
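For anyone reading the archive without the driver source at hand, the
canary idea in the hunk above can be sketched as a standalone userspace
function. This is only an illustration under stated assumptions: the
demo_* names and the bit position of the "no claim" flag are hypothetical
and are not taken from the i915 register definitions. The one thing it
shares with the patch is the reasoning that a register with reserved
always-zero bits can never legitimately read back as all ones.

```c
#include <stdint.h>

/* Hypothetical flag position, for illustration only (not the real FPGA_DBG layout). */
#define DEMO_DBG_NOCLAIM (UINT32_C(1) << 31)

enum demo_mmio_status {
	DEMO_MMIO_OK,        /* no unclaimed access was flagged */
	DEMO_MMIO_UNCLAIMED, /* flag set, register value otherwise plausible */
	DEMO_MMIO_LOST,      /* every bit reads 1, so the canary bits tripped */
};

/*
 * Classify one raw readback of the debug register. As in the patch, the
 * all-ones test only runs after the "no claim" bit was seen, so the common
 * path stays a single bit test.
 */
static inline enum demo_mmio_status demo_classify_dbg(uint32_t dbg)
{
	if (!(dbg & DEMO_DBG_NOCLAIM))
		return DEMO_MMIO_OK;

	/* Reserved bits read 0 on healthy hardware; all ones means MMIO is gone. */
	if (dbg == UINT32_MAX)
		return DEMO_MMIO_LOST;

	return DEMO_MMIO_UNCLAIMED;
}
```

Unlike the patch, which logs the error and still returns 'true' so the
existing mmio_debug limit keeps applying, the sketch returns a separate
status for the lost-BAR case to make the two conditions easier to see.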