[Intel-gfx] [PATCH 26/37] drm/i915/dg1: Handle GRF/IC ECC error irq
Chris Wilson
chris at chris-wilson.co.uk
Thu May 21 08:19:56 UTC 2020
Quoting Lucas De Marchi (2020-05-21 01:37:52)
> From: Fernando Pacheco <fernando.pacheco at intel.com>
>
> The error detection and correction capability
> for GRF and instruction cache (IC) will utilize
> the new interrupt and error handling infrastructure
> for dgfx products. The GFX device can generate
> three classes of error under the new
> infrastructure: correctable, non-fatal, and
> fatal errors.
>
> The non-fatal and fatal error classes distinguish
> between levels of severity for uncorrectable errors.
> All ECC uncorrectable errors will be reported as
> fatal to produce the desired system response. Fatal
> errors are expected to route as PCIe error messages,
> which should result in the OS issuing a GFX device FLR.
> But the option exists to route fatal errors as
> interrupts.
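The classification policy described above can be sketched in userspace C. This is a minimal sketch, not the driver's code. The enum names come from the patch, but their numeric values, `HARDWARE_ERROR_MAX`, and the helpers `classify_ecc_error()` and `reaches_driver_as_irq()` are hypothetical. The sketch also assumes that correctable and non-fatal errors always arrive as interrupts, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Class names from the patch; values and HARDWARE_ERROR_MAX are assumed. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX
};

/*
 * Hypothetical helper: per the commit message, every uncorrectable ECC
 * error is reported as FATAL, so NONFATAL is never produced here.
 */
static enum hardware_error classify_ecc_error(bool corrected)
{
	return corrected ? HARDWARE_ERROR_CORRECTABLE : HARDWARE_ERROR_FATAL;
}

/*
 * Hypothetical helper: fatal errors normally leave as PCIe error
 * messages (the OS then issues an FLR) and reach the driver as an
 * interrupt only if routed that way.
 * ASSUMPTION: the other classes always arrive as interrupts.
 */
static bool reaches_driver_as_irq(enum hardware_error err,
				  bool fatal_routed_as_irq)
{
	if (err == HARDWARE_ERROR_FATAL)
		return fatal_routed_as_irq;
	return true;
}
```

For example, an uncorrected GRF error classifies as `HARDWARE_ERROR_FATAL`, and `reaches_driver_as_irq()` returns false for it unless fatal errors have been routed as interrupts.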
>
> The driver will only handle logging of errors. Anything
> more will be handled at the system level.
>
> For errors that will route as interrupts, three
> bits in the Master Interrupt Register will be used
> to convey the class of error.
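Decoding those three Master Interrupt Register bits into a set of pending error classes might look like the sketch below. The bit positions (`SKETCH_ERROR_IRQ_SHIFT` = 26, one bit per class) and the helper `pending_error_classes()` are hypothetical placeholders rather than values taken from Bspec 50875.

```c
#include <assert.h>
#include <stdint.h>

enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX
};

/* HYPOTHETICAL layout: three consecutive master IRQ bits from bit 26. */
#define SKETCH_ERROR_IRQ_SHIFT	26
#define SKETCH_ERROR_IRQ(x)	(UINT32_C(1) << (SKETCH_ERROR_IRQ_SHIFT + (x)))

/*
 * Return a mask with bit N set when error class N is flagged in the
 * master interrupt value; unrelated interrupt bits are ignored.
 */
static uint32_t pending_error_classes(uint32_t master_ctl)
{
	uint32_t pending = 0;
	int e;

	for (e = HARDWARE_ERROR_CORRECTABLE; e < HARDWARE_ERROR_MAX; e++)
		if (master_ctl & SKETCH_ERROR_IRQ(e))
			pending |= UINT32_C(1) << e;

	return pending;
}
```

Returning a mask keeps the decode separate from servicing, so each set class can then be handled independently.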
>
> For each class of error:
> 1. Determine source of error (IP block) by reading
> the Device Error Source Register (RW1C) that
> corresponds to the class of error being serviced.
> 2. If the generating IP block is GT, read and log the
> GT Error Register (RW1C) that corresponds to the
> class of error being serviced. Non-GT errors will
> be logged in aggregate for now.
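The two steps above can be sketched against in-memory stand-ins for the RW1C registers. This is an illustration under assumptions, not the driver's handler. `EU_GRF_ERROR` and `EU_IC_ERROR` are named in the patch, but every bit position here is hypothetical, as are `DEV_ERR_SRC_GT`, `struct fake_regs`, `struct err_log`, `rw1c_write()` and `handle_error_class()`. The counters stand in for the log messages the driver would print.

```c
#include <assert.h>
#include <stdint.h>

enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX
};

/* All bit positions below are HYPOTHETICAL. */
#define DEV_ERR_SRC_GT	(UINT32_C(1) << 0)
#define EU_GRF_ERROR	(UINT32_C(1) << 15)
#define EU_IC_ERROR	(UINT32_C(1) << 14)

/* In-memory stand-ins for the per-class RW1C registers. */
struct fake_regs {
	uint32_t dev_err_src[HARDWARE_ERROR_MAX];	/* Device Error Source */
	uint32_t gt_err[HARDWARE_ERROR_MAX];		/* GT Error Register */
};

/* Counters standing in for the driver's log messages. */
struct err_log {
	unsigned int grf, ic, other_gt, non_gt;
};

/* RW1C: each 1 bit written clears that bit; 0 bits are left alone. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

static void handle_error_class(struct fake_regs *r,
			       enum hardware_error hw_err,
			       struct err_log *elog)
{
	/* Step 1: which IP block raised this class of error? */
	uint32_t src = r->dev_err_src[hw_err];

	if (!src)
		return;

	/* Step 2: for GT, read and log the GT Error Register. */
	if (src & DEV_ERR_SRC_GT) {
		uint32_t errstat = r->gt_err[hw_err];

		if (errstat & EU_GRF_ERROR)
			elog->grf++;
		if (errstat & EU_IC_ERROR)
			elog->ic++;
		if (errstat & ~(EU_GRF_ERROR | EU_IC_ERROR))
			elog->other_gt++;
		rw1c_write(&r->gt_err[hw_err], errstat);
	}

	/* Non-GT sources are only counted in aggregate. */
	if (src & ~DEV_ERR_SRC_GT)
		elog->non_gt++;

	/* This sketch clears the source register after the GT register. */
	rw1c_write(&r->dev_err_src[hw_err], src);
}
```

Writing back exactly the value that was read means a bit that latches between the read and the write stays set and is seen on the next interrupt. That is one reason RW1C registers are cleared with the value just read rather than with all ones.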
>
> Bspec: 50875
>
> Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Fernando Pacheco <fernando.pacheco at intel.com>
> Cc: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
> Signed-off-by: Fernando Pacheco <fernando.pacheco at intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 121 ++++++++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_reg.h | 28 ++++++++
> 2 files changed, 149 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ebc80e8b1599..17e679b910da 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2515,6 +2515,124 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>  	return IRQ_HANDLED;
>  }
>
> +static const char *
> +hardware_error_type_to_str(const enum hardware_error hw_err)
> +{
> +	switch (hw_err) {
> +	case HARDWARE_ERROR_CORRECTABLE:
> +		return "CORRECTABLE";
> +	case HARDWARE_ERROR_NONFATAL:
> +		return "NONFATAL";
> +	case HARDWARE_ERROR_FATAL:
> +		return "FATAL";
> +	default:
> +		return "UNKNOWN";
> +	}
> +}
> +
> +static void
> +gen12_gt_hw_error_handler(struct drm_i915_private * const i915,
> +			  const enum hardware_error hw_err)
> +{
> +	void __iomem * const regs = i915->uncore.regs;
> +	const char *hw_err_str = hardware_error_type_to_str(hw_err);
> +	u32 other_errors = ~(EU_GRF_ERROR | EU_IC_ERROR);
> +	u32 errstat;
> +
> +	lockdep_assert_held(&i915->irq_lock);
Wrong place and wrong locks.
-Chris