[Intel-gfx] [PATCH 26/37] drm/i915/dg1: Handle GRF/IC ECC error irq

Chris Wilson chris at chris-wilson.co.uk
Thu May 21 08:19:56 UTC 2020


Quoting Lucas De Marchi (2020-05-21 01:37:52)
> From: Fernando Pacheco <fernando.pacheco at intel.com>
> 
> The error detection and correction capability
> for GRF and instruction cache (IC) will utilize
> the new interrupt and error handling infrastructure
> for dgfx products. The GFX device can generate
> a number of classes of error under the new
> infrastructure: correctable, non-fatal, and
> fatal errors.
> 
> The non-fatal and fatal error classes distinguish
> between levels of severity for uncorrectable errors.
> All ECC uncorrectable errors will be reported as
> fatal to produce the desired system response. Fatal
> errors are expected to route as PCIe error messages,
> which should result in the OS issuing a GFX device FLR.
> However, fatal errors can optionally be routed as
> interrupts instead.
> 
> The driver will only log errors. Anything
> more will be handled at the system level.
> 
> For errors that will route as interrupts, three
> bits in the Master Interrupt Register will be used
> to convey the class of error.
> 
> For each class of error:
> 1. Determine source of error (IP block) by reading
>    the Device Error Source Register (RW1C) that
>    corresponds to the class of error being serviced.
> 2. If the generating IP block is GT, read and log the
>    GT Error Register (RW1C) that corresponds to the
>    class of error being serviced. Non-GT errors will
>    be logged in aggregate for now.
> 
> Bspec: 50875
> 
> Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Fernando Pacheco <fernando.pacheco at intel.com>
> Cc: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
> Signed-off-by: Fernando Pacheco <fernando.pacheco at intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 121 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_reg.h |  28 ++++++++
>  2 files changed, 149 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index ebc80e8b1599..17e679b910da 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2515,6 +2515,124 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>         return IRQ_HANDLED;
>  }
>  
> +static const char *
> +hardware_error_type_to_str(const enum hardware_error hw_err)
> +{
> +       switch (hw_err) {
> +       case HARDWARE_ERROR_CORRECTABLE:
> +               return "CORRECTABLE";
> +       case HARDWARE_ERROR_NONFATAL:
> +               return "NONFATAL";
> +       case HARDWARE_ERROR_FATAL:
> +               return "FATAL";
> +       default:
> +               return "UNKNOWN";
> +       }
> +}
> +
> +static void
> +gen12_gt_hw_error_handler(struct drm_i915_private * const i915,
> +                         const enum hardware_error hw_err)
> +{
> +       void __iomem * const regs = i915->uncore.regs;
> +       const char *hw_err_str = hardware_error_type_to_str(hw_err);
> +       u32 other_errors = ~(EU_GRF_ERROR | EU_IC_ERROR);
> +       u32 errstat;
> +
> +       lockdep_assert_held(&i915->irq_lock);

Wrong place and wrong locks.
-Chris
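
For readers following the thread, the two-step flow in the quoted commit message (read the per-class Device Error Source Register, then read the GT Error Register only if GT is the source) can be sketched as a standalone user-space simulation. This is not the patch's code and does not address the locking objection above. Every name here (struct fake_regs, SRC_GT, rw1c_read_and_clear, service_error_class) is hypothetical, and the bit layout is invented for illustration; the real layout is defined by the Bspec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical error classes, mirroring the three named in the commit message. */
enum err_class { ERR_CORRECTABLE, ERR_NONFATAL, ERR_FATAL, ERR_CLASS_COUNT };

/* Hypothetical bit position: assumed for this sketch, not taken from the Bspec. */
#define SRC_GT (1u << 0)

/* Simulated registers: one device source and one GT error register per class. */
struct fake_regs {
	uint32_t dev_err_src[ERR_CLASS_COUNT];
	uint32_t gt_err[ERR_CLASS_COUNT];
};

/* What the handler learned, so a caller can log it. */
struct err_report {
	uint32_t gt_bits;       /* bits latched in the GT error register */
	uint32_t other_sources; /* non-GT source bits, reported in aggregate */
};

/*
 * Model of a read-and-acknowledge on an RW1C register: read the latched
 * bits, then clear exactly those bits. On real hardware the clear is done
 * by writing the value back; here we just mask the simulated storage.
 */
static uint32_t rw1c_read_and_clear(uint32_t *reg)
{
	uint32_t val = *reg;

	*reg &= ~val;
	return val;
}

static struct err_report
service_error_class(struct fake_regs *regs, enum err_class cls)
{
	struct err_report report = { 0, 0 };
	uint32_t src;

	assert(cls < ERR_CLASS_COUNT);

	/* Step 1: find out which IP block raised this class of error. */
	src = rw1c_read_and_clear(&regs->dev_err_src[cls]);

	/* Step 2: only GT errors are decoded further. */
	if (src & SRC_GT)
		report.gt_bits = rw1c_read_and_clear(&regs->gt_err[cls]);

	report.other_sources = src & ~SRC_GT;

	if (report.gt_bits)
		printf("class %d: GT error bits 0x%08x\n",
		       (int)cls, (unsigned int)report.gt_bits);
	if (report.other_sources)
		printf("class %d: non-GT sources 0x%08x\n",
		       (int)cls, (unsigned int)report.other_sources);

	return report;
}
```

Returning a small report struct instead of only printing keeps the sketch easy to check. A real handler would read and write through the driver's MMIO accessors under whatever locking the maintainers settle on.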

