[Intel-xe] [PATCH] drm/xe/irq: Clear GFX_MSTR_IRQ as part of IRQ reset

Wed Sep 20 17:02:03 UTC 2023

On Wed, Sep 20, 2023 at 01:23:32PM -0300, Gustavo Sousa wrote:
> Quoting Ville Syrjälä (2023-09-20 07:56:04-03:00)
> >On Tue, Sep 19, 2023 at 11:08:04PM -0500, Lucas De Marchi wrote:
> >> On Tue, Sep 19, 2023 at 02:35:51PM -0300, Gustavo Sousa wrote:
> >> >Quoting Lucas De Marchi (2023-09-19 14:31:24-03:00)
> >> >>On Tue, Sep 19, 2023 at 11:41:10AM -0300, Gustavo Sousa wrote:
> >> >>>Starting with Xe_LP+, GFX_MSTR_IRQ contains status bits that have W1C
> >> >>>behavior. If we do not properly reset them, we would miss delivery of
> >> >>>interrupts if a pending bit is set when enabling IRQs.
> >> >>>
> >> >>>As an example, the display part of our probe routine contains paths
> >> >>>where we wait for vblank interrupts. If a display interrupt was already
> >> >>>pending when enabling IRQs, we would time out waiting for the vblank.
> >> >>>
> >> >>>That in fact happened recently when modprobing Xe on a Lunar Lake with a
> >> >>>specific configuration; and that's how we found out we were missing this
> >> >>>step in the IRQ enabling logic.
> >> >>>
> >> >>>Fix the issue by clearing GFX_MSTR_IRQ as part of the IRQ reset.
> >> >>>
> >> >>>BSpec: 50875, 54028, 62357
> >> >>>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
> >> >>>---
> >> >>> drivers/gpu/drm/xe/xe_irq.c | 4 ++++
> >> >>> 1 file changed, 4 insertions(+)
> >> >>>
> >> >>>diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
> >> >>>index ccb934f8fa34..3746e9204e48 100644
> >> >>>--- a/drivers/gpu/drm/xe/xe_irq.c
> >> >>>+++ b/drivers/gpu/drm/xe/xe_irq.c
> >> >>>@@ -456,6 +456,7 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg)
> >> >>>
> >> >>> static void gt_irq_reset(struct xe_tile *tile)
> >> >>> {
> >> >>>+        struct xe_device *xe = tile_to_xe(tile);
> >> >>>         struct xe_gt *mmio = tile->primary_gt;
> >> >>>
> >> >>>         u32 ccs_mask = xe_hw_engine_mask_per_class(tile->primary_gt,
> >> >>>@@ -463,6 +464,9 @@ static void gt_irq_reset(struct xe_tile *tile)
> >> >>>         u32 bcs_mask = xe_hw_engine_mask_per_class(tile->primary_gt,
> >> >>>                                                    XE_ENGINE_CLASS_COPY);
> >> >>>
> >> >>>+        if (GRAPHICS_VERx100(xe) >= 1210)
> >> >>>+                xe_mmio_write32(mmio, GFX_MSTR_IRQ, ~0);
> >> >>
> >> >>shouldn't you exclude bit 31 (MSTR_INT) since it's not a status bit
> >> >>and would rather enable the interrupts... ?
> >> >
> >> >I thought about that, but looking closer at the BSpec for that register, that
> >> >bit seems not to exist for the targeted graphics IPs.
> >> 
> >> true, I missed that... but then this would be a weird place to reset
> >> it. We already forked the irq setup in xe_irq_reset(). Also, this
> >> is going to be called for each gt, which doesn't seem right.
> >> Better to move it to dg1_irq_reset().
> >
> >It should also be cleared after disabling/masking off all the lower
> >level interrupts, otherwise you could just get some bit relatching
> >immediately after clearing it.
> 
> Meaning to move this to be done after the call to
> xe_display_irq_reset()?

It should be more or less the last thing we clear.
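
To illustrate why the ordering matters, here is a rough sketch as a toy C
model. Everything in it (struct, function names, the single "source" flag)
is hypothetical and only stands in for the real lower-level interrupt
registers; it is not xe driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; names are illustrative, not xe driver code. */
struct toy_irq_chain {
	bool source_enabled; /* lower-level interrupt can still fire */
	uint32_t master;     /* latched master status bits (W1C in hardware) */
};

/* A lower-level event latches a master bit only while its source is live. */
static void toy_event(struct toy_irq_chain *c, uint32_t bit)
{
	if (c->source_enabled)
		c->master |= bit;
}

/* Stand-in for writing all ones to the W1C master status register. */
static void toy_clear_master(struct toy_irq_chain *c)
{
	c->master = 0;
}

/* Master cleared first: an event can relatch the bit before the source is off. */
static uint32_t toy_reset_master_first(struct toy_irq_chain *c)
{
	toy_clear_master(c);
	toy_event(c, 0x1);          /* event arrives in the window */
	c->source_enabled = false;
	return c->master;
}

/* Master cleared last: the source is already quiesced, nothing relatches. */
static uint32_t toy_reset_master_last(struct toy_irq_chain *c)
{
	c->source_enabled = false;
	toy_event(c, 0x1);          /* ignored, source is off */
	toy_clear_master(c);
	return c->master;
}
```

In this model toy_reset_master_first() can return with a stale bit still
latched, while toy_reset_master_last() leaves the register clean.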

> 
> The BSpec says:
> 
>   "For any new interrupts set after the 190010h read, new interrupts are
>   re-generated after setting 31 of 190008h."
> 
> Considering that we disable interrupts with dg1_intr_disable(), doesn't
> that mean that relatching won't happen until dg1_intr_enable() gets
> called?

Typically the latching happens whenever the lower level
interrupt register has an edge. The master interrupt
enable should only determine if those latched bits
propagate further in the chain.

So the master interrupt enable has basically the same function as IER,
whereas the latched status bits in there are now more or less
equivalent to IIR. There is no IMR equivalent to prevent the bits
from being latched in the first place.
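
As a minimal sketch of that behaviour, here is a toy C model. The struct and
function names are made up for illustration and the real register layout is
not modelled; the only assumption carried over from the above is "edges
always latch, the enable only gates propagation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not the real hardware or xe driver code. */
struct toy_mstr_irq {
	uint32_t latched;   /* sticky status bits, IIR-like, write 1 to clear */
	bool master_enable; /* IER-like gate for everything latched above */
};

/* An edge on a lower-level interrupt always latches its bit (no IMR). */
static void toy_lower_level_edge(struct toy_mstr_irq *m, unsigned int bit)
{
	assert(bit < 32);
	m->latched |= UINT32_C(1) << bit;
}

/* The interrupt propagates only if something is latched and the gate is open. */
static bool toy_irq_asserted(const struct toy_mstr_irq *m)
{
	return m->master_enable && m->latched != 0;
}

/* Write-1-to-clear: every set bit in val clears the matching latched bit. */
static void toy_w1c(struct toy_mstr_irq *m, uint32_t val)
{
	m->latched &= ~val;
}
```

With master_enable off, toy_lower_level_edge() still sets a bit in latched;
the bit simply does not propagate until the enable is turned on.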

> 
> Btw, from the sentence above, I believe the proper way of resetting this
> register would be to read its value and then write it back instead of
> simply writing 1's as I did in this version.

When handling interrupts you obviously want to write back the
value that you read out as those are the interrupts you are going
to handle. Clearing any other bit would risk losing said interrupts.
During irq_reset() we aren't going to handle anything so just
blindly clearing all bits is fine.
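
A rough sketch of the two cases, using a plain variable as a hypothetical
stand-in for a W1C status register (none of these helpers exist in the
driver, they are only for illustration):

```c
#include <stdint.h>

/* Hypothetical stand-in for a W1C status register; not real MMIO. */
static uint32_t toy_status;

static uint32_t toy_read(void)
{
	return toy_status;
}

/* Write-1-to-clear: set bits in val clear the matching status bits. */
static void toy_write_w1c(uint32_t val)
{
	toy_status &= ~val;
}

/* Handler path: ack exactly the bits that were read and will be serviced. */
static uint32_t toy_irq_ack(void)
{
	uint32_t pending = toy_read();

	toy_write_w1c(pending);
	return pending;
}

/* Reset path: nothing will be serviced, so clear every bit blindly. */
static void toy_irq_reset(void)
{
	toy_write_w1c(~UINT32_C(0));
}
```

If a new bit gets set between toy_read() and toy_write_w1c(pending), writing
back only the value that was read leaves that new bit latched, whereas a
write of all ones at that point would have dropped it.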

-- 
Ville Syrjälä
Intel