[Intel-gfx] a potential dead loop in intel_lrc_irq_handler

Dong, Chuanxiao chuanxiao.dong at intel.com
Mon Aug 7 10:31:57 UTC 2017


> -----Original Message-----
> From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> Sent: Monday, August 7, 2017 5:56 PM
> To: Dong, Chuanxiao <chuanxiao.dong at intel.com>; intel-
> gfx at lists.freedesktop.org; Joonas Lahtinen
> <joonas.lahtinen at linux.intel.com>
> Subject: Re: a potential dead loop in intel_lrc_irq_handler
> 
> Quoting Dong, Chuanxiao (2017-08-07 10:41:29)
> > Hello,
> >
> > I found what might be a corner case where intel_lrc_irq_handler() ends up
> > in a dead loop, and want to understand whether it can really happen.
> >
> > The scenario is like:
> 
> > 1. Write wedged to trigger a GPU reset;
> 
> This is dangerous full stop, but even with a hangcheck the scenario is still
> plausible.
> 
> > 2. Meanwhile, there is one ongoing request in port[0], and its context
> >    switch interrupt is generated by HW;
> > 3. As the interrupt line is disabled during GPU reset, it is possible
> >    that this interrupt is not handled by intel_lrc_irq_handler();
> > 4. During GPU reset, the CSB tail is reset to 0x7, which is the default
> >    value;
> 
> In theory, yes. This prevents the delayed context switch interrupt from
> having any meaning.
> 
> > 5. i915 tries to replay this request during GPU reset;
> 
> If the context switch occurred (but is still pending in IIR), the request is
> complete and will not be replayed.
> 
> > 6. GPU reset completed;
> > 7. The pending interrupt from step #2 is handled.
> >
> > Normally, since the driver wrote the ELSP and replayed a request in step #5,
> > the CSB tail should have been updated to 0 by step #7. But if the CSB tail is
> > not updated that quickly, it is still 0x7 when the last pending interrupt is
> > handled in step #7, and intel_lrc_irq_handler() will then be in a dead loop.
> >
> > If the CSB tail update is not synchronized with the ELSP write, then my
> > understanding is that this corner case can occur. If so, shall we clear the
> > pending interrupts in IIR during i915_reset? Please correct me if anything
> > is wrong.
> 
> The CSB buf+tail is synchronized to the interrupt. Our goal is to make sure
> that the GPU is truly reset before we reset our state tracking so that we don't
> have pending events on replay.
> 
> However, the CSB itself is a little bit of a black box as it is squirreled away in a
> power context on reset, and it is only with a bit of handwaving that it is reset
> to a default empty value on reset.
> 
> CSB interrupt -> pending
> GPU reset -> clears CSB head/tail
But the GPU reset will make CSB_head = 0 and CSB_tail = 7.

> post-reset, re-enable interrupts, raise CSB interrupt
> -> intel_lrc_irq_handler()
> 	if (CSB_head == CSB_tail)
> 		break;

So here intel_lrc_irq_handler() cannot break out. Looks like we are still stuck in intel_lrc_irq_handler(), right?

Thanks
Chuanxiao
> 
> Should be no problem. Similarly for a delayed tasklet, we haven't posted the
> CSB interrupt and so we don't even read the CSB_head/tail as they are still
> undefined (prior to the first CSB interrupt).
> -Chris
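
The head/tail walk discussed in this thread can be modelled in user space.
What follows is only a sketch, not the i915 code: csb_walk(), CSB_ENTRIES,
CSB_TAIL_RESET and the max_steps guard are hypothetical names introduced for
illustration, and it assumes a six-entry CSB whose head wraps back to 0 after
the last slot. Under that assumption, a head that only cycles through the
valid slots can never become equal to a tail of 7, which is the dead loop
described above.

```c
#include <stdio.h>

#define CSB_ENTRIES    6   /* assumption: number of valid CSB slots */
#define CSB_TAIL_RESET 7   /* post-reset default tail quoted in the thread */

/*
 * Walk the head towards the tail the way a CSB consumer loop would,
 * wrapping the head at CSB_ENTRIES. Returns the number of entries
 * consumed, or -1 if the tail is still not reached after max_steps
 * iterations (a real handler has no such guard and would spin).
 */
static int csb_walk(unsigned int head, unsigned int tail, int max_steps)
{
    int steps = 0;

    while (head != tail) {
        if (steps == max_steps)
            return -1;              /* would loop forever */
        if (++head == CSB_ENTRIES)
            head = 0;               /* wrap to the first slot */
        steps++;
    }
    return steps;
}
```

With head = 0 and tail = CSB_TAIL_RESET the function reports -1, while any
tail inside the valid range terminates. The guard exists only so the model can
report the condition instead of hanging.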

