[Intel-gfx] [PATCH] drm/i915: Exit cherryview_irq_handler() after one pass

Thu Mar 10 12:10:46 UTC 2016

On Thu, Mar 10, 2016 at 02:01:27PM +0200, Ville Syrjälä wrote:
> On Thu, Mar 10, 2016 at 11:44:28AM +0000, Chris Wilson wrote:
> > This effectively reverts
> > 
> > commit 8e5fd599eb219f1054e39b40d18b217af669eea9
> > Author: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Date:   Wed Apr 9 13:28:50 2014 +0300
> > 
> >     drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
> > 
> > as under continuous execlists load we can saturate the IRQ handler,
> > destabilising the tsc clock and triggering the NMI watchdog to declare a hung
> > CPU.
> > 
> > [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
> > [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
> > [  552.756210] clocksource: Switched to clocksource refined-jiffies
> > [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> > [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
> > [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> > [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
> > [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
> > [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
> > [  575.217967] Call Trace:
> > [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
> > [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
> > [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
> > [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
> > [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
> > [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
> > [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
> > [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
> > [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
> > [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
> > [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
> > [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
> > [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
> > [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
> > [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
> > 
> > However, not servicing all available IIR within the handler does hurt the
> > throughput of pathological nop execbuf by about 20%, with a similar effect
> > upon the dispatch latency of a series of execbuf.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
> > Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Cc: Antti Koskipää <antti.koskipaa at linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: stable at vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 40 +++++++++++++++++++---------------------
> >  1 file changed, 19 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 53e5104964b3..8a3230427884 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1829,35 +1829,33 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
> >  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
> >  	disable_rpm_wakeref_asserts(dev_priv);
> >  
> > -	for (;;) {
> > -		master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > -		iir = I915_READ(VLV_IIR);
> > +	master_ctl = I915_READ(GEN8_MASTER_IRQ) & ~GEN8_MASTER_IRQ_CONTROL;
> > +	iir = I915_READ(VLV_IIR);
> >  
> > -		if (master_ctl == 0 && iir == 0)
> > -			break;
> > +	if (master_ctl == 0 && iir == 0)
> > +		break;
> 
> goto something?

Sigh. The problem of rewriting the "obvious" patch against -nightly. I
just changed the for(;;) into do {} while(0) for testing. Perhaps I
should stick with that in case we need to flip-flop again.
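Because `break` is only valid inside a loop or a switch, the single-pass version quoted above does not compile as written. Below is a minimal sketch, not the actual i915 code, of the two usual ways to keep that early exit: the do {} while(0) wrapper described here, and a goto label as suggested in the review. `read_master_ctl()`, `read_iir()` and the `fake_*` variables are hypothetical stand-ins for the I915_READ() calls; the real handler also services the GT, display and pipe interrupts and re-enables the wakeref asserts on the way out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for I915_READ(GEN8_MASTER_IRQ) and
 * I915_READ(VLV_IIR); here they just return whatever the caller set. */
uint32_t fake_master_ctl;
uint32_t fake_iir;

static uint32_t read_master_ctl(void) { return fake_master_ctl; }
static uint32_t read_iir(void) { return fake_iir; }

/* Single pass wrapped in do {} while (0): "break" is legal inside the
 * one-iteration loop and jumps straight to the common exit path. */
static int handler_do_while(void)
{
	int handled = 0;

	do {
		uint32_t master_ctl = read_master_ctl();
		uint32_t iir = read_iir();

		if (master_ctl == 0 && iir == 0)
			break;	/* nothing pending: skip servicing */

		handled = 1;
		/* ... service GT, display and pipe interrupts here ... */
	} while (0);

	/* common exit path (the real handler re-enables asserts here) */
	return handled;
}

/* The same single pass using a label, the usual kernel idiom for a
 * shared exit path. */
static int handler_goto(void)
{
	int handled = 0;
	uint32_t master_ctl = read_master_ctl();
	uint32_t iir = read_iir();

	if (master_ctl == 0 && iir == 0)
		goto out;	/* nothing pending: skip servicing */

	handled = 1;
	/* ... service GT, display and pipe interrupts here ... */
out:
	return handled;
}
```

The do {} while(0) form keeps the diff against the old for(;;) loop small, which helps if the loop ever has to come back; the goto form is the more common kernel style for a single exit path. Either way each invocation reads the registers exactly once, which is the point of the revert.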

> Apart from that I have no objections if it doesn't cause problems
> with interrupts getting lost and whatnot. That was the original reason
> for it I think, but at least I myself never really looked into it. IIRC
> Rafael just told me they needed to do it to get the thing working, so
> I just put the patch in. And that was before I had even seen any silicon.

My testing only looks at the GT side, and we do stress that pretty hard
because of execlists and have reasonable methods of detection if we stop
processing execbuf. I'm more worried about the display and pipe interrupts.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre