[Intel-gfx] i915 irq storm mitigation in 3.10
Jan Niggemann
jn at hz6.de
Mon Jul 22 21:28:33 CEST 2013
Egbert, Daniel, others,
On 22.07.2013 10:04, Egbert Eich wrote:
> Daniel Vetter writes:
> > On Sun, Jul 21, 2013 at 10:23 PM, Jan Niggemann <jn at hz6.de> wrote:
> > >> But every time this happens we only let through a few interrupts,
> > >> so this shouldn't affect you badly. Can you please check whether
> > >> those slowdowns line up with 2 minute intervals?
> > >
> > > I observed these slowdowns for a couple of weeks now. On my machine,
> > > they only happen once, some minutes after a cold boot.
> > > They last for a minute or two, and then they are gone.
> > > I'd have guessed that the storm detection kicks in pretty quickly
> > > after a storm is detected and that it would go unnoticed.
> >
> > Hm, that sounds like something doesn't quite work as expected. We
> > should kill things once we get 5 interrupts or so in 1 second. So if
> > it's bad enough that it slows your machine down it really should only
> > be barely noticeable.
> >
>
> The logs show that the disable mechanism got triggered, so there was
> a storm that got detected.
> The respective message is generated by the worker, everything up to
> there (detection and marking disabled) seems to be fine.
> I bet we are still getting interrupts but the respective bit in
> hpd_event_bits doesn't get set any more. Since we unconditionally
> queue the worker on interrupt there is no surprise it is so busy.
>
> Then this points to the call to hpd_irq_setup() in
> intel_hpd_irq_handler() not doing what is expected, i.e. masking out
> the stormy interrupt.
> Could it be that we can't mask/disable an interrupt before ACKing it?
>
> @Jan, could you also specify what hardware you are using (i.e. give us
> an output of lspci -n)?
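
For reference, the behaviour described in the quoted text can be modelled in a few lines. This is only a toy simulation in Python, not the i915 code: the class and function names are made up for illustration, the threshold (5 interrupts in 1 second) is taken from Daniel's description, and the 2 minute re-enable delay is an assumption based on the "2 minute intervals" mentioned earlier. It also assumes that masking really works, which is exactly what is in doubt here.

```python
class HpdStormModel:
    """Toy model of hotplug storm detection for one pin (not the i915 code)."""

    def __init__(self, threshold=5, window_ms=1000, reenable_ms=120_000):
        self.threshold = threshold      # interrupts tolerated per window (assumed)
        self.window_ms = window_ms      # detection window (assumed)
        self.reenable_ms = reenable_ms  # delay before unmasking again (assumed)
        self.window_start = None
        self.count = 0
        self.disabled_at = None

    def on_interrupt(self, now_ms):
        """Return True if the interrupt gets through, False if it is masked."""
        if self.disabled_at is not None:
            if now_ms - self.disabled_at < self.reenable_ms:
                return False            # still masked
            # Re-enable delay has passed: unmask and start counting afresh.
            self.disabled_at = None
            self.window_start = None
        if self.window_start is None or now_ms - self.window_start > self.window_ms:
            self.window_start = now_ms  # open a new detection window
            self.count = 0
        self.count += 1
        if self.count > self.threshold:
            self.disabled_at = now_ms   # storm detected: mask the pin
        return True


def simulate(period_ms, duration_ms):
    """Count interrupts that get through when one fires every period_ms."""
    pin = HpdStormModel()
    return sum(pin.on_interrupt(t) for t in range(0, duration_ms, period_ms))
```

With one interrupt every 10 ms for five minutes, simulate(10, 300_000) lets only a handful through per 2 minute cycle, which is what "barely noticeable" would look like. If masking does not take effect, every interrupt still reaches the handler and the worker gets queued each time.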
It's a Lenovo ThinkPad T400, the model is 7434-AG2.
root@muretop:~# lspci -n
00:00.0 0600: 8086:2a40 (rev 07)
00:02.0 0300: 8086:2a42 (rev 07)
00:02.1 0380: 8086:2a43 (rev 07)
00:03.0 0780: 8086:2a44 (rev 07)
00:19.0 0200: 8086:10f5 (rev 03)
00:1a.0 0c03: 8086:2937 (rev 03)
00:1a.1 0c03: 8086:2938 (rev 03)
00:1a.2 0c03: 8086:2939 (rev 03)
00:1a.7 0c03: 8086:293c (rev 03)
00:1b.0 0403: 8086:293e (rev 03)
00:1c.0 0604: 8086:2940 (rev 03)
00:1c.1 0604: 8086:2942 (rev 03)
00:1c.3 0604: 8086:2946 (rev 03)
00:1c.4 0604: 8086:2948 (rev 03)
00:1d.0 0c03: 8086:2934 (rev 03)
00:1d.1 0c03: 8086:2935 (rev 03)
00:1d.2 0c03: 8086:2936 (rev 03)
00:1d.7 0c03: 8086:293a (rev 03)
00:1e.0 0604: 8086:2448 (rev 93)
00:1f.0 0601: 8086:2917 (rev 03)
00:1f.2 0106: 8086:2929 (rev 03)
00:1f.3 0c05: 8086:2930 (rev 03)
03:00.0 0280: 8086:4237
15:00.0 0607: 1180:0476 (rev ba)
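
In case it helps with picking the graphics device out of a listing like the one above, here is a small Python sketch that parses `lspci -n` output and keeps the entries whose PCI class starts with 03 (display controllers). The function names and the regular expression are illustrative assumptions, and the sample is a subset of the lines above.

```python
import re

# One line of `lspci -n`: slot, class, vendor:device, optional revision.
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?"
)


def parse_lspci_n(text):
    """Return one dict per recognised line; unrecognised lines are skipped."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


def display_devices(devices):
    """Keep devices whose PCI base class is 0x03 (display controller)."""
    return [d for d in devices if d["cls"].startswith("03")]


SAMPLE = """\
00:00.0 0600: 8086:2a40 (rev 07)
00:02.0 0300: 8086:2a42 (rev 07)
00:02.1 0380: 8086:2a43 (rev 07)
03:00.0 0280: 8086:4237
"""

for dev in display_devices(parse_lspci_n(SAMPLE)):
    print(dev["slot"], dev["vendor"] + ":" + dev["device"])
```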
As to the log: I messed up the kernel parameters this morning... I was
out of coffee and my 1.5-year-old daughter was playing around me :-)
Here's my kernel log with drm.debug and printk.time enabled:
Uncompressed (22M): http://files.hz6.de/kern_20130722.log
bzip2'd (some 600 KB): http://files.hz6.de/kern_20130722.log.bz2
Regards
jan