[Intel-gfx] [PATCH 2/4] drm/i915: Synchronize irq before parking each engine

Chris Wilson chris at chris-wilson.co.uk
Tue Oct 24 08:01:22 UTC 2017


Quoting Chris Wilson (2017-10-23 22:32:35)
> When we park the engine (upon idling), we kill the irq tasklet. However,
> to be sure that it is not restarted by a final interrupt after doing so,
> flush the interrupt handler before parking. As we only park the engines
> when we believe the system is idle, there should not be any spurious
> interrupts later to disturb us; so flushing the final in-flight
> interrupt should be sufficient.

And even this is not enough for some mischievous hw:

<4>[  329.637536] WARN_ON(!engine->i915->gt.awake)
<4>[  329.637551] ------------[ cut here ]------------
<4>[  329.637569] WARNING: CPU: 3 PID: 74 at drivers/gpu/drm/i915/i915_irq.c:1394 gen8_cs_irq_handler+0x7c/0xe0 [i915]
<4>[  329.637571] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hwdep x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e snd_pcm ptp pps_core mei_me prime_numbers mei pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[  329.637605] CPU: 3 PID: 74 Comm: kworker/u8:1 Tainted: G     U          4.14.0-rc6-CI-Patchwork_6149+ #1
<4>[  329.637606] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
<4>[  329.637624] Workqueue: i915 i915_gem_idle_work_handler [i915]
<4>[  329.637627] task: ffff880272ff2880 task.stack: ffffc9000063c000
<4>[  329.637640] RIP: 0010:gen8_cs_irq_handler+0x7c/0xe0 [i915]
<4>[  329.637642] RSP: 0018:ffff88027ed83e30 EFLAGS: 00010086
<4>[  329.637644] RAX: 0000000000000020 RBX: ffff8802698a8008 RCX: 0000000000010002
<4>[  329.637645] RDX: 0000000000000000 RSI: ffffffff81d0dbdc RDI: ffffffff81cc1b56
<4>[  329.637647] RBP: ffff88027ed83e60 R08: 0000000000000000 R09: 0000000000000001
<4>[  329.637648] R10: 00000000d8873c29 R11: 00000000148e3c06 R12: 0000000000000100
<4>[  329.637650] R13: ffff880269240000 R14: 0000000000000092 R15: ffff88026a6a6918
<4>[  329.637651] FS:  0000000000000000(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
<4>[  329.637653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  329.637654] CR2: 00007f099468d198 CR3: 0000000003e0f002 CR4: 00000000003606e0
<4>[  329.637655] Call Trace:
<4>[  329.637657]  <IRQ>
<4>[  329.637672]  gen8_gt_irq_handler+0x102/0x120 [i915]
<4>[  329.637685]  gen8_irq_handler+0x90/0x670 [i915]
<4>[  329.637690]  __handle_irq_event_percpu+0x49/0x350
<4>[  329.637694]  handle_irq_event_percpu+0x23/0x60
<4>[  329.637696]  handle_irq_event+0x39/0x60
<4>[  329.637699]  handle_edge_irq+0xf4/0x1c0
<4>[  329.637702]  handle_irq+0x1a/0x30
<4>[  329.637704]  do_IRQ+0x68/0x130
<4>[  329.637707]  common_interrupt+0x9a/0x9a
<4>[  329.637709]  </IRQ>
<4>[  329.637711] RIP: 0010:_raw_spin_unlock_irq+0x32/0x50
<4>[  329.637712] RSP: 0018:ffffc9000063fda8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6d
<4>[  329.637715] RAX: ffff880272ff2880 RBX: ffff8802692443e0 RCX: 0000000000000006
<4>[  329.637716] RDX: 0000000000001379 RSI: ffffffff81d0dbdc RDI: 0000000000000001
<4>[  329.637718] RBP: ffffc9000063fdb0 R08: ffff880272ff3190 R09: 0000000000000000
<4>[  329.637719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802692443e0
<4>[  329.637720] R13: ffff880269240070 R14: ffff880269247358 R15: 0000000081e40e00
<4>[  329.637726]  ? _raw_spin_unlock_irq+0x2c/0x50
<4>[  329.637739]  gen6_disable_rps_interrupts+0x62/0x90 [i915]
<4>[  329.637753]  gen6_rps_idle+0x1d/0xe0 [i915]
<4>[  329.637770]  i915_gem_idle_work_handler+0x188/0x190 [i915]
<4>[  329.637773]  process_one_work+0x221/0x650
<4>[  329.637777]  worker_thread+0x1db/0x3b0
<4>[  329.637781]  kthread+0x114/0x150
<4>[  329.637783]  ? process_one_work+0x650/0x650
<4>[  329.637785]  ? kthread_create_on_node+0x40/0x40
<4>[  329.637788]  ret_from_fork+0x27/0x40
<4>[  329.637792] Code: 48 83 c4 20 5b 41 5c 5d c3 48 8b 07 80 b8 cc 81 00 00 00 75 42 48 c7 c6 f0 27 29 a0 48 c7 c7 3e 48 28 a0 89 55 d4 e8 b5 ca f9 e0 <0f> ff 48 8b 03 48 8d 75 d8 48 89 df 48 c7 45 d8 e0 78 60 81 48 
<4>[  329.637867] ---[ end trace e6209c9962196e53 ]---
<6>[  329.637870] i915 0000:00:02.0: [drm] rcs0
<6>[  329.637872] i915 0000:00:02.0: [drm] 	current seqno 88f5, last 88f5, hangcheck 0 [29638 ms], inflight 0
<6>[  329.637874] i915 0000:00:02.0: [drm] 	Reset count: 2
<6>[  329.637876] i915 0000:00:02.0: [drm] 	Requests:
<6>[  329.637887] i915 0000:00:02.0: [drm] 	RING_START: 0x0000f000 [0x00000000]
<6>[  329.637889] i915 0000:00:02.0: [drm] 	RING_HEAD:  0x00000c10 [0x00000000]
<6>[  329.637891] i915 0000:00:02.0: [drm] 	RING_TAIL:  0x00000c10 [0x00000000]
<6>[  329.637894] i915 0000:00:02.0: [drm] 	RING_CTL:   0x00003000 []
<6>[  329.637897] i915 0000:00:02.0: [drm] 	ACTHD:  0x00000000_00000c10
<6>[  329.637900] i915 0000:00:02.0: [drm] 	BBADDR: 0x00000000_00000004
<6>[  329.637902] i915 0000:00:02.0: [drm] 	Execlist status: 0x00000301 00000000
<6>[  329.637904] i915 0000:00:02.0: [drm] 	Execlist CSB read 4 [4 cached], write 4 [4 from hws], interrupt posted? no
<6>[  329.637906] i915 0000:00:02.0: [drm] 		ELSP[0] idle
<6>[  329.637908] i915 0000:00:02.0: [drm] 		ELSP[1] idle
<6>[  329.637909] i915 0000:00:02.0: [drm] 		HW active? 0x0

So if we want to completely eradicate that last unwanted interrupt, it
looks like it is time to disable it via IER/IMR. However, that can wait
until we have intel_engine_park/unpark (the spurious interrupt filter is
good enough for now).
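
To illustrate what a spurious interrupt filter of this kind amounts to,
here is a minimal userspace sketch. It is not the i915 code: the
fake_gt and fake_engine structures, their fields, and
fake_cs_irq_handler() are hypothetical stand-ins, and the only idea
taken from the thread is that an interrupt arriving while the GT is not
awake is dropped instead of scheduling the tasklet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GT state; not the real i915 structure. */
struct fake_gt {
	bool awake;
};

/* Hypothetical stand-in for an engine; the counters replace the tasklet. */
struct fake_engine {
	struct fake_gt *gt;
	unsigned int tasklets_scheduled;
	unsigned int spurious_dropped;
};

/*
 * Filter a command-streamer interrupt: if the GT is parked (not awake),
 * count the interrupt as spurious and ignore it; otherwise record that
 * the tasklet would have been scheduled.
 *
 * Returns true if the interrupt was acted upon, false if it was dropped.
 */
bool fake_cs_irq_handler(struct fake_engine *engine)
{
	assert(engine != NULL && engine->gt != NULL);

	if (!engine->gt->awake) {
		engine->spurious_dropped++;
		return false;
	}

	engine->tasklets_scheduled++;
	return true;
}
```

The filter only hides the late interrupt from the rest of the driver;
the hardware still raises it, which is why disabling it at the source
remains the complete fix.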
-Chris

