[Intel-gfx] [PATCH] drm/i915: Check that each request phase is completed before retiring

Matthew Auld matthew.william.auld at gmail.com
Fri Nov 18 17:27:00 UTC 2016


On 18 November 2016 at 14:34, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Trying to chase an impossible bug (ivb):
>
> [  207.765411] [drm:i915_reset_and_wakeup [i915]] resetting chip
> [  207.765734] [drm:i915_gem_reset [i915]] resetting render ring to restart from tail of request 0x4ee834
> [  207.765791] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on RC6p on RC6pp off
> [  207.767213] [drm:intel_guc_setup [i915]] GuC fw status: path (null), fetch NONE, load NONE
> [  207.767515] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:203!
> [  207.767551] invalid opcode: 0000 [#1] PREEMPT SMP
> [  207.767576] Modules linked in: snd_hda_intel i915 cdc_ncm usbnet mii x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me mei snd_pcm sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915]
> [  207.767808] CPU: 3 PID: 8855 Comm: gem_ringfill Tainted: G     U          4.9.0-rc5-CI-Patchwork_3052+ #1
> [  207.767854] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
> [  207.767894] task: ffff88012c82a740 task.stack: ffffc9000383c000
> [  207.767927] RIP: 0010:[<ffffffffa00a0a3a>]  [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915]
> [  207.767999] RSP: 0018:ffffc9000383fb20  EFLAGS: 00010293
> [  207.768027] RAX: 00000000004ee83c RBX: ffff880135dcb480 RCX: 00000000004ee83a
> [  207.768062] RDX: ffff88012fea42a8 RSI: 0000000000000001 RDI: ffff88012c82af68
> [  207.768095] RBP: ffffc9000383fb48 R08: 0000000000000000 R09: 0000000000000000
> [  207.768129] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880135dcb480
> [  207.768163] R13: ffff88012fea42a8 R14: 0000000000000000 R15: 00000000000001d8
> [  207.768200] FS:  00007f955f658740(0000) GS:ffff88013e2c0000(0000) knlGS:0000000000000000
> [  207.768239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  207.768258] CR2: 0000555899725930 CR3: 00000001316f6000 CR4: 00000000001406e0
> [  207.768286] Stack:
> [  207.768299]  ffff880135dcb480 ffff880135dcbe00 ffff88012fea42a8 0000000000000000
> [  207.768350]  00000000000001d8 ffffc9000383fb70 ffffffffa00a1339 0000000000000000
> [  207.768402]  ffff88012f296c88 00000000000003f0 ffffc9000383fbb0 ffffffffa00b582d
> [  207.768453] Call Trace:
> [  207.768493]  [<ffffffffa00a1339>] i915_gem_request_retire_upto+0x49/0x90 [i915]
> [  207.768553]  [<ffffffffa00b582d>] intel_ring_begin+0x15d/0x2d0 [i915]
> [  207.768608]  [<ffffffffa00b59cb>] intel_ring_alloc_request_extras+0x2b/0x40 [i915]
> [  207.768667]  [<ffffffffa00a2fd9>] i915_gem_request_alloc+0x359/0x440 [i915]
> [  207.768723]  [<ffffffffa008bd03>] i915_gem_do_execbuffer.isra.15+0x783/0x1a10 [i915]
> [  207.768766]  [<ffffffff811a6a2e>] ? __might_fault+0x3e/0x90
> [  207.768816]  [<ffffffffa008d380>] i915_gem_execbuffer2+0xc0/0x250 [i915]
> [  207.768854]  [<ffffffff815532a6>] drm_ioctl+0x1f6/0x480
> [  207.768900]  [<ffffffffa008d2c0>] ? i915_gem_execbuffer+0x330/0x330 [i915]
> [  207.768939]  [<ffffffff81202f6e>] do_vfs_ioctl+0x8e/0x690
> [  207.768972]  [<ffffffff818193ac>] ? retint_kernel+0x2d/0x2d
> [  207.769004]  [<ffffffff810d6ef2>] ? trace_hardirqs_on_caller+0x122/0x1b0
> [  207.769039]  [<ffffffff812035ac>] SyS_ioctl+0x3c/0x70
> [  207.769068]  [<ffffffff818189ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  207.769103] Code: 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 8b 35 fa 7b e1 e1 85 f6 0f 85 55 03 00 00 41 8b 84 24 80 02 00 00 85 c0 75 02 <0f> 0b 49 8b 94 24 a8 00 00 00 48 8b 8a e0 01 00 00 8b 89 c0 00
> [  207.769400] RIP  [<ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915]
> [  207.769463]  RSP <ffffc9000383fb20>
>
> Let's add a couple more BUG_ONs before this to ascertain that the request
> did make it to hardware. The impossible part of this stacktrace is that
> request must have been considered completed by the i915_request_wait()
> before we tried to retire it.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Reviewed-by: Matthew Auld <matthew.auld at intel.com>


More information about the Intel-gfx mailing list