[Intel-gfx] [PATCH] drm/i915: Don't deref request->ctx inside unlocked print_request()

Wed Feb 28 12:49:00 UTC 2018

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Although we protect the request itself, we don't lock inside
> intel_engine_dump() and so the request may be retired as we peek into it.
> One consequence is that request->ctx may be freed before we
> dereference it, leading to a use-after-free. Replace the hw_id we are
> peeking at inside request->ctx with request->fence.context, which
> still lets us track which context the request originated from
> (tying it to HW reports requires a little more legwork, but it is
> good enough to follow the GEM traces).
>
> [52640.729670] general protection fault: 0000 [#2] SMP
> [52640.729694] Dumping ftrace buffer:
> [52640.729701]    (ftrace buffer empty)
> [52640.729705] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm mei_me mei i915 r8169 mii prime_numbers i2c_hid
> [52640.729748] CPU: 2 PID: 4335 Comm: gem_exec_schedu Tainted: G     UD W        4.16.0-rc3+ #7
> [52640.729759] Hardware name: Acer Aspire E5-575G/Ironman_SK  , BIOS V1.12 08/02/2016
> [52640.729803] RIP: 0010:print_request+0x2b/0xb0 [i915]
> [52640.729811] RSP: 0018:ffffc90001453c18 EFLAGS: 00010206
> [52640.729820] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801e0292d40 RCX: 0000000000000006
> [52640.729829] RDX: ffffc90001453c60 RSI: ffff8801e0292d40 RDI: 0000000000000003
> [52640.729838] RBP: ffffc90001453d80 R08: 0000000000000000 R09: 0000000000000001
> [52640.729847] R10: ffffc90001453bd0 R11: ffffc90001453c73 R12: ffffc90001453c60
> [52640.729856] R13: ffffc90001453d80 R14: ffff8801d5a683c8 R15: ffff8801e0292d40
> [52640.729866] FS:  00007f1ee50548c0(0000) GS:ffff8801e8200000(0000) knlGS:0000000000000000
> [52640.729876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52640.729884] CR2: 00007f1ee5077000 CR3: 00000001d9411004 CR4: 00000000003606e0
> [52640.729893] Call Trace:
> [52640.729922]  intel_engine_print_registers+0x623/0x890 [i915]
> [52640.729948]  intel_engine_dump+0x4a3/0x590 [i915]
> [52640.729957]  ? seq_printf+0x3a/0x50
> [52640.729977]  i915_engine_info+0xb8/0xe0 [i915]
> [52640.729984]  ? drm_mode_gamma_get_ioctl+0xf0/0xf0
> [52640.729990]  seq_read+0xd5/0x410
> [52640.729997]  full_proxy_read+0x4b/0x70
> [52640.730004]  __vfs_read+0x1e/0x120
> [52640.730009]  ? do_sys_open+0x134/0x220
> [52640.730015]  ? kmem_cache_free+0x174/0x2b0
> [52640.730021]  vfs_read+0xa1/0x150
> [52640.730026]  SyS_read+0x40/0xa0
> [52640.730032]  do_syscall_64+0x65/0x1a0
> [52640.730038]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Reported-by: Mika Kuoppala <mika.kuoppala at intel.com>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>

I found how to associate hw_id with fence.context in the trace,
so let's fix this race.

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
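
For readers following the thread, the idea behind the fix can be modelled
in user space. This is only a minimal sketch under stated assumptions, not
the actual i915 code: the model_* structures and functions below are
hypothetical stand-ins. It assumes the fence is embedded in the request
(so it lives exactly as long as the request does), while the context is a
separate allocation that retirement may free.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real i915 structures. */
struct model_ctx {
	unsigned int hw_id;
};

struct model_fence {
	uint64_t context;	/* identifies the originating timeline/context */
	unsigned int seqno;
};

struct model_request {
	struct model_fence fence;	/* embedded: valid while the request is */
	struct model_ctx *ctx;		/* separate object: may vanish on retire */
};

/* Models retirement dropping the context behind an unlocked reader. */
static void model_retire(struct model_request *rq)
{
	free(rq->ctx);
	rq->ctx = NULL;
}

/*
 * Unlocked printer: touches only fields embedded in the request, so it
 * never follows rq->ctx and cannot trip over a freed context.
 */
static int model_print_request(char *buf, size_t len,
			       const struct model_request *rq)
{
	return snprintf(buf, len, "%llx:%x",
			(unsigned long long)rq->fence.context,
			rq->fence.seqno);
}
```

With this shape, calling model_print_request() after model_retire() is
still safe, whereas a printer that read rq->ctx->hw_id would be
dereferencing freed memory, which is the pattern the oops above points at
(RAX holds the 0x6b slab poison value).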