[Intel-gfx] [PATCH] drm/i915: Keep ring->active_list and ring->requests_list consistent
Daniel Vetter
daniel at ffwll.ch
Fri Mar 20 07:32:52 PDT 2015
On Fri, Mar 20, 2015 at 01:39:51PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 01:02:10PM +0000, Chris Wilson wrote:
> > On Fri, Mar 20, 2015 at 11:06:57AM +0100, Daniel Vetter wrote:
> > > On Thu, Mar 19, 2015 at 10:17:42PM +0000, Chris Wilson wrote:
> > > > On Thu, Mar 19, 2015 at 06:37:28PM +0100, Daniel Vetter wrote:
> > > > > On Wed, Mar 18, 2015 at 06:19:22PM +0000, Chris Wilson wrote:
> > > > > > WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
> > > > > > WARN_ON(!list_empty(&vm->active_list))
> > > > >
> > > > > How does this come about - we call gpu_idle before this seems to blow up,
> > > > > so all requests should be completed?
> > > >
> > > > Honestly, I couldn't figure it out either. I had an epiphany when I saw
> > > > that we could now have an empty request list but non-empty active list,
> > > > added a test to detect when that happens, and shouted eureka when the
> > > > WARN fired. I could trigger the WARN in evict_vm pretty reliably, but
> > > > not since this patch. It could just be masking another bug.
> > >
> > > Can you perhaps double-check the theory by putting a
> > > WARN_ON(list_empty(active_list) != list_empty(request_list)) into
> > > gpu_idle? Ofc with this patch reverted so that the bug surfaces again.
> >
> > [ 5215.567573] [drm:i915_verify_lists] *ERROR* render ring: active list not empty, but no requests
> > [ 5215.567586] ------------[ cut here ]------------
> > [ 5215.567598] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem.c:3166 i915_gpu_idle+0x88/0x90()
> > [ 5215.567602] WARN_ON(i915_verify_lists(dev))
> > [ 5215.567606] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.567708] CPU: 0 PID: 1304 Comm: Xorg Tainted: G W OE 4.0.0-rc4+ #108
> > [ 5215.567713] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.567718] 00000000 00000000 f46e1b98 c16b3e19 f46e1bd8 f46e1bc8 c1047f17 c1937e78
> > [ 5215.567733] f46e1bf4 00000518 c1937cec 00000c5e c14441e8 c14441e8 e733bdc8 00000000
> > [ 5215.567747] f6346c00 f46e1be0 c1047f83 00000009 f46e1bd8 c1937e78 f46e1bf4 f46e1c00
> > [ 5215.567762] Call Trace:
> > [ 5215.567776] [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.567788] [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.567797] [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567805] [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567815] [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.567823] [<c14441e8>] i915_gpu_idle+0x88/0x90
> > [ 5215.567833] [<c1439949>] i915_gem_evict_something+0x269/0x300
> > [ 5215.567843] [<c144754f>] i915_gem_object_do_pin+0x6ef/0xb20
> > [ 5215.567854] [<c14479c5>] i915_gem_object_pin+0x45/0x50
> > [ 5215.567864] [<c1439f08>] i915_gem_execbuffer_reserve_vma.isra.13+0x78/0x180
> > [ 5215.567874] [<c143a2e5>] i915_gem_execbuffer_reserve+0x2d5/0x320
> > [ 5215.567884] [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567894] [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.567906] [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.567915] [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567925] [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.567934] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567944] [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.567954] [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.567963] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567975] [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.567984] [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.567994] [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568005] [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568095] [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568136] [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568175] [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568198] ---[ end trace ab3f7e4953cb9eb6 ]---
> > [ 5215.568272] ------------[ cut here ]------------
> > [ 5215.568288] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem_evict.c:283 i915_gem_evict_vm+0x10c/0x140()
> > [ 5215.568292] WARN_ON(!list_empty(&vm->active_list))
> > [ 5215.568296] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.568383] CPU: 0 PID: 1304 Comm: Xorg Tainted: G W OE 4.0.0-rc4+ #108
> > [ 5215.568388] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.568393] 00000000 00000000 f46e1cc0 c16b3e19 f46e1d00 f46e1cf0 c1047f17 c193712c
> > [ 5215.568407] f46e1d1c 00000518 c19370d0 0000011b c1439c6c c1439c6c f3b225b0 e733c3ec
> > [ 5215.568421] 00000001 f46e1d08 c1047f83 00000009 f46e1d00 c193712c f46e1d1c f46e1d28
> > [ 5215.568435] Call Trace:
> > [ 5215.568445] [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.568455] [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.568465] [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568474] [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568483] [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.568492] [<c1439c6c>] i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568502] [<c143a236>] i915_gem_execbuffer_reserve+0x226/0x320
> > [ 5215.568511] [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568521] [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.568532] [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.568541] [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568550] [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.568560] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568568] [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.568577] [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.568587] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568599] [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.568607] [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.568616] [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568626] [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568635] [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568644] [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568653] [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568659] ---[ end trace ab3f7e4953cb9eb7 ]---
Ok, at least we have clear evidence now that the lists indeed seem to get
out of sync.
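
For reference, the invariant being checked can be sketched in a few lines.
This is only a userspace model under my own assumptions, not the i915 code:
struct ring_model, ring_model_init() and ring_lists_inconsistent() are
hypothetical names, and the list helpers are a minimal stand-in for the
kernel's struct list_head.

```c
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init_model(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static bool list_empty_model(const struct list_head *head)
{
    return head->next == head;
}

static void list_add_tail_model(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical ring model holding only the two lists discussed here. */
struct ring_model {
    struct list_head active_list;
    struct list_head request_list;
};

static void ring_model_init(struct ring_model *ring)
{
    list_init_model(&ring->active_list);
    list_init_model(&ring->request_list);
}

/* True when exactly one of the two lists is empty, i.e. they are out of sync. */
static bool ring_lists_inconsistent(const struct ring_model *ring)
{
    return list_empty_model(&ring->active_list) !=
           list_empty_model(&ring->request_list);
}
```

In this model an object on active_list with nothing on request_list is
flagged, which is the "active list not empty, but no requests" case from the
log above.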
> Ah, so what it boils down to is that i915_gpu_idle() is a no-op here if
> list_empty(ring->request_list) [intel_ring_idle:2176].
>
> Missing link discovered, I think the bug fixed by the patch is indeed
> the same one that triggered the first WARN.
But if we do that short-circuiting in ring_idle then all the requests
_should_ be completed. Which means retire_request_ring should move all
buffers to the inactive list, even when we do that before retiring
requests.
I'm still baffled and don't really understand what's going on here ...
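
To spell out the case that puzzles me, here is a deliberately simplified
model of the short-circuit described above. It is a sketch under my own
assumptions and not the real intel_ring_idle(): struct ring_counts and
ring_idle_model() are hypothetical, and the lists are reduced to counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the two lists reduced to element counts. */
struct ring_counts {
    size_t active;    /* objects still on the active list */
    size_t requests;  /* outstanding requests on the request list */
};

/*
 * Models an idle path that returns early when there are no requests.
 * Returns true if it waited and retired, false if it short-circuited.
 */
static bool ring_idle_model(struct ring_counts *ring)
{
    if (ring->requests == 0)
        return false;   /* early return: active objects are left untouched */

    /* Assumption of this model: retiring all requests drains active objects. */
    ring->requests = 0;
    ring->active = 0;
    return true;
}
```

In this model, a ring that starts with active = 1 and requests = 0 keeps its
active object across the idle call. That matches the WARN, but it does not
explain how the ring got into that state in the first place.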
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch