[Intel-gfx] [PATCH] drm/i915: Keep ring->active_list and ring->requests_list consistent
Chris Wilson
chris at chris-wilson.co.uk
Fri Mar 20 06:39:51 PDT 2015
On Fri, Mar 20, 2015 at 01:02:10PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 11:06:57AM +0100, Daniel Vetter wrote:
> > On Thu, Mar 19, 2015 at 10:17:42PM +0000, Chris Wilson wrote:
> > > On Thu, Mar 19, 2015 at 06:37:28PM +0100, Daniel Vetter wrote:
> > > > On Wed, Mar 18, 2015 at 06:19:22PM +0000, Chris Wilson wrote:
> > > > > WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
> > > > > WARN_ON(!list_empty(&vm->active_list))
> > > >
> > > > How does this come about - we call gpu_idle before this seems to blow up,
> > > > so all requests should be completed?
> > >
> > > Honestly, I couldn't figure it out either. I had an epiphany when I saw
> > > that we could now have an empty request list but non-empty active list,
> > > added a test to detect when that happens, and shouted eureka when the
> > > WARN fired. I could trigger the WARN in evict_vm pretty reliably, but
> > > not since this patch. It could just be masking another bug.
> >
> > Can you perhaps double-check the theory by putting a
> > WARN_ON(list_empty(active_list) != list_empty(request_list)) into
> > gpu_idle? Ofc with this patch reverted so that the bug surfaces again.
>
> [ 5215.567573] [drm:i915_verify_lists] *ERROR* render ring: active list not empty, but no requests
> [ 5215.567586] ------------[ cut here ]------------
> [ 5215.567598] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem.c:3166 i915_gpu_idle+0x88/0x90()
> [ 5215.567602] WARN_ON(i915_verify_lists(dev))
> [ 5215.567606] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> [ 5215.567708] CPU: 0 PID: 1304 Comm: Xorg Tainted: G W OE 4.0.0-rc4+ #108
> [ 5215.567713] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> [ 5215.567718] 00000000 00000000 f46e1b98 c16b3e19 f46e1bd8 f46e1bc8 c1047f17 c1937e78
> [ 5215.567733] f46e1bf4 00000518 c1937cec 00000c5e c14441e8 c14441e8 e733bdc8 00000000
> [ 5215.567747] f6346c00 f46e1be0 c1047f83 00000009 f46e1bd8 c1937e78 f46e1bf4 f46e1c00
> [ 5215.567762] Call Trace:
> [ 5215.567776] [<c16b3e19>] dump_stack+0x41/0x52
> [ 5215.567788] [<c1047f17>] warn_slowpath_common+0x87/0xc0
> [ 5215.567797] [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> [ 5215.567805] [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> [ 5215.567815] [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> [ 5215.567823] [<c14441e8>] i915_gpu_idle+0x88/0x90
> [ 5215.567833] [<c1439949>] i915_gem_evict_something+0x269/0x300
> [ 5215.567843] [<c144754f>] i915_gem_object_do_pin+0x6ef/0xb20
> [ 5215.567854] [<c14479c5>] i915_gem_object_pin+0x45/0x50
> [ 5215.567864] [<c1439f08>] i915_gem_execbuffer_reserve_vma.isra.13+0x78/0x180
> [ 5215.567874] [<c143a2e5>] i915_gem_execbuffer_reserve+0x2d5/0x320
> [ 5215.567884] [<c11594cd>] ? __kmalloc+0x14d/0x190
> [ 5215.567894] [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> [ 5215.567906] [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> [ 5215.567915] [<c11594cd>] ? __kmalloc+0x14d/0x190
> [ 5215.567925] [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> [ 5215.567934] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> [ 5215.567944] [<c1401d67>] drm_ioctl+0x1b7/0x510
> [ 5215.567954] [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> [ 5215.567963] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> [ 5215.567975] [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> [ 5215.567984] [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> [ 5215.567994] [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> [ 5215.568005] [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> [ 5215.568095] [<c1185f62>] ? __fget_light+0x22/0x60
> [ 5215.568136] [<c117dc50>] SyS_ioctl+0x60/0x90
> [ 5215.568175] [<c16b9bc8>] sysenter_do_call+0x12/0x12
> [ 5215.568198] ---[ end trace ab3f7e4953cb9eb6 ]---
> [ 5215.568272] ------------[ cut here ]------------
> [ 5215.568288] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem_evict.c:283 i915_gem_evict_vm+0x10c/0x140()
> [ 5215.568292] WARN_ON(!list_empty(&vm->active_list))
> [ 5215.568296] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> [ 5215.568383] CPU: 0 PID: 1304 Comm: Xorg Tainted: G W OE 4.0.0-rc4+ #108
> [ 5215.568388] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> [ 5215.568393] 00000000 00000000 f46e1cc0 c16b3e19 f46e1d00 f46e1cf0 c1047f17 c193712c
> [ 5215.568407] f46e1d1c 00000518 c19370d0 0000011b c1439c6c c1439c6c f3b225b0 e733c3ec
> [ 5215.568421] 00000001 f46e1d08 c1047f83 00000009 f46e1d00 c193712c f46e1d1c f46e1d28
> [ 5215.568435] Call Trace:
> [ 5215.568445] [<c16b3e19>] dump_stack+0x41/0x52
> [ 5215.568455] [<c1047f17>] warn_slowpath_common+0x87/0xc0
> [ 5215.568465] [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> [ 5215.568474] [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> [ 5215.568483] [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> [ 5215.568492] [<c1439c6c>] i915_gem_evict_vm+0x10c/0x140
> [ 5215.568502] [<c143a236>] i915_gem_execbuffer_reserve+0x226/0x320
> [ 5215.568511] [<c11594cd>] ? __kmalloc+0x14d/0x190
> [ 5215.568521] [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> [ 5215.568532] [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> [ 5215.568541] [<c11594cd>] ? __kmalloc+0x14d/0x190
> [ 5215.568550] [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> [ 5215.568560] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> [ 5215.568568] [<c1401d67>] drm_ioctl+0x1b7/0x510
> [ 5215.568577] [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> [ 5215.568587] [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> [ 5215.568599] [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> [ 5215.568607] [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> [ 5215.568616] [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> [ 5215.568626] [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> [ 5215.568635] [<c1185f62>] ? __fget_light+0x22/0x60
> [ 5215.568644] [<c117dc50>] SyS_ioctl+0x60/0x90
> [ 5215.568653] [<c16b9bc8>] sysenter_do_call+0x12/0x12
> [ 5215.568659] ---[ end trace ab3f7e4953cb9eb7 ]---
Ah, so what it boils down to is that i915_gpu_idle() is a no-op here if
list_empty(ring->request_list) [intel_ring_idle:2176].
Missing link discovered. I think the bug fixed by the patch is indeed
the same one that triggered the first WARN.
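
To spell out the failure mode, here is a minimal userspace sketch. It is
not the i915 code: struct model_ring, model_lists_inconsistent() and
model_ring_idle() are hypothetical names made up for illustration, and the
list helpers are simplified stand-ins for the kernel's list API. It only
assumes what the thread describes, namely that the idle path returns early
when the request list is empty, so anything still on the active list is
left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	assert(n != NULL && h != NULL);
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Hypothetical model of a ring: just the two lists under discussion. */
struct model_ring {
	struct list_head active_list;
	struct list_head request_list;
};

static void model_ring_init(struct model_ring *ring)
{
	init_list_head(&ring->active_list);
	init_list_head(&ring->request_list);
}

/* The check suggested in the thread: the two lists must agree on emptiness. */
static bool model_lists_inconsistent(const struct model_ring *ring)
{
	return list_empty(&ring->active_list) !=
	       list_empty(&ring->request_list);
}

/*
 * Model of an idle path that trusts request_list alone (an assumption
 * based on the thread). Returns true if active_list is empty afterwards.
 */
static bool model_ring_idle(struct model_ring *ring)
{
	/* No requests: nothing to wait for, so this is a no-op. */
	if (list_empty(&ring->request_list))
		return list_empty(&ring->active_list);

	/* Otherwise "retire" everything, which drains both lists. */
	while (!list_empty(&ring->request_list))
		list_del_init(ring->request_list.next);
	while (!list_empty(&ring->active_list))
		list_del_init(ring->active_list.next);
	return true;
}
```

With an object on the active list and no request outstanding,
model_ring_idle() returns false and the active list stays populated, which
is the state the second WARN complains about. Once a request is queued
alongside it, the same call drains both lists.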
-Chris
--
Chris Wilson, Intel Open Source Technology Centre