[Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy
Ben Widawsky
ben at bwidawsk.net
Thu Aug 22 18:19:43 CEST 2013
On Thu, Aug 22, 2013 at 11:55:21AM +0200, Daniel Vetter wrote:
> The important bugfix here is that we must not unlink the vma when
> we keep it around as a placeholder for the execbuf code. Since then we
> won't find it again when execbuf gets interrupt and restarted and
> create a 2nd vma. And since the code as-is isn't fit yet to deal with
> more than one vma, hilarity ensues.
This was the primary distinction between my vma->busy patch, and the
patch you actually merged from Chris. AFAICT, you've adopted my
vma->busy behavior which I remember being called out as a bug by Chris
(and being convinced he was right).
So can you explain for my sanity how this differs in behavior from
vma->busy?
>
> Specifically the dma map/unmap of the sg table isn't adjusted for
> multiple vmas yet and will blow up like this:
>
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.750976] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751021] IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751073] PGD 56bb5067 PUD ad3dd067 PMD 0
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751096] Oops: 0000 [#1] SMP
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751114] Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751291] CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751334] Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751365] task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751396] RIP: 0010:[<ffffffffa008fb37>] [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751446] RSP: 0018:ffff88004bdf5958 EFLAGS: 00010246
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751469] RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751499] RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751529] RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751559] R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751589] R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751619] FS: 00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751653] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751677] CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751707] Stack:
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751717] ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751752] ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751788] 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751823] Call Trace:
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751843] [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751875] [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751909] [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751947] [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.751983] [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752019] [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752054] [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752094] [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752131] [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752166] [<ffffffff810fc823>] ? might_fault+0x40/0x90
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752195] [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752231] [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752261] [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752292] [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752318] [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752341] [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752366] [<ffffffff817deda7>] ? sysret_check+0x1b/0x56
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752391] [<ffffffff8113073c>] SyS_ioctl+0x57/0x87
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752415] [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752443] [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752469] Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.752624] RIP [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.753750] RSP <ffff88004bdf5958>
> Aug 15 20:12:37 x-ivb9 kernel: [ 8081.754851] CR2: 0000000000000008
>
> The second thing to correct is the "only one vma for now" check in
> vma_unbind - since vma_destroy isn't always called the obj->vma_list
> might not be empty. Instead check that the vma list is singular at the
> beginning of vma_unbind. This is also more symmetric with bind_to_vm.
It's not a "correction" exactly. It's to address the fact that the patch
makes the destroy conditional. The old assertion was correct.
>
> Our first attempting at fixing this was
>
> commit 1be81a2f2cfd8789a627401d470423358fba2d76
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Tue Aug 20 12:56:40 2013 +0100
>
> drm/i915: Don't destroy the vma placeholder during execbuffer reservation
>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Ben Widawsky <ben at bwidawsk.net>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
It's not clear from the commit message if this actually fixes the bug
for you (which you seemed to be able to reproduce). Did it?
> ---
> drivers/gpu/drm/i915/i915_gem.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ef92c69..3e855ff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
> drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
> int ret;
>
> + /* For now we only ever use 1 vma per object */
> + WARN_ON(!list_is_singular(&obj->vma_list));
> +
> if (list_empty(&vma->vma_link))
> return 0;
>
> @@ -2661,12 +2664,12 @@ int i915_vma_unbind(struct i915_vma *vma)
> drm_mm_remove_node(&vma->node);
>
> destroy:
> - i915_gem_vma_destroy(vma);
> + /* Keep the vma as a placeholder in the execbuffer reservation lists */
> + if (!list_empty(&vma->exec_list))
> + i915_gem_vma_destroy(vma);
Please see my vma->busy question at the top. To me this looks like the
behavior I had with vma->busy where you don't unlink from the vma_list
(which actually defies the original convention I had about the
vma_list, whereby a vma exists on a vma list IFF it has space allocated
in some address space). I think this is necessary though, as there are
really three states:
1. Bound
2. Temporarily unbound <-- this is the problematic one
3. Unbound
>
> /* Since the unbound list is global, only move to that list if
> - * no more VMAs exist.
> - * NB: Until we have real VMAs there will only ever be one */
> - WARN_ON(!list_empty(&obj->vma_list));
> + * no more VMAs exist. */
> if (list_empty(&obj->vma_list))
> list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
>
> @@ -4171,10 +4174,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
> WARN_ON(vma->node.allocated);
> list_del(&vma->vma_link);
>
> - /* Keep the vma as a placeholder in the execbuffer reservation lists */
> - if (!list_empty(&vma->exec_list))
> - return;
> -
> kfree(vma);
> }
If you really wan to remove this hunk, I think you should do it as a
revert (or remove it from the history completely if it isn't too late).
Instead though I think turning this into a WARN is still a valid thing
to do. Unless I've missed something. It shouldn't ever be safe to
destroy a VMA on an exec_list because we use the same list element in
eviction and if we accidentally destroy in eviction (called from
execbuf) we're in pretty big trouble.
Finally, I still think Chris' original suggestion of creating a new
member for eviction would make a lot of these bugs at least a little bit
easier to find. It's always hard to say though until you actually do it.
--
Ben Widawsky, Intel Open Source Technology Center
More information about the Intel-gfx
mailing list