[Intel-gfx] [PATCH] drm/i915: we can't call move_to_active before intel_ring_begin

Ben Widawsky ben at bwidawsk.net
Wed Sep 25 19:11:35 CEST 2013


On Wed, Sep 25, 2013 at 05:05:15PM +0200, Daniel Vetter wrote:
> Otherwise we can die in a fire of not-yet-allocated lazy requests when
> we expect them to be there:

And Chris accuses me of violating Rusty's rules. This is an extremely
ugly caveat to put in an interface given that active tracking and ring
dispatch should have little connection.


Isn't it much simpler to just call intel_ring_alloc_seqno during
move_to_active?
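
Roughly what I have in mind, as an untested sketch. intel_ring_alloc_seqno()
would need to be callable from i915_gem.c (if I'm reading it right it's a
static helper in intel_ringbuffer.c today), and the error handling is
hand-waved since move_to_active returns void:

void i915_vma_move_to_active(struct i915_vma *vma,
                             struct intel_ring_buffer *ring)
{
        /*
         * Lazily allocate the request/seqno here instead of BUGing when
         * the caller hasn't gone through intel_ring_begin() yet.  Error
         * handling is hand-waved since move_to_active returns void today.
         */
        if (WARN_ON(intel_ring_alloc_seqno(ring)))
                return;

        /* Keep the VMA on its VM's active list to avoid premature eviction. */
        list_move_tail(&vma->mm_list, &vma->vm->active_list);

        /* Usual per-object active/retirement bookkeeping. */
        i915_gem_object_move_to_active(vma->obj, ring);
}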

> 
> [ 4405.463113] ------------[ cut here ]------------
> [ 4405.464769] kernel BUG at /home/user/src/linux/drivers/gpu/drm/i915/intel_ringbuffer.h:268!
> [ 4405.466392] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 4405.468007] Modules linked in: snd_hda_codec_hdmi hid_generic snd_hda_codec_realtek usbhid hid iTCO_wdt snd_hda_intel psmouse i2c_i801 snd_hda_codec pcspkr snd_hwdep i915 snd_pcm snd_page_alloc snd_timer cfbfillrect snd cfbimgblt lpc_ich cfbcopyarea soundcore drm_kms_helper mfd_core evdev wmi
> [ 4405.471387] CPU: 5 PID: 9300 Comm: gem_storedw_bat Not tainted 3.12.0-rc2-drm_vbl+ #1
> [ 4405.473047] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0100.2013.0517.0942 05/17/2013
> [ 4405.474712] task: ffff8800618d4b00 ti: ffff88010a806000 task.ti: ffff88010a806000
> [ 4405.476370] RIP: 0010:[<ffffffffa009ffa9>]  [<ffffffffa009ffa9>] i915_vma_move_to_active+0x1b9/0x1e0 [i915]
> [ 4405.478045] RSP: 0018:ffff88010a807be8  EFLAGS: 00010246
> [ 4405.479689] RAX: ffff88011a681000 RBX: ffff8800b364f9c0 RCX: ffff88011902b898
> [ 4405.481321] RDX: ffff8800d4a1b6e0 RSI: ffff8800d4a1a8b8 RDI: ffff88011902b840
> [ 4405.482932] RBP: ffff88010a807c08 R08: 0000000000000001 R09: 0000000000000000
> [ 4405.484526] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800d4a1a8b8
> [ 4405.486100] R13: 0000000000000000 R14: ffff8800d4a18000 R15: ffff8800b364f9c0
> [ 4405.487664] FS:  00007f36c1a738c0(0000) GS:ffff88011f340000(0000) knlGS:0000000000000000
> [ 4405.489216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4405.490747] CR2: 00007fff1b28ea30 CR3: 0000000119e0d000 CR4: 00000000001407e0
> [ 4405.492276] Stack:
> [ 4405.493774]  ffff88010a807dd8 ffff8800d4a1a8b8 ffff8800d3c1c400 ffffffffa00ac060
> [ 4405.495276]  ffff88010a807d28 ffffffffa00aa0db ffff88010a807cb8 ffffffff810aa4e4
> [ 4405.496776]  0000000000000003 ffff880000000001 ffff8800618d50e8 ffffffff81a9da00
> [ 4405.498265] Call Trace:
> [ 4405.499735]  [<ffffffffa00ac060>] ? gen6_ppgtt_insert_entries+0x1a0/0x1a0 [i915]
> [ 4405.501218]  [<ffffffffa00aa0db>] i915_gem_do_execbuffer.clone.18+0x65b/0x12e0 [i915]
> [ 4405.502685]  [<ffffffff810aa4e4>] ? lock_release_non_nested+0xa4/0x360
> [ 4405.504134]  [<ffffffffa00ab298>] i915_gem_execbuffer2+0xa8/0x290 [i915]
> [ 4405.505573]  [<ffffffff813552b9>] drm_ioctl+0x419/0x5c0
> [ 4405.506991]  [<ffffffff81128b12>] ? handle_mm_fault+0x352/0xa00
> [ 4405.508399]  [<ffffffffa00ab1f0>] ? i915_gem_execbuffer+0x490/0x490 [i915]
> [ 4405.509792]  [<ffffffff8103cd2c>] ? __do_page_fault+0x1fc/0x4b0
> [ 4405.511170]  [<ffffffff81165e66>] do_vfs_ioctl+0x96/0x560
> [ 4405.512533]  [<ffffffff81512163>] ? error_sti+0x5/0x6
> [ 4405.513878]  [<ffffffff81511d0d>] ? retint_swapgs+0xe/0x13
> [ 4405.515208]  [<ffffffff811663d1>] SyS_ioctl+0xa1/0xb0
> [ 4405.516522]  [<ffffffff8129ddee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 4405.517830]  [<ffffffff81512792>] system_call_fastpath+0x16/0x1b
> [ 4405.519119] Code: f6 83 40 01 00 00 01 0f 85 11 ff ff ff b8 01 00 00 00 f0 0f c1 03 ff c0 83 f8 01 7e 10 80 8b 40 01 00 00 01 e9 f5 fe ff ff 0f 0b <0f> 0b 80 3d 61 55 08 00 01 74 e7 be 2f 00 00 00 48 c7 c7 18 03
> [ 4405.520610] RIP  [<ffffffffa009ffa9>] i915_vma_move_to_active+0x1b9/0x1e0 [i915]
> [ 4405.522001]  RSP <ffff88010a807be8>
> [ 4405.523367] [drm:intel_crtc_set_config], [CRTC:3] [FB:50] #connectors=1 (x y) (0 0)
> [ 4405.524751] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
> 
> [ 4405.526152] [drm:intel_modeset_stage_output_state], [CONNECTOR:16:HDMI-A-2] to [CRTC:3]
> [ 4405.531096] [drm:intel_crtc_set_config], [CRTC:3] [FB:50] #connectors=1 (x y) (0 0)
> [ 4405.532491] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
> [ 4405.533903] [drm:intel_modeset_stage_output_state], [CONNECTOR:16:HDMI-A-2] to [CRTC:3]
> [ 4405.538888] ---[ end trace 53d1b708421bb5b3 ]---
> 
> This regression has been introduced by Ben Widawsky's ppgtt/vma
> enabling patch "drm/i915: Use the new vm [un]bind functions".
> 
> This should be exercised by
> igt/gem_storedw_batches_loop/secure-dispatch.
> 
> Note that this won't fix the full ordeal for real ppgtt since the
> potential allocation for the batch vma could recurse into the shrinker
> and wreak utter havoc with our carefully reserved buffers. But for now
> this is good enough.
> 
> The real fix involves some trickery to allocate the batch vma around
> the call to eb_lookup_vmas and do all the additional buffer space
> reserving for the batch in global gtt in the normal reservation step.
> 
> But that requires quite some frobbing of various pieces of code, so
> definitely a 2nd step.
> 
> v2: Clarify the copy&pasta comment and update it to suit the new
> location of the move_to_active call for the batch vma.
> 
> v3: Pimp the commit message a bit to explain that this isn't the full
> fix.
> 
> Reported-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Ben Widawsky <ben at bwidawsk.net>
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 34 ++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 88c924f..f540207 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -929,6 +929,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	struct eb_vmas *eb;
>  	struct drm_i915_gem_object *batch_obj;
> +	struct i915_vma *batch_vma;
>  	struct drm_clip_rect *cliprects = NULL;
>  	struct intel_ring_buffer *ring;
>  	struct i915_ctx_hang_stats *hs;
> @@ -1082,7 +1083,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		goto err;
>  
>  	/* take note of the batch buffer before we might reorder the lists */
> -	batch_obj = list_entry(eb->vmas.prev, struct i915_vma, exec_list)->obj;
> +	batch_vma = list_entry(eb->vmas.prev, struct i915_vma, exec_list);
> +	batch_obj = batch_vma->obj;
>  
>  	/* Move the objects en-masse into the GTT, evicting if necessary. */
>  	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
> @@ -1118,6 +1120,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	if (flags & I915_DISPATCH_SECURE) {
>  		struct i915_address_space *ggtt = obj_to_ggtt(batch_obj);
>  
> +		batch_vma = i915_gem_obj_lookup_or_create_vma(batch_obj,
> +							      ggtt);
> +		if (IS_ERR(batch_vma)) {
> +			DRM_DEBUG("Failed to lookup ggtt batch VMA\n");
> +			ret = PTR_ERR(batch_vma);
> +			goto err;
> +		}
> +
>  		/* Assuming all privileged batches are in the global GTT means
>  		 * we need to make sure we have a global gtt offset, as well as
>  		 * the PTEs mapped. As mentioned above, we can forego this on
> @@ -1128,21 +1138,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		if (ret)
>  			goto err;
>  
> -		ggtt->bind_vma(i915_gem_obj_to_ggtt(batch_obj),
> +		ggtt->bind_vma(batch_vma,
>  			       batch_obj->cache_level,
>  			       GLOBAL_BIND);
>  
> -		/* Since the active list is per VM, we need to make sure this
> -		 * VMA ends up on the GGTT's active list to avoid premature
> -		 * eviction.
> -		 */
> -		i915_vma_move_to_active(i915_gem_obj_to_ggtt(batch_obj), ring);
> -
>  		i915_gem_object_unpin(batch_obj);
> +	}
>  
> -		exec_start += i915_gem_obj_ggtt_offset(batch_obj);
> -	} else
> -		exec_start += i915_gem_obj_offset(batch_obj, vm);
> +	exec_start += batch_vma->node.start;
>  
>  	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
>  	if (ret)
> @@ -1209,6 +1212,15 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
>  
>  	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> +	/*
> +	 * Since the active list is per VM and the batch might be executed from
> +	 * the global GTT we need to make sure that the VMA ends up on the
> +	 * active list there, too, to avoid premature eviction. If the batch is
> +	 * in the same address space, doing a 2nd move_to_active doesn't hurt
> +	 * since it is idempotent.
> +	 */
> +	i915_vma_move_to_active(batch_vma, ring);
> +
>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
>  
>  err:
> -- 
> 1.8.4.rc3
> 

-- 
Ben Widawsky, Intel Open Source Technology Center
