[Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch

Lister, Ian ian.lister at intel.com
Fri Dec 6 11:05:51 CET 2013



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch]
> Sent: Thursday, December 05, 2013 2:43 PM
> To: Intel Graphics Development
> Cc: Daniel Vetter; Lister, Ian; stable at vger.kernel.org; Widawsky, Benjamin;
> Stéphane Marchesin; Bloomfield, Jon
> Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
> 
> So apparently under ridiculous amounts of memory pressure we can get into
> trouble in do_switch when we try to move the old hw context backing
> storage object onto the active lists.
> 
> With list debugging enabled that usually results in us chasing a poisoned
> pointer - which means we've hit upon a vma that has been removed from all
> lrus with list_del (and then deallocated, so it's a real use-after free).
> 
> Ian Lister has done some great callchain chasing and noticed that we can
> reenter do_switch:
> 
> i915_gem_do_execbuffer()
> 
> i915_switch_context()
> 
> do_switch()
>    from = ring->last_context;
>    i915_gem_object_pin()
> 
>       i915_gem_object_bind_to_gtt()
>          ret = drm_mm_insert_node_in_range_generic();
>          // If the above call fails then it will try i915_gem_evict_something()
>          // If that fails it will call i915_gem_evict_everything() ...
> 	 i915_gem_evict_everything()
> 	    i915_gpu_idle()
> 	       i915_switch_context(DEFAULT_CONTEXT)
> 
> Like with everything else where the shrinker or eviction code can invalidate
> pointers we need to reload relevant state.
> 
> Note that there's no need to recheck whether a context switch is still
> required because:
> 
> - Doing a switch to the same context is harmless (besides wasting a
>   bit of energy).
> 
> - This can only happen with the default context. But since that one's
>   pinned we'll never call down into evict_everything under normal
>   circumstances. Note that there's a little driver bringup fun
>   involved namely that we could recourse into do_switch for the
>   initial switch. Atm we're fine since we assign the context pointer
>   only after the call to do_switch at driver load or resume time. And
>   in the gpu reset case we skip the entire setup sequence (which might
>   be a bug on its own, but definitely not this one here).
> 
> Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> not just on artificial stress tests), see the reference.
> 
> Note that in upstream code doesn't calle evict_everything directly from
> evict_something, that's an extension in this product branch. But we can still
> hit upon this bug (and apparently we do, see the linked backtraces). I've
> noticed this while trying to construct a testcase for this bug and utterly failed
> to provoke it. It looks like we need to driver the system squarly into the
> lowmem wall and provoke the shrinker to evict the context object by doing
> the last-ditch evict_everything call.
> 
> Aside: There's currently no means to get a badly-fragmenting hw context
> object away from a bad spot in the upstream code. We should fix this by at
> least adding some code to evict_something to handle hw contexts.
> 
> References: https://code.google.com/p/chromium/issues/detail?id=248191
> Reported-by: Ian Lister <ian.lister at intel.com>
> Cc: Ian Lister <ian.lister at intel.com>
> Cc: stable at vger.kernel.org
> Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> Cc: Stéphane Marchesin <marcheu at chromium.org>
> Cc: Bloomfield, Jon <jon.bloomfield at intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>

Reviewed-by: Ian Lister <ian.lister at intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 41877045a1a0..2d2877493f61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
>  	if (ret)
>  		return ret;
> 
> -	/* Clear this page out of any CPU caches for coherent swap-in/out.
> Note
> +	/*
> +	 * Pin can switch back to the default context if we end up calling into
> +	 * evict_everything - as a last ditch gtt defrag effort that also
> +	 * switches to the default context. Hence we need to reload from
> here.
> +	 */
> +	from = ring->last_context;
> +
> +	/*
> +	 * Clear this page out of any CPU caches for coherent swap-in/out.
> +Note
>  	 * that thanks to write = false in this call and us not setting any gpu
>  	 * write domains when putting a context object onto the active list
>  	 * (when switching away from it), this won't block.
> -	 * XXX: We need a real interface to do this instead of trickery. */
> +	 *
> +	 * XXX: We need a real interface to do this instead of trickery.
> +	 */
>  	ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
>  	if (ret) {
>  		i915_gem_object_unpin(to->obj);
> --
> 1.8.4.3



More information about the Intel-gfx mailing list