[Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch
Lister, Ian
ian.lister at intel.com
Fri Dec 6 11:05:51 CET 2013
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch]
> Sent: Thursday, December 05, 2013 2:43 PM
> To: Intel Graphics Development
> Cc: Daniel Vetter; Lister, Ian; stable at vger.kernel.org; Widawsky, Benjamin;
> Stéphane Marchesin; Bloomfield, Jon
> Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
>
> So apparently under ridiculous amounts of memory pressure we can get into
> trouble in do_switch when we try to move the old hw context backing
> storage object onto the active lists.
>
> With list debugging enabled that usually results in us chasing a poisoned
> pointer - which means we've hit upon a vma that has been removed from all
> lrus with list_del (and then deallocated, so it's a real use-after free).
>
> Ian Lister has done some great callchain chasing and noticed that we can
> reenter do_switch:
>
> i915_gem_do_execbuffer()
>
> i915_switch_context()
>
> do_switch()
> from = ring->last_context;
> i915_gem_object_pin()
>
> i915_gem_object_bind_to_gtt()
> ret = drm_mm_insert_node_in_range_generic();
> // If the above call fails then it will try i915_gem_evict_something()
> // If that fails it will call i915_gem_evict_everything() ...
> i915_gem_evict_everything()
> i915_gpu_idle()
> i915_switch_context(DEFAULT_CONTEXT)
>
> Like with everything else where the shrinker or eviction code can invalidate
> pointers we need to reload relevant state.
>
> Note that there's no need to recheck whether a context switch is still
> required because:
>
> - Doing a switch to the same context is harmless (besides wasting a
> bit of energy).
>
> - This can only happen with the default context. But since that one's
> pinned we'll never call down into evict_everything under normal
> circumstances. Note that there's a little driver bringup fun
> involved namely that we could recourse into do_switch for the
> initial switch. Atm we're fine since we assign the context pointer
> only after the call to do_switch at driver load or resume time. And
> in the gpu reset case we skip the entire setup sequence (which might
> be a bug on its own, but definitely not this one here).
>
> Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> not just on artificial stress tests), see the reference.
>
> Note that in upstream code doesn't calle evict_everything directly from
> evict_something, that's an extension in this product branch. But we can still
> hit upon this bug (and apparently we do, see the linked backtraces). I've
> noticed this while trying to construct a testcase for this bug and utterly failed
> to provoke it. It looks like we need to driver the system squarly into the
> lowmem wall and provoke the shrinker to evict the context object by doing
> the last-ditch evict_everything call.
>
> Aside: There's currently no means to get a badly-fragmenting hw context
> object away from a bad spot in the upstream code. We should fix this by at
> least adding some code to evict_something to handle hw contexts.
>
> References: https://code.google.com/p/chromium/issues/detail?id=248191
> Reported-by: Ian Lister <ian.lister at intel.com>
> Cc: Ian Lister <ian.lister at intel.com>
> Cc: stable at vger.kernel.org
> Cc: Ben Widawsky <benjamin.widawsky at intel.com>
> Cc: Stéphane Marchesin <marcheu at chromium.org>
> Cc: Bloomfield, Jon <jon.bloomfield at intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
Reviewed-by: Ian Lister <ian.lister at intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 41877045a1a0..2d2877493f61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
> if (ret)
> return ret;
>
> - /* Clear this page out of any CPU caches for coherent swap-in/out.
> Note
> + /*
> + * Pin can switch back to the default context if we end up calling into
> + * evict_everything - as a last ditch gtt defrag effort that also
> + * switches to the default context. Hence we need to reload from
> here.
> + */
> + from = ring->last_context;
> +
> + /*
> + * Clear this page out of any CPU caches for coherent swap-in/out.
> +Note
> * that thanks to write = false in this call and us not setting any gpu
> * write domains when putting a context object onto the active list
> * (when switching away from it), this won't block.
> - * XXX: We need a real interface to do this instead of trickery. */
> + *
> + * XXX: We need a real interface to do this instead of trickery.
> + */
> ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
> if (ret) {
> i915_gem_object_unpin(to->obj);
> --
> 1.8.4.3
More information about the Intel-gfx
mailing list