[Intel-gfx] [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation

Mon Aug 8 09:56:56 UTC 2016

On Mon, Aug 08, 2016 at 11:25:56AM +0200, Daniel Vetter wrote:
> On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> > When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> > Enable lockless lookup of request tracking via RCU"), we acknowledge that
> > we may race with another thread that could have reallocated the request.
> > In order for the first thread not to blow up, the second thread must not
> > clear the request completely before overwriting it. In the RCU lookup, we
> > allow for the engine/seqno to be replaced but we do not allow for it to
> > be zeroed.
> > 
> > The choice we make is to either add extra checking to the RCU lookup, or
> > embrace the inherent races (as intended). It is more complicated as we
> > need to manually clear everything we depend upon being zero initialised,
> > but we benefit from not emitting the memset() to clear the entire
> > frequently allocated structure (that memset turns up in throughput
> > profiles). And at the same time, the lookup remains flexible for future
> > adjustments.
> > 
> > v2: Old style LRC requires another variable to be initialized. (The
> > danger inherent in not zeroing everything.)
> > v3: request->batch also needs to be cleared
> > 
> > Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: "Goel, Akash" <akash.goel at intel.com>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
> >  2 files changed, 47 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index 6a1661643d3d..b7ffde002a62 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> >  	if (req && i915_gem_request_completed(req))
> >  		i915_gem_request_retire(req);
> >  
> > -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> > +	/* Beware: Dragons be flying overhead.
> > +	 *
> > +	 * We use RCU to look up requests in flight. The lookups may
> > +	 * race with the request being allocated from the slab freelist.
> > +	 * That is, the request we are writing to here may be in the process
> > +	 * of being read by __i915_gem_active_get_request_rcu(). As such,
> > +	 * we have to be very careful when overwriting the contents. During
> > +	 * the RCU lookup, we chase the request->engine pointer,
> > +	 * read the request->fence.seqno and increment the reference count.
> > +	 *
> > +	 * The reference count is incremented atomically. If it is zero,
> > +	 * the lookup knows the request is unallocated and complete. Otherwise,
> > +	 * it is either still in use, or has been reallocated and reset
> > +	 * with fence_init(). This increment is safe for release as we check
> > +	 * that the request we have a reference to matches the active
> > +	 * request.
> > +	 *
> > +	 * Before we increment the refcount, we chase the request->engine
> > +	 * pointer. We must not call kmem_cache_zalloc() or else we set
> > +	 * that pointer to NULL and cause a crash during the lookup. If
> > +	 * we see the request is completed (based on the value of the
> > +	 * old engine and seqno), the lookup is complete and reports NULL.
> > +	 * If we decide the request is not completed (new engine or seqno),
> > +	 * then we grab a reference and double check that it is still the
> > +	 * active request - which it won't be - and restart the lookup.
> > +	 *
> > +	 * Do not use kmem_cache_zalloc() here!
> > +	 */
> > +	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
> >  	if (!req)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> >  	req->engine = engine;
> >  	req->ctx = i915_gem_context_get(ctx);
> 
> See my earlier review - if we go with this I think we should fully embrace
> it and not clear anything where it's not needed. Otherwise we have a funny
> mix of defensive clearing to NULL and needing to be careful.
>   
> > +	/* No zalloc, must clear what we need by hand */
> > +	req->signaling.wait.tsk = NULL;
> 
> This shouldn't be non-NULL once the refcount has dropped to 0. Maybe a
> WARN_ON instead?

This is just from older code where we had the if (wait.tsk != NULL)
skip.

> > +	req->previous_context = NULL;
> 
> We unconditionally set this in advance_context (together with a bunch of
> other ring state tracked in the request). Do we really need to reset this
> here?

previous_context may be used unset (along a failure path), so requires
initialising.

> > +	req->file_priv = NULL;
> 
> This is already cleared in either request_retire or _release. Again maybe
> just a WARN_ON?

But we never clear it first, so it may be poisoned.
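
To illustrate the point about stale contents, here is a minimal userspace
sketch, not the driver code. Everything in it (struct fake_request, the
one-slot "cache", fake_alloc(), fake_free(), fake_request_alloc()) is
hypothetical and only stands in for a slab allocator that hands memory back
without zeroing it. It shows why a field such as file_priv has to be
cleared by hand once the zeroing allocation is gone, while engine/seqno are
simply overwritten in place and never pass through zero.

```c
#include <stddef.h>

/* Hypothetical stand-in for the request; only the fields discussed here. */
struct fake_request {
	void *engine;		/* a racing reader must never see this as NULL */
	unsigned int seqno;
	void *file_priv;	/* must be NULL on a freshly allocated request */
	void *batch_obj;	/* likewise */
};

/* One-slot "cache": the same memory is handed back, never zeroed. */
static struct fake_request slot;
static char poison;		/* marker for a stale, non-NULL pointer */

static struct fake_request *fake_alloc(void)
{
	return &slot;		/* contents are whatever the last user left */
}

static void fake_free(struct fake_request *req)
{
	/* Assumption: the freeing path leaves the field stale/poisoned. */
	req->file_priv = &poison;
}

static struct fake_request *fake_request_alloc(void *engine, unsigned int seqno)
{
	struct fake_request *req = fake_alloc();

	/* Replaced directly: old value -> new value, no NULL in between. */
	req->engine = engine;
	req->seqno = seqno;

	/* No zeroing allocation, so clear what we rely on by hand. */
	req->file_priv = NULL;
	req->batch_obj = NULL;

	return req;
}
```

After a fake_free(), the slot still carries the old engine pointer and a
non-NULL file_priv; the next fake_request_alloc() replaces the former and
explicitly resets the latter.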

> > +	req->batch_obj = NULL;
> 
> Agreed with this one, we might reuse the request for a non-execbuf
> request. But I think we also need to reset ->pid here.

What pid? Gah. (Don't have pid here in my tree...)

> > +	req->elsp_submitted = 0;
> 
> Needed, but feels misplaced since it's lrc stuff. I think it'd be better
> to stuff this into intel_logical_ring_alloc_request_extras.

No need for that extra complexity, it is to be removed.
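
For reference, the lookup side that the comment in the patch describes can
be sketched roughly as below. This is a simplified userspace approximation
using C11 atomics, not the actual __i915_gem_active_get_request_rcu(): the
sketch_* names are made up, there is no RCU read-side section, and the
"already completed" check via engine/seqno is left out. It only shows the
take-a-reference-unless-zero step followed by the recheck against the
active pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: a refcount of 0 means free/unallocated. */
struct sketch_request {
	atomic_uint refcount;
	unsigned int seqno;
};

/* Hypothetical tracker holding the currently active request. */
struct sketch_active {
	_Atomic(struct sketch_request *) request;
};

/* Take a reference only if the object is still live (refcount != 0). */
static bool sketch_get_unless_zero(struct sketch_request *req)
{
	unsigned int old = atomic_load(&req->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&req->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void sketch_put(struct sketch_request *req)
{
	atomic_fetch_sub(&req->refcount, 1);
}

static struct sketch_request *sketch_lookup(struct sketch_active *active)
{
	for (;;) {
		struct sketch_request *req = atomic_load(&active->request);

		if (!req)
			return NULL;

		/* Refcount already zero: treat as unallocated and complete. */
		if (!sketch_get_unless_zero(req))
			return NULL;

		/* Still the active request? Then the reference is good. */
		if (req == atomic_load(&active->request))
			return req;

		/* Reallocated underneath us: drop the reference and retry. */
		sketch_put(req);
	}
}
```

The recheck after taking the reference is what makes it tolerable for the
memory to have been reallocated in the meantime, provided the writer never
leaves the fields the reader dereferences in a zeroed state.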
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre