[Intel-gfx] [PATCH 04/22] drm/i915: Remove request retirement before each batch

Thu Jul 28 12:24:47 UTC 2016

On Thu, Jul 28, 2016 at 01:52:49PM +0200, Daniel Vetter wrote:
> On Thu, Jul 28, 2016 at 11:26:48AM +0100, Chris Wilson wrote:
> > On Thu, Jul 28, 2016 at 11:54:07AM +0200, Daniel Vetter wrote:
> > > On Wed, Jul 27, 2016 at 12:14:42PM +0100, Chris Wilson wrote:
> > > > This reimplements the denial-of-service protection against igt from
> > > > commit 227f782e4667 ("drm/i915: Retire requests before creating a new
> > > > one") and transfers the stall from before each batch into get_pages().
> > > > The issue is that the stall is increasing latency between batches which
> > > > is detrimental in some cases (especially coupled with execlists) to
> > > > keeping the GPU well fed. Also we have made the observation that retiring
> > > > requests can of itself free objects (and requests) and therefore makes
> > > > a good first step when shrinking.
> > > > 
> > > > v2: Recycle objects prior to i915_gem_object_get_pages()
> > > > v3: Remove the reference to the ring from i915_gem_requests_ring() as it
> > > > operates on an intel_engine_cs.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h            | 1 -
> > > >  drivers/gpu/drm/i915/i915_gem.c            | 7 +++++--
> > > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 --
> > > >  drivers/gpu/drm/i915/i915_gem_request.c    | 4 ++--
> > > >  4 files changed, 7 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index fbda38f25c6b..2de3d16f7b80 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -3169,7 +3169,6 @@ struct drm_i915_gem_request *
> > > >  i915_gem_find_active_request(struct intel_engine_cs *engine);
> > > >  
> > > >  void i915_gem_retire_requests(struct drm_i915_private *dev_priv);
> > > > -void i915_gem_retire_requests_ring(struct intel_engine_cs *engine);
> > > >  
> > > >  static inline u32 i915_reset_counter(struct i915_gpu_error *error)
> > > >  {
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index bf652dc88024..68dbe4f7940c 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2244,7 +2244,6 @@ int
> > > >  i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
> > > >  {
> > > >  	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> > > > -	const struct drm_i915_gem_object_ops *ops = obj->ops;
> > > >  	int ret;
> > > >  
> > > >  	if (obj->pages)
> > > > @@ -2257,7 +2256,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
> > > >  
> > > >  	BUG_ON(obj->pages_pin_count);
> > > >  
> > > > -	ret = ops->get_pages(obj);
> > > > +	/* Recycle as many active objects as possible first */
> > > > +	i915_gem_retire_requests(dev_priv);
> > > > +
> > > > +	ret = obj->ops->get_pages(obj);
> > > 
> > > Why exactly do we need this?
> > > - shmem objs already call shrink_all if they can't get at the memory
> > > - everyone else doesn't care.
> > 
> > Because that is very expensive and we have very poor utilisation of
> > caches. On average, the affected benchmarks are about 100x slower
> > without it and demonstrate large variation.
> > 
> > Everyone else isn't allocating or has their own defense.
> > 
> > Otoh, the more aggressive shrinking is quite recent, more recent than
> > this patch. But I stand by the measurements as they were made that this
> > is the point at which utilisation mattered, if only to worry about it
> > later when I need to remove the call.
> 
> Please add those numbers to the commit message, I think without them this
> particular change isn't well-justified enough.

The oom defense itself?

> > > Even if we need this in some case it looks funny, since it splits the
> > > memory cleanup between caller and callee of get_pages.
> 
> At least for shmem we now have the retire_requests() outside of the
> get_pages vfunc, but the full shrink fallback inside. For OCD reasons I
> htink it'd look better to have both inside the callback. Assuming that we
> need the retire_requests for shmem objects, and not really for any of the
> others.

No, that is much worse from the design standpoint imo. It is the common
point where we know we are under memory pressure (because we are paging
in) and we know that the request list is a good source of reusable
pages.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre