[Intel-gfx] [PATCH] drm/i915: tidy up request alloc

Mon Jul 4 04:08:34 UTC 2016

On Fri, 2016-07-01 at 19:34 +0100, Chris Wilson wrote:
> On Fri, Jul 01, 2016 at 05:58:18PM +0100, Dave Gordon wrote:
> > On 30/06/16 13:49, Tvrtko Ursulin wrote:
> > > 
> > > On 30/06/16 11:22, Chris Wilson wrote:
> > > > On Thu, Jun 30, 2016 at 09:50:20AM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 30/06/16 02:35, Hong Liu wrote:
> > > > > > Return the allocated request pointer directly to remove
> > > > > > the double pointer parameter.
> > > > > > 
> > > > > > Signed-off-by: Hong Liu <hong.liu at intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/i915_gem.c | 25 +++++++------------
> > > > > > ------
> > > > > >  1 file changed, 7 insertions(+), 18 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > > > > > b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > index 1d98782..9881455 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > @@ -2988,32 +2988,26 @@ void i915_gem_request_free(struct
> > > > > > kref
> > > > > > *req_ref)
> > > > > >      kmem_cache_free(req->i915->requests, req);
> > > > > >  }
> > > > > > 
> > > > > > -static inline int
> > > > > > +static inline struct drm_i915_gem_request *
> > > > > >  __i915_gem_request_alloc(struct intel_engine_cs *engine,
> > > > > > -             struct i915_gem_context *ctx,
> > > > > > -             struct drm_i915_gem_request **req_out)
> > > > > > +             struct i915_gem_context *ctx)
> > > > > >  {
> > > > > >      struct drm_i915_private *dev_priv = engine->i915;
> > > > > >      unsigned reset_counter =
> > > > > > i915_reset_counter(&dev_priv->gpu_error);
> > > > > >      struct drm_i915_gem_request *req;
> > > > > >      int ret;
> > > > > > 
> > > > > > -    if (!req_out)
> > > > > > -        return -EINVAL;
> > > > > > -
> > > > > > -    *req_out = NULL;
> > > > > > -
> > > > > >      /* ABI: Before userspace accesses the GPU (e.g.
> > > > > > execbuffer),
> > > > > > report
> > > > > >       * EIO if the GPU is already wedged, or EAGAIN to drop
> > > > > > the
> > > > > > struct_mutex
> > > > > >       * and restart.
> > > > > >       */
> > > > > >      ret = i915_gem_check_wedge(reset_counter,
> > > > > > dev_priv->mm.interruptible);
> > > > > >      if (ret)
> > > > > > -        return ret;
> > > > > > +        return ERR_PTR(ret);
> > > > > > 
> > > > > >      req = kmem_cache_zalloc(dev_priv->requests,
> > > > > > GFP_KERNEL);
> > > > > >      if (req == NULL)
> > > > > > -        return -ENOMEM;
> > > > > > +        return ERR_PTR(-ENOMEM);
> > > > > > 
> > > > > >      ret = i915_gem_get_seqno(engine->i915, &req->seqno);
> > > > > >      if (ret)
> > > > > > @@ -3041,14 +3035,13 @@ __i915_gem_request_alloc(struct
> > > > > > intel_engine_cs *engine,
> > > > > >      if (ret)
> > > > > >          goto err_ctx;
> > > > > > 
> > > > > > -    *req_out = req;
> > > > > > -    return 0;
> > > > > > +    return req;
> > > > > > 
> > > > > >  err_ctx:
> > > > > >      i915_gem_context_unreference(ctx);
> > > > > >  err:
> > > > > >      kmem_cache_free(dev_priv->requests, req);
> > > > > > -    return ret;
> > > > > > +    return ERR_PTR(ret);
> > > > > >  }
> > > > > > 
> > > > > >  /**
> > > > > > @@ -3067,13 +3060,9 @@ struct drm_i915_gem_request *
> > > > > >  i915_gem_request_alloc(struct intel_engine_cs *engine,
> > > > > >                 struct i915_gem_context *ctx)
> > > > > >  {
> > > > > > -    struct drm_i915_gem_request *req;
> > > > > > -    int err;
> > > > > > -
> > > > > >      if (ctx == NULL)
> > > > > >          ctx = engine->i915->kernel_context;
> > > > > > -    err = __i915_gem_request_alloc(engine, ctx, &req);
> > > > > > -    return err ? ERR_PTR(err) : req;
> > > > > > +    return __i915_gem_request_alloc(engine, ctx);
> > > > > >  }
> > > > > > 
> > > > > >  struct drm_i915_gem_request *
> > > > > > 
> > > > > 
> > > > > Looks good to me. And have this feeling I've seen this
> > > > > somewhere before.
> > > > 
> > > > Several times. This is not the full tidy, nor does it realise
> > > > the
> > > > ramifactions of request alloc through the stack.
> > > 
> > > Hm I can't spot that it is doing anything wrong or making
> > > anything
> > > worse. You don't want to let the small cleanup in?
> > > 
> > > Regards,
> > > Tvrtko
> > 
> > It ought to make almost no difference, because the *only* place the
> > inner function is called is from the outer one, which passes a
> > pointer to a local for the returned object; and the inner one is
> > then inlined, so the compiler doesn't actually put it on the stack
> > and call to the inner allocator anyway.
> > 
> > Strangely, however, with this change the code becomes ~400 bytes
> > bigger!
> > 
> > Disassembly reveals that while the code for the externally-callable
> > outer function is indeed almost identical, a second copy of it has
> > also been inlined at the one callsite in this file:
> > 
> > __i915_gem_object_sync() ...
> > 	req = i915_gem_request_alloc(to, NULL);
> > 
> > I don't think that's a critical path and would rather have 400
> > bytes
> > smaller codespace. We can get that back by adding /noinline/ to the
> > outer function i915_gem_request_alloc() (not, of course, to the
> > inner one, that definitely *should* be inline).
> 
> __i915_gem_object_sync() should not be calling
> i915_gem_request_alloc().
> 
> That's the issue with this patch, your patch and John's patch.

So we wrote the i915_gem_request_alloc() this way is to avoid being
inlined into callers like __i915_gem_object_sync()?

I checked the file with GCC 4.8.5 on my centos environment, it is like
what Dave found. With the patch, i915_gem_object_sync() is 368 bytes
bigger.

But when I checked it with GCC 6.1.1 on Fedora 24, it seems it inlines
the i915_gem_request_alloc() even with the current implementation.
With the patch, the i915_gem_object_sync() is 80 bytes smaller.

> -Chris
>