[Intel-gfx] [PATCH] drm/i915: tidy up request alloc
Dave Gordon
david.s.gordon at intel.com
Mon Jul 4 10:36:03 UTC 2016
On 04/07/16 05:08, Liu, Hong wrote:
> On Fri, 2016-07-01 at 19:34 +0100, Chris Wilson wrote:
>> On Fri, Jul 01, 2016 at 05:58:18PM +0100, Dave Gordon wrote:
>>> On 30/06/16 13:49, Tvrtko Ursulin wrote:
>>>>
>>>> On 30/06/16 11:22, Chris Wilson wrote:
>>>>> On Thu, Jun 30, 2016 at 09:50:20AM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 30/06/16 02:35, Hong Liu wrote:
>>>>>>> Return the allocated request pointer directly to remove
>>>>>>> the double pointer parameter.
>>>>>>>
>>>>>>> Signed-off-by: Hong Liu <hong.liu at intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/i915/i915_gem.c | 25 +++++++------------
>>>>>>> ------
>>>>>>> 1 file changed, 7 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> index 1d98782..9881455 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> @@ -2988,32 +2988,26 @@ void i915_gem_request_free(struct
>>>>>>> kref
>>>>>>> *req_ref)
>>>>>>> kmem_cache_free(req->i915->requests, req);
>>>>>>> }
>>>>>>>
>>>>>>> -static inline int
>>>>>>> +static inline struct drm_i915_gem_request *
>>>>>>> __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>>>> - struct i915_gem_context *ctx,
>>>>>>> - struct drm_i915_gem_request **req_out)
>>>>>>> + struct i915_gem_context *ctx)
>>>>>>> {
>>>>>>> struct drm_i915_private *dev_priv = engine->i915;
>>>>>>> unsigned reset_counter =
>>>>>>> i915_reset_counter(&dev_priv->gpu_error);
>>>>>>> struct drm_i915_gem_request *req;
>>>>>>> int ret;
>>>>>>>
>>>>>>> - if (!req_out)
>>>>>>> - return -EINVAL;
>>>>>>> -
>>>>>>> - *req_out = NULL;
>>>>>>> -
>>>>>>> /* ABI: Before userspace accesses the GPU (e.g.
>>>>>>> execbuffer),
>>>>>>> report
>>>>>>> * EIO if the GPU is already wedged, or EAGAIN to drop
>>>>>>> the
>>>>>>> struct_mutex
>>>>>>> * and restart.
>>>>>>> */
>>>>>>> ret = i915_gem_check_wedge(reset_counter,
>>>>>>> dev_priv->mm.interruptible);
>>>>>>> if (ret)
>>>>>>> - return ret;
>>>>>>> + return ERR_PTR(ret);
>>>>>>>
>>>>>>> req = kmem_cache_zalloc(dev_priv->requests,
>>>>>>> GFP_KERNEL);
>>>>>>> if (req == NULL)
>>>>>>> - return -ENOMEM;
>>>>>>> + return ERR_PTR(-ENOMEM);
>>>>>>>
>>>>>>> ret = i915_gem_get_seqno(engine->i915, &req->seqno);
>>>>>>> if (ret)
>>>>>>> @@ -3041,14 +3035,13 @@ __i915_gem_request_alloc(struct
>>>>>>> intel_engine_cs *engine,
>>>>>>> if (ret)
>>>>>>> goto err_ctx;
>>>>>>>
>>>>>>> - *req_out = req;
>>>>>>> - return 0;
>>>>>>> + return req;
>>>>>>>
>>>>>>> err_ctx:
>>>>>>> i915_gem_context_unreference(ctx);
>>>>>>> err:
>>>>>>> kmem_cache_free(dev_priv->requests, req);
>>>>>>> - return ret;
>>>>>>> + return ERR_PTR(ret);
>>>>>>> }
>>>>>>>
>>>>>>> /**
>>>>>>> @@ -3067,13 +3060,9 @@ struct drm_i915_gem_request *
>>>>>>> i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>>>> struct i915_gem_context *ctx)
>>>>>>> {
>>>>>>> - struct drm_i915_gem_request *req;
>>>>>>> - int err;
>>>>>>> -
>>>>>>> if (ctx == NULL)
>>>>>>> ctx = engine->i915->kernel_context;
>>>>>>> - err = __i915_gem_request_alloc(engine, ctx, &req);
>>>>>>> - return err ? ERR_PTR(err) : req;
>>>>>>> + return __i915_gem_request_alloc(engine, ctx);
>>>>>>> }
>>>>>>>
>>>>>>> struct drm_i915_gem_request *
>>>>>>
>>>>>> Looks good to me. And have this feeling I've seen this
>>>>>> somewhere before.
>>>>>
>>>>> Several times. This is not the full tidy, nor does it realise
>>>>> the ramifactions of request alloc through the stack.
>>>>
>>>> Hm I can't spot that it is doing anything wrong or making
>>>> anything worse. You don't want to let the small cleanup in?
>>>>
>>>> Regards,
>>>> Tvrtko
>>>
>>> It ought to make almost no difference, because the *only* place the
>>> inner function is called is from the outer one, which passes a
>>> pointer to a local for the returned object; and the inner one is
>>> then inlined, so the compiler doesn't actually put it on the stack
>>> and call to the inner allocator anyway.
>>>
>>> Strangely, however, with this change the code becomes ~400 bytes
>>> bigger!
>>>
>>> Disassembly reveals that while the code for the externally-callable
>>> outer function is indeed almost identical, a second copy of it has
>>> also been inlined at the one callsite in this file:
>>>
>>> __i915_gem_object_sync() ...
>>> req = i915_gem_request_alloc(to, NULL);
>>>
>>> I don't think that's a critical path and would rather have 400
>>> bytes
>>> smaller codespace. We can get that back by adding /noinline/ to the
>>> outer function i915_gem_request_alloc() (not, of course, to the
>>> inner one, that definitely *should* be inline).
>>
>> __i915_gem_object_sync() should not be calling
>> i915_gem_request_alloc().
>>
>> That's the issue with this patch, your patch and John's patch.
>
> So we wrote the i915_gem_request_alloc() this way is to avoid being
> inlined into callers like __i915_gem_object_sync()?
Not specifically, as the description of commit 268270883 says,
"... this patch renames the existing i915_gem_request_alloc(), and
makes it local (static inline), and replaces it with a wrapper that
provides a default if the context is NULL, and also has a nicer
calling convention (doesn't require a pointer to an output parameter).
Then we change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ..."
At the time, it was more a matter of simplifying review and minimising
lines-of-change by adding the wrapper to implement the new convention
rather than rewriting the real function at the same time as changing all
the callers. So I have no objection to continuing the process of making
this nicer by now flattening the two layers into a single function; I
just noted the oddity that the flattening made the codespace larger.
Chris, do you have a patch for *not* calling i915_gem_request_alloc()
from inside __i915_gem_object_sync() ? I don't really know why John's
original conversion to passing requests everywhere needed this.
.Dave.
> I checked the file with GCC 4.8.5 on my centos environment, it is like
> what Dave found. With the patch, i915_gem_object_sync() is 368 bytes
> bigger.
>
> But when I checked it with GCC 6.1.1 on Fedora 24, it seems it inlines
> the i915_gem_request_alloc() even with the current implementation.
> With the patch, the i915_gem_object_sync() is 80 bytes smaller.
>
>> -Chris
More information about the Intel-gfx
mailing list