[Intel-gfx] [PATCH 10/18] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START

Chris Wilson chris at chris-wilson.co.uk
Wed Jul 27 15:19:16 UTC 2016


On Wed, Jul 27, 2016 at 04:04:44PM +0100, Dave Gordon wrote:
> On 21/07/16 15:14, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 04:39:58PM +0300, Joonas Lahtinen wrote:
> >>On ke, 2016-07-20 at 14:12 +0100, Chris Wilson wrote:
> >>> 	if (so.aux_batch_size > 8) {
> >>>-		ret = req->engine->dispatch_execbuffer(req,
> >>>-						     (so.ggtt_offset +
> >>>-						      so.aux_batch_offset),
> >>>-						     so.aux_batch_size,
> >>>-						     I915_DISPATCH_SECURE);
> >>>+		ret = req->engine->emit_bb_start(req,
> >>>+						 (so.ggtt_offset +
> >>>+						  so.aux_batch_offset),
> >>>+						 so.aux_batch_size,
> >>>+						 I915_DISPATCH_SECURE);
> >>> 		if (ret)
> >>> 			goto out;
> >>> 	}
> >>
> >>The code above this line is exact reason why I don't like the a->b->c
> >>(especially when there is repetition). But it's not new to this patch
> >>so guess it'll do. Some future work to shorten down a little bit might
> >>not hurt.
> >
> >I presume you mean req->engine->x here, not so.y. Is it just the depth
> >and saving 5 columns? Or something else?
> >-Chris
> 
> It can also hurt code efficiency. For example in
> 
> int i915_gem_render_state_init(struct drm_i915_gem_request *req)
> {
>     ret = i915_gem_render_state_prepare(req->engine, &so);
>     ...
>     ret = req->engine->dispatch_execbuffer(req, so.ggtt_offset,
>                 so.rodata->batch_items * 4, I915_DISPATCH_SECURE);
>     ...
>     if (so.aux_batch_size > 8) {
>         ret = req->engine->dispatch_execbuffer(req,
>                 (so.ggtt_offset + so.aux_batch_offset),
>                 so.aux_batch_size, I915_DISPATCH_SECURE);
>                 if (ret)
>                         goto out;
>         }
>     ...
> }
> 
> The compiler may not -- and in this case, *is not allowed to* --
> assume that req->engine is the same at each callsite. We have passed
> the non-const pointer "req" to callees, so it must assume that
> req->engine may have been changed and re-fetch it from memory.
> 
> By inspection of the generated code, we find that a local for a
> value that is used only once is a net loss, twice is breakeven, and
> three or more times is a definite win.
> 
> And it generally makes the code prettier too :)

Only prettiness really matters here, this emission can be shared for all
contexts with the cache only released under memory pressure. For when we
start caring about the regressions in the GL microbenchmarks...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list