[Intel-gfx] [PATCH v2] drm/i915: Execlist irq handler micro optimisations

Chris Wilson chris at chris-wilson.co.uk
Fri Feb 12 14:42:52 UTC 2016


On Fri, Feb 12, 2016 at 12:00:40PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Assorted changes most likely without any practical effect
> apart from a tiny reduction in generated code for the interrupt
> handler and request submission.
> 
>  * Remove needless initialization.
>  * Improve cache locality by reorganizing code and/or using
>    branch hints to keep unexpected or error conditions out
>    of line.
>  * Favor busy submit path vs. empty queue.
>  * Less branching in hot-paths.
> 
> v2:
> 
>  * Avoid mmio reads when possible. (Chris Wilson)
>  * Use natural integer size for csb indices.
>  * Remove useless return value from execlists_update_context.
>  * Extract 32-bit ppgtt PDPs update so it is out of line and
>    shared with two callers.
>  * Grab forcewake across all mmio operations to ease the
>    load on uncore lock and use cheaper mmio ops.
> 
> Version 2 now makes the irq handling code path ~20% smaller on
> 48-bit PPGTT hardware, and a little bit less elsewhere. Hot
> paths are mostly in-line now and hammering on the uncore
> spinlock is greatly reduced together with mmio traffic to an
> extent.

Did you notice that ring->next_context_status_buffer is redundant as we
also have that information to hand in status_pointer?
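
To make the point concrete, here is a minimal sketch of pulling both pointers out of a single status_pointer read. The bit layout (write pointer in bits 2:0, read pointer in bits 10:8), the six-entry buffer size, and every sketch_/SKETCH_ name are assumptions for illustration only; check them against the register definitions before relying on them. Any special reset value of the read pointer is ignored here.

```c
#include <stdint.h>

/* Assumed layout, not taken from this thread:
 * bits 2:0 = write pointer, bits 10:8 = read pointer. */
#define SKETCH_CSB_PTR_MASK	0x7u
#define SKETCH_CSB_ENTRIES	6u	/* assumed number of CSB slots */

/* Extract the (assumed) write pointer field. */
unsigned int sketch_csb_write_ptr(uint32_t status_pointer)
{
	return status_pointer & SKETCH_CSB_PTR_MASK;
}

/* Extract the (assumed) read pointer field. */
unsigned int sketch_csb_read_ptr(uint32_t status_pointer)
{
	return (status_pointer >> 8) & SKETCH_CSB_PTR_MASK;
}

/* Number of unread entries, allowing for the write pointer having
 * wrapped past the end of the buffer. Assumes both fields are
 * below SKETCH_CSB_ENTRIES. */
unsigned int sketch_csb_pending(uint32_t status_pointer)
{
	unsigned int rd = sketch_csb_read_ptr(status_pointer);
	unsigned int wr = sketch_csb_write_ptr(status_pointer);

	return (wr + SKETCH_CSB_ENTRIES - rd) % SKETCH_CSB_ENTRIES;
}
```

If the hardware read pointer is kept up to date, a separately cached index carries no extra information, which is the redundancy in question.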

What's your thinking for

	if (req->elsp_submitted & ring->gen8_9)

vs a plain

	if (req->elsp_submitted)
?
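
For reference, a minimal sketch of where the two forms can differ. It assumes elsp_submitted is a small counter (possibly greater than 1) and gen8_9 is a 0/1 flag; both types, and the sketch_ structs and helpers, are hypothetical stand-ins rather than the patch's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the field types are assumptions. */
struct sketch_request {
	unsigned int elsp_submitted;	/* assumed: submission count */
};

struct sketch_ring {
	unsigned int gen8_9;		/* assumed: 0 or 1 */
};

/* Bitwise form: only bit 0 of the counter survives the AND with a
 * 0/1 flag, so an even non-zero count tests false. */
bool sketch_test_bitwise(const struct sketch_request *req,
			 const struct sketch_ring *ring)
{
	return (req->elsp_submitted & ring->gen8_9) != 0;
}

/* Plain form: any non-zero count tests true, whatever the flag. */
bool sketch_test_plain(const struct sketch_request *req)
{
	return req->elsp_submitted != 0;
}

/* Logical form: non-zero count AND flag set, at the cost of a
 * possible extra branch. */
bool sketch_test_logical(const struct sketch_request *req,
			 const struct sketch_ring *ring)
{
	return req->elsp_submitted && ring->gen8_9;
}
```

Under these assumptions, a count of 2 with the flag set passes the plain and logical tests but fails the bitwise one, so the forms are only interchangeable if the count never exceeds 1 at that point.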

The tidies look good. It would be useful to double check whether
gem_latency is behaving as a canary; it's a bit of a puzzle why that
first dispatch latency would grow.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

