[Intel-gfx] [PATCH v4] drm/i915: Execlists small cleanups and micro-optimisations

Chris Wilson chris at chris-wilson.co.uk
Fri Feb 26 20:24:48 UTC 2016


On Fri, Feb 26, 2016 at 04:58:32PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> 
> Assorted changes in the areas of code cleanup, reduction of
> invariant conditionals in the interrupt handler, and lock
> contention and MMIO access optimisation.
> 
>  * Remove needless initialization.
>  * Improve cache locality by reorganizing code and/or using
>    branch hints to keep unexpected or error conditions out
>    of line.
>  * Favor busy submit path vs. empty queue.
>  * Less branching in hot-paths.
> 
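A minimal illustrative sketch of the branch-hint technique in the list above (not code from this patch). It assumes GCC or Clang: likely()/unlikely() are redefined locally on top of the __builtin_expect() builtin, mirroring the kernel macros of the same name, and STATUS_ERROR, report_error() and process_status() are hypothetical names. The rare error path is marked cold and noinline so the compiler can keep it out of the hot loop.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins for the kernel's likely()/unlikely() hint macros;
 * __builtin_expect() is a GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define STATUS_ERROR 0x80000000u	/* hypothetical error bit */

/* Rare path: noinline + cold ask the compiler to emit this body away
 * from the hot loop so it does not occupy hot i-cache lines. */
static __attribute__((noinline, cold)) void report_error(unsigned int status)
{
	fprintf(stderr, "unexpected status 0x%08x\n", status);
}

/* Hypothetical hot path: returns how many entries completed normally.
 * The expected case falls straight through the hinted branch. */
static unsigned int process_status(const unsigned int *status,
				   unsigned int count)
{
	unsigned int done = 0;
	unsigned int i;

	assert(status != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (unlikely(status[i] & STATUS_ERROR)) {
			report_error(status[i]);
			continue;
		}
		done++;
	}

	return done;
}
```

The hint only affects code layout and static branch prediction; behaviour is identical with or without it.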
> v2:
> 
>  * Avoid mmio reads when possible. (Chris Wilson)
>  * Use natural integer size for csb indices.
>  * Remove useless return value from execlists_update_context.
>  * Extract 32-bit ppgtt PDPs update so it is out of line and
>    shared with two callers.
>  * Grab forcewake across all mmio operations to ease the
>    load on the uncore lock and use cheaper mmio ops.
> 
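A toy model of the "take the lock once, then use cheaper accessors" idea from the v2 list (not i915 code). Everything here is hypothetical: struct toy_uncore, the toy_* helpers and the submit_* functions only count lock acquisitions and write into an array that stands in for MMIO registers, so the sketch runs anywhere without hardware or threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NREGS 16u

/* Hypothetical model: a "lock" that records how often it is taken and
 * an array standing in for MMIO registers. */
struct toy_uncore {
	bool locked;
	unsigned int lock_count;	/* number of lock acquisitions */
	uint32_t regs[TOY_NREGS];
};

static void toy_lock(struct toy_uncore *u)
{
	assert(!u->locked);
	u->locked = true;
	u->lock_count++;
}

static void toy_unlock(struct toy_uncore *u)
{
	assert(u->locked);
	u->locked = false;
}

/* Checked accessor: takes the lock around every single write. */
static void toy_write(struct toy_uncore *u, unsigned int reg, uint32_t val)
{
	toy_lock(u);
	u->regs[reg] = val;
	toy_unlock(u);
}

/* Cheaper raw accessor: the caller must already hold the lock. */
static void toy_write_raw(struct toy_uncore *u, unsigned int reg, uint32_t val)
{
	assert(u->locked);
	u->regs[reg] = val;
}

/* Per-write locking: one acquisition per register written. */
static void submit_unbatched(struct toy_uncore *u, const uint32_t *vals,
			     unsigned int n)
{
	unsigned int i;

	assert(n <= TOY_NREGS);
	for (i = 0; i < n; i++)
		toy_write(u, i, vals[i]);
}

/* Batched: a single acquisition covers the whole write sequence. */
static void submit_batched(struct toy_uncore *u, const uint32_t *vals,
			   unsigned int n)
{
	unsigned int i;

	assert(n <= TOY_NREGS);
	toy_lock(u);
	for (i = 0; i < n; i++)
		toy_write_raw(u, i, vals[i]);
	toy_unlock(u);
}
```

Writing four registers costs four acquisitions unbatched and one batched; the trade-off is a longer hold time for each acquisition.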
> v3:
> 
>  * Removed some more pointless u8 data types.
>  * Removed unused return from execlists_context_queue.
>  * Commit message updates.
> 
> v4:
>  * Unclumsify the unqueue if statement. (Chris Wilson)
>  * Hide forcewake from the queuing function. (Chris Wilson)
> 
> Version 3 now makes the irq handling code path ~20% smaller on
> 48-bit PPGTT hardware, and a little smaller elsewhere. Hot
> paths are now mostly inline, hammering on the uncore
> spinlock is greatly reduced, and MMIO traffic is reduced to an
> extent.
> 
> Benchmarking with "gem_latency -n 100" (keep submitting
> batches with 100 nop instructions) shows approximately 4% higher
> throughput, 2% less CPU time and 22% smaller latencies. This was
> measured on a big-core machine; small-core machines could benefit
> even more.

Just add a quick comment about "gem_latency -n 0" suggesting an oddity
with synchronous workloads that bears further study (just so that we
have the hint/reminder about the test case to run).
 
> The most likely reasons for the improvements are the MMIO
> optimisation and the reduction in uncore lock traffic.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

