[Intel-gfx] [PATCH v3] drm/i915: Move CSB MMIO reads out of the execlists lock
Chris Wilson
chris at chris-wilson.co.uk
Thu Mar 17 13:14:41 UTC 2016
On Thu, Mar 17, 2016 at 12:59:46PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> By reading the CSB (slow MMIO accesses) into a temporary local
> buffer we can decrease the time for which the execlist lock is
> held.
>
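In other words, the shape of the change is roughly this (a minimal
sketch rather than the literal patch: csb_irq_handler(), the
read_csb_*() helpers, process_csb_entry() and CSB_ENTRIES are
hypothetical stand-ins for the real accessors in intel_lrc.c):

/*
 * Sketch only: helper and field names are illustrative, and the
 * write-back of the updated read pointer is omitted.
 */
#define CSB_ENTRIES 6 /* context-status entries per engine on GEN8+ */

static void csb_irq_handler(struct intel_engine_cs *ring)
{
        struct drm_i915_private *dev_priv = ring->i915;
        u32 csb[CSB_ENTRIES][2]; /* temporary local copy */
        unsigned int read, write, count = 0, i;

        /* The slow MMIO reads are done without the execlist lock. */
        spin_lock(&dev_priv->uncore.lock);
        intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);

        read = read_csb_read_ptr(ring);
        write = read_csb_write_ptr(ring);
        while (read != write && count < CSB_ENTRIES) { /* overflow guard (v2) */
                read = (read + 1) % CSB_ENTRIES;
                csb[count][0] = read_csb_status(ring, read);
                csb[count][1] = read_csb_ctx_id(ring, read);
                count++;
        }

        intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
        spin_unlock(&dev_priv->uncore.lock);

        /* Only the cheap bookkeeping runs under the contended lock. */
        spin_lock(&ring->execlist_lock);
        for (i = 0; i < count; i++)
                process_csb_entry(ring, csb[i][0], csb[i][1]);
        spin_unlock(&ring->execlist_lock);
}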
> Main advantage is that during heavy batch buffer submission we
> reduce the execlist lock contention, which should decrease the
> latency and CPU usage between the submitting userspace process
> and interrupt handling.
>
> Downside is that we need to grab and release the forcewake twice,
> but as the numbers below show this is completely hidden by the
> primary gains.
>
> Testing with "gem_latency -n 100" (submit batch buffers with a
> hundred nops each) shows more than doubling of the throughput
> and more than halving of the dispatch latency, overall latency
> and CPU time spent in the submitting process.
>
> Submitting empty batches ("gem_latency -n 0") does not appear to
> be significantly affected by this change, with throughput and CPU
> time improving by half a percent and overall latency worsening by
> the same amount.
>
> The above tests were repeated across a hundred runs on a big core
> Broadwell.
>
> v2:
> * Overflow protection to local CSB buffer.
> * Use closer dev_priv in execlists_submit_requests. (Chris Wilson)
>
> v3: Rebase.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 77 ++++++++++++++++++++--------------------
> 1 file changed, 38 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f72782200226..c6b3a9d19af2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -416,15 +416,23 @@ static void execlists_update_context(struct drm_i915_gem_request *rq)
> static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
> struct drm_i915_gem_request *rq1)
> {
> + struct drm_i915_private *dev_priv = rq0->i915;
> +
> execlists_update_context(rq0);
>
> if (rq1)
> execlists_update_context(rq1);
>
> + spin_lock(&dev_priv->uncore.lock);
> + intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
My only problem with this is that it puts the onus on the caller to
remember to disable irqs first. (That would silently conflict with
the patch to move submission to a kthread, for example.)
What's the cost of being upfront with spin_lock_irqsave() here? Or
would a /* BUG_ON(!irqs_disabled()); */ comment suffice? Both
options are sketched below.
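For comparison, the two options side by side (illustrative framing
only; WARN_ON_ONCE() stands in for the suggested BUG_ON comment, and
the submit_ports_*() wrappers are hypothetical):

/* Option 1: be upfront and pay for the irq save/restore. */
static void submit_ports_irqsave(struct drm_i915_private *dev_priv)
{
        unsigned long flags;

        spin_lock_irqsave(&dev_priv->uncore.lock, flags);
        intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
        /* ... ELSP writes ... */
        intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
        spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);
}

/* Option 2: keep the plain spin_lock() but assert the contract. */
static void submit_ports_assert(struct drm_i915_private *dev_priv)
{
        WARN_ON_ONCE(!irqs_disabled()); /* callers must have irqs off */
        spin_lock(&dev_priv->uncore.lock);
        intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);
        /* ... ELSP writes ... */
        intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
        spin_unlock(&dev_priv->uncore.lock);
}

The irqsave form pays for saving and restoring the irq state on
every submission even when the caller already runs with irqs off;
the assert form keeps the fast path but relies on every caller
honouring the documented contract.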
I'm quite happy to go with the comment here, and since it doesn't
make the locking any worse,
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre