[Intel-gfx] [PATCH 2/2] drm/i915/lrc: Skip no-op per-bb buffer on gen9

Thu Sep 21 15:41:39 UTC 2017

Quoting Tvrtko Ursulin (2017-09-21 16:12:21)
> 
> On 21/09/2017 14:54, Chris Wilson wrote:
> > Since we inherited the context image setup from gen8 which needed a
> > per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer
> > on gen9. Now that we can skip adding the buffer to the context image,
> > remove the dangling per-bb. This slightly improves execution latency,
> > most notably on an idle engine.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
> 
> How much of the 7% we get back? :)

Not enough. The difference in execution latency between ringbuffer
submission and execlists for this type of workload is roughly an order of
magnitude (~5us to ~30us, using gem_sync as a reasonable proxy). The
per-bb accounts for around 6us of that on bdw, so a big chunk, but
execlists is still a few times slower even without it. Not that we move
the GPGPU workaround on bdw just yet; I left that for when we play with
preemption and MI_ARB_ON_OFF. (Side note: the remaining difference
between ringbuffer and execlists seems to be related to MI
arbitration...)
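
To put rough numbers on "still a few times slower", here is a
back-of-the-envelope sketch using the approximate gem_sync figures quoted
above. It assumes the whole ~6us per-bb cost would be recovered, which is
an illustration only and not a measurement; the constant and function
names are made up for this example.

```python
# Approximate latencies quoted in this mail (gem_sync on bdw), in microseconds.
RINGBUFFER_US = 5.0   # ringbuffer submission
EXECLISTS_US = 30.0   # execlists submission, per-bb included
PER_BB_US = 6.0       # share attributed to the per-bb buffer


def remaining_slowdown(execlists_us, per_bb_us, ringbuffer_us):
    """Execlists-to-ringbuffer latency ratio once the per-bb cost is dropped.

    Assumes the per-bb cost is recovered in full (hypothetical best case).
    """
    return (execlists_us - per_bb_us) / ringbuffer_us


if __name__ == "__main__":
    before = EXECLISTS_US / RINGBUFFER_US
    after = remaining_slowdown(EXECLISTS_US, PER_BB_US, RINGBUFFER_US)
    print(f"execlists vs ringbuffer: {before:.1f}x with per-bb, "
          f"{after:.1f}x without")
```

With these inputs the ratio only drops from about 6x to just under 5x,
which is why the answer to "how much do we get back" is "not enough".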
-Chris