[Intel-gfx] [RFC 0/5] Class/instance based execbuf plus more

Tvrtko Ursulin tursulin at ursulin.net
Mon Nov 13 13:09:04 UTC 2017


From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Now that the engine class concept is in, it is time to re-send the old proposal
of using it for engine selection in execbuf.

The idea is primarily to fix the current VCS engine selection ABI by
introducing a new, cleaner method of selecting the VCS engine.

Then there are two new pieces of proposed uAPI, engine capabilities and
concurrent contexts, which for instance enable the VA-API driver to let i915
balance its batch buffers dynamically.

This enables better utilization of resources on GT3/GT4 parts, where:

 a) a single stream can now use both VCS engines
 b) the door is opened to extending the i915 scheduler with more advanced
    load balancing approaches to better support multiple-stream use cases.
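
To illustrate why per-batch balancing can help a single stream, here is a toy
sketch. It is not the i915 algorithm from this series: the greedy
"earliest free engine" policy, the function name makespan() and the batch costs
are all assumptions made up for illustration, and the model ignores any
dependencies between batches that a real workload may have.

```python
import heapq


def makespan(batch_costs, num_engines):
    """Toy model (hypothetical): submit each batch to whichever engine
    becomes free first and return the time at which the last one finishes."""
    # Heap of (time_engine_becomes_free, engine_index).
    free_at = [(0.0, i) for i in range(num_engines)]
    heapq.heapify(free_at)
    for cost in batch_costs:
        t, idx = heapq.heappop(free_at)          # earliest-free engine
        heapq.heappush(free_at, (t + cost, idx))  # it is now busy until t + cost
    return max(t for t, _ in free_at)


# Ten equally sized, independent batches (made-up costs).
batches = [1.0] * 10
one_engine = makespan(batches, 1)
two_engines = makespan(batches, 2)
print(one_engine, two_engines)
```

With fully independent batches the two-engine case finishes in half the time,
which is an upper bound on what balancing across two engines can deliver.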

For instance, decoding a single H.264 stream on a GT4 part improves from
about 57.6 seconds to about 40.7 seconds, with minimal changes to the VA-API
code base:

root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=0 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -

 Performance counter stats for 'system wide':

    57,568,097,358 ns   i915/vcs0-busy/
                 0 ns   i915/vcs1-busy/

      57.585753514 seconds time elapsed

root@sc:~/ffmpeg# VA_INTEL_CONCURRENT=1 perf stat -a -e i915/vcs0-busy/,i915/vcs1-busy/ ffmpeg -loglevel panic -hwaccel vaapi -hwaccel_output_format vaapi -i ~/bbb_sunflower_1080p_60fps_normal.mp4 -f null -

 Performance counter stats for 'system wide':

    29,152,427,164 ns   i915/vcs0-busy/
    29,115,272,714 ns   i915/vcs1-busy/

      40.733992298 seconds time elapsed
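
The two runs above can be reduced to a speedup figure and per-engine
utilization with some simple arithmetic. The sketch below only uses the numbers
printed by perf stat; the dictionary layout and the helper name summarize() are
illustrative choices, not part of any tool.

```python
# Figures copied from the two perf stat runs above.
serial = {
    "vcs0_busy_ns": 57_568_097_358,
    "vcs1_busy_ns": 0,
    "elapsed_s": 57.585753514,
}
concurrent = {
    "vcs0_busy_ns": 29_152_427_164,
    "vcs1_busy_ns": 29_115_272_714,
    "elapsed_s": 40.733992298,
}


def summarize(run):
    """Return total busy time and per-engine utilization for one run."""
    busy0 = run["vcs0_busy_ns"] / 1e9  # ns -> s
    busy1 = run["vcs1_busy_ns"] / 1e9
    return {
        "total_busy_s": busy0 + busy1,
        "vcs0_util": busy0 / run["elapsed_s"],
        "vcs1_util": busy1 / run["elapsed_s"],
    }


speedup = serial["elapsed_s"] / concurrent["elapsed_s"]
print(f"speedup: {speedup:.2f}x")
print("serial:    ", summarize(serial))
print("concurrent:", summarize(concurrent))
```

This works out to a speedup of roughly 1.4x. In the concurrent run each engine
is busy for about 72% of the elapsed time, and the combined busy time stays
close to that of the single-engine run.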

I will be sending the proof-of-concept patches for intel-vaapi-driver
separately.

Tvrtko Ursulin (5):
  drm/i915: Select engines via class and instance in execbuffer2
  drm/i915: Engine capabilities uAPI
  drm/i915: Concurrent context uAPI
  drm/i915: Re-arrange execbuf so context is known before engine
  drm/i915: Per batch buffer VCS balancing

 drivers/gpu/drm/i915/i915_drv.h            |   7 +-
 drivers/gpu/drm/i915/i915_gem.c            |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  14 +++
 drivers/gpu/drm/i915/i915_gem_context.h    |  20 +++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 134 ++++++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_engine_cs.c     |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   2 +
 include/uapi/drm/i915_drm.h                |  34 +++++++-
 8 files changed, 180 insertions(+), 36 deletions(-)

-- 
2.14.1
