[Mesa-dev] [RFC] Enable Resource Streamer on Haswell

Abdiel Janulgue abdiel.janulgue at linux.intel.com
Mon Jul 8 06:16:51 PDT 2013

The following RFC patchset initially enables the resource streamer on Haswell.

We can think of the resource streamer (RS) as a command streamer accelerator:
it accelerates certain commands that would normally take time to build up
and submit to the GPU, reducing some of the overhead associated with
such commands. On Haswell, generating binding tables and constant buffers
can be offloaded from CPU-generated commands to the resource streamer.

This is a preparatory patchset that initially enables hardware-generated
binding tables, which are primarily required to enable RS-based
optimizations, e.g. constant buffer generation and other ways to reduce
command buffer submissions. This initial patch closely follows how
the i965 driver currently generates binding tables (see the section
below for a possible future optimization). Though it shaves a few
microseconds of CPU time off every command submission, I don't expect
it in its current form to produce wide margins in performance gains.
The changes improved GLB 2.5 by 0.19% (n=14).

In the hardware-generated binding table case, the RS basically sits in front
of the CS, watching for the [VS/PS]BINDING_TABLE_POINTERS commands. Once
the RS encounters one, it flushes the state of the on-die binding table
entries to a buffer object, where the CS picks it up afterwards. Each surface
state and its associated index in the on-die binding table can be edited
directly instead of generating the entire binding table array in one go.
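To make the flow above concrete, here is a minimal C model of it. It is a simulation only, not driver or hardware code: `struct rs_model`, `rs_edit()`, `rs_binding_table_pointers()`, `BT_ENTRIES` and `POOL_ENTRIES` are hypothetical names, the table and pool sizes are arbitrary, and the real EDIT command encoding, pool allocation and synchronization are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the RS binding table flow; not i965 or HW code. */
#define BT_ENTRIES   16   /* arbitrary on-die table size for the sketch */
#define POOL_ENTRIES 256  /* arbitrary pool size, in binding table entries */

struct rs_model {
    uint32_t on_die[BT_ENTRIES];  /* on-die table: surface state offsets */
    uint32_t pool[POOL_ENTRIES];  /* buffer object the RS flushes into */
    unsigned pool_used;           /* next free slot in pool, in entries */
};

/* Models a binding table EDIT: update one index in place, without
 * rebuilding the rest of the table. */
static void rs_edit(struct rs_model *rs, unsigned index, uint32_t surf_offset)
{
    assert(index < BT_ENTRIES);
    rs->on_die[index] = surf_offset;
}

/* Models the RS reacting to a BINDING_TABLE_POINTERS command: copy the
 * on-die table into the pool and return the entry offset the CS would
 * read it from, or -1 if this sketch's fixed-size pool is exhausted. */
static int rs_binding_table_pointers(struct rs_model *rs)
{
    if (rs->pool_used + BT_ENTRIES > POOL_ENTRIES)
        return -1;
    unsigned offset = rs->pool_used;
    memcpy(&rs->pool[offset], rs->on_die, sizeof(rs->on_die));
    rs->pool_used += BT_ENTRIES;
    return (int)offset;
}
```

Because each flush copies the on-die table into a fresh slot of the pool, an edit only affects binding tables flushed after it; a table already handed to the CS keeps the entries it had at flush time.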

One optimization we could implement in the future is to
use the RS to publish deltas of changed surface states so that we
wouldn't have to rebuild entire binding tables for every batch buffer
flush. Currently the i965 driver appends our VS/PS surface states at the
end of the batchbuffer. On every batchbuffer flush, the VS/PS
surface states and binding tables are rebuilt from scratch for every change.
With the RS in mind, it would be possible to use a separate, larger
batchbuffer for (permanent?) surface state objects so the generated
surface state offsets would change less often [1].
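The delta idea could look roughly like the following sketch, assuming surface state offsets are stable enough across batches (for example, with surface states kept in a separate, longer-lived buffer) that comparing them is meaningful. `struct bt_shadow`, `struct bt_edit` and `publish_bt_deltas()` are hypothetical names: the sketch keeps a CPU-side copy of what the on-die table holds and records an edit only for entries whose offset changed.

```c
#include <stdint.h>

/* Hypothetical sketch of delta publishing; not i965 code. */
#define BT_ENTRIES 16  /* arbitrary table size for the sketch */

/* CPU-side shadow of what the on-die binding table currently holds. */
struct bt_shadow {
    uint32_t last_sent[BT_ENTRIES]; /* surface state offset per index */
    int valid;                      /* 0 until the first full upload */
};

/* One binding table edit: set entry `index` to `surf_offset`. */
struct bt_edit {
    unsigned index;
    uint32_t surf_offset;
};

/* Compares the wanted table with the shadow and records an edit only for
 * entries that changed (every entry on the first call). Returns the number
 * of edits written to `out`. */
static unsigned publish_bt_deltas(struct bt_shadow *s,
                                  const uint32_t wanted[BT_ENTRIES],
                                  struct bt_edit out[BT_ENTRIES])
{
    unsigned n = 0;
    for (unsigned i = 0; i < BT_ENTRIES; i++) {
        if (!s->valid || s->last_sent[i] != wanted[i]) {
            out[n].index = i;
            out[n].surf_offset = wanted[i];
            n++;
            s->last_sent[i] = wanted[i];
        }
    }
    s->valid = 1;
    return n;
}
```

On the first call every entry is sent; after that, changing a single surface state yields one edit instead of a full binding table rebuild, and an unchanged table yields none.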

With this series, GLB works fine and most piglit tests pass, but some
random GPU lockups may occur when piglit is run over a period of time.
intel_error_decode does not show where in the batch the problem
originates. I'll spend some time nailing down this issue for the
next revision.

On the intel-gfx list, I'll post the libdrm and kernel portions that enable the resource streamer.

[1] Needs changes in libdrm aperture checks to accommodate multiple levels of relocation.
See http://lists.freedesktop.org/archives/mesa-dev/2013-May/039088.html

Abdiel Janulgue (12):
      intel: Add resource streamer control defines
      intel: On Haswell hardware, enable the resource streamer on batchbuffer start
      i965: Temporarily disable resource streamer when state base address is updated.
      i965: Add MI_RS_STORE_DATA_IMM workaround for 3DPRIMITIVE commands
      i965: Switch on hardware-generated binding tables.
      i965: Implement opcodes for the hw-generated binding table EDIT commands
      i965: Use hw-bt for pull constants and VS UBO surface states.
      i965: Use hw-bt for renderbuffer, constant, and texture surface states.
      i965: Flush on-chip binding table to pool
      i965: Use hw-bt for generated WM UBO surface states.
      i965/blorp: In blorp, update PS on-chip binding table when new surface state entries are generated
      i965/blorp: Add temporary work-around due to b607d57630daa7d92a84c41abfd45cacbe63f3d2

 src/mesa/drivers/dri/i965/brw_context.c           |   2 ++
 src/mesa/drivers/dri/i965/brw_context.h           |   1 +
 src/mesa/drivers/dri/i965/brw_defines.h           |   9 ++++++
 src/mesa/drivers/dri/i965/brw_draw.c              |  14 +++++++++
 src/mesa/drivers/dri/i965/brw_misc_state.c        |   7 +++++
 src/mesa/drivers/dri/i965/brw_state.h             |  13 +++++++++
 src/mesa/drivers/dri/i965/brw_state_upload.c      |   3 ++
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c  |  14 +++++++++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |   9 ++++++
 src/mesa/drivers/dri/i965/gen6_blorp.cpp          |  27 ++++++++++++++++-
 src/mesa/drivers/dri/i965/gen7_blorp.cpp          |   3 +-
 src/mesa/drivers/dri/i965/gen7_misc_state.c       | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/gen7_vs_state.c         |  10 ++++---
 src/mesa/drivers/dri/i965/gen7_wm_state.c         |  10 ++++---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  36 +++++++++++++++++++----
 src/mesa/drivers/dri/i965/intel_batchbuffer.c     |   3 ++
 src/mesa/drivers/dri/i965/intel_reg.h             |   4 +++
 17 files changed, 259 insertions(+), 15 deletions(-)
