[Mesa-dev] i965: On-demand render target flushing

Topi Pohjolainen topi.pohjolainen at gmail.com
Fri Feb 17 19:32:03 UTC 2017


1) Blorp color clears and resolves emit an unconditional render target
   flush + command streamer stall after every clear/resolve (including
   regular non-fast clears).

2) Blorp color clears, resolves and blits emit texture and constant
   cache flushes even when only the destination is dirty. This is
   because brw_render_cache_set_check_flush() does both the render
   target flush and the top-of-pipe read cache flushes.

3) Similarly to item 2, 3D and compute paths also flush texture and
   constant caches even if none of the texture surfaces are dirty.

4) In case of multiple surfaces needing resolves, all render paths
   (blorp, 3D and compute) emit render target, texture and constant
   cache flushes after each resolve instead of just once after all
   of them.
This series addresses all four cases. The good news is that even though
the current setup isn't optimal, performance doesn't actually get any
better in most cases with the series applied. There is a modest gain in
OglDrvRes, which does heavy blorp blitting. I also expect this series to
make blorp tex uploads and blorp mipmap generation more competitive.

The bad news is in the final patch - it looks like the current
unconditional flushing/stalling has been hiding bugs elsewhere. There are
cases which rely on the flushes after non-fast clears. Hunting down the
real cause is, however, difficult. I only saw the failures in the CI
system within full runs and was not able to reproduce them myself.

As a first step the series introduces end-of-pipe synchronization.
This is a flush combined with a stall and a post-sync operation that
writes a double word (32 bits). Until now this wasn't really
needed, as in many cases there was double flushing, which in turn
seems to take long enough to hide the need for the sync. I also
noticed that one needs to be rather careful with it - performance
drops noticeably when it is used where it isn't needed.
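
To make the description concrete, here is a minimal sketch of an
end-of-pipe sync that is only issued when something was actually resolved.
It is not the code from the patches: the sketch_* names, the SKETCH_PC_*
bits (arbitrary values) and the batch structure are all hypothetical, and
the "write" is only modelled by a struct field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control bits; arbitrary values, not hardware bits. */
enum sketch_pc_bits {
   SKETCH_PC_FLUSH       = 1u << 0, /* flush outstanding writes */
   SKETCH_PC_STALL       = 1u << 1, /* wait until the flush is done */
   SKETCH_PC_WRITE_DWORD = 1u << 2, /* post-sync 32-bit write */
};

/* Stand-in for a command batch; records what would be emitted. */
struct sketch_batch {
   unsigned num_syncs;  /* how many syncs were emitted */
   uint32_t last_flags; /* bits of the most recent sync */
   uint32_t scratch;    /* models the post-sync write destination */
};

/* Flush + stall + post-sync write of one double word. */
static void
sketch_end_of_pipe_sync(struct sketch_batch *batch)
{
   batch->last_flags = SKETCH_PC_FLUSH | SKETCH_PC_STALL |
                       SKETCH_PC_WRITE_DWORD;
   batch->scratch = 0; /* the value written is irrelevant here */
   batch->num_syncs++;
}

/* Skip the sync when nothing was resolved, since it is costly. */
static bool
sketch_sync_if_resolved(struct sketch_batch *batch, bool any_resolved)
{
   if (!any_resolved)
      return false;

   sketch_end_of_pipe_sync(batch);
   return true;
}
```

The conditional wrapper reflects the caveat above: callers pass in whether
any resolve took place, so the expensive sync is skipped when it would do
no useful work.
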

I'm not even sure myself whether we want to go this way. The current
logic - while not ideal - is rather simple.

Topi Pohjolainen (16):
  i965/miptree: Tell if anything got resolved
  i965/gen6+: Implement end-of-pipe sync
  i965: Hook end-of-pipe-sync after texture resolves
  i965: Hook end-of-pipe-sync after image resolves
  i965: Hook end-of-pipe-sync after framebuffer resolves
  i965: Consider layered rt resolves along with other
  i965: Add color resolve end-of-pipe-sync before switch to blit ring
  i965/dri2: Add end-of-pipe-sync after color resolves
  i965/miptree: Add color resolve end-of-pipe-sync before sharing
  i965: Add end-of-pipe sync before non-gpu read of color resolves
  i965/blorp: Do more fine grained flushing/syncing
  i965/blorp/blit: Refactor hiz/ccs prep for blits
  i965/blorp: Use conditional end-of-pipe-sync
  i965: Consider surface resolves and sync after blorp ops
  i965: Check if fast color clear state transition needs sync
  i965/blorp: Drop unnecessary flushes after clear/resolve

 src/mesa/drivers/dri/i965/brw_blorp.c          | 187 ++++++++++----
 src/mesa/drivers/dri/i965/brw_compute.c        |   2 +
 src/mesa/drivers/dri/i965/brw_context.c        | 333 +++++++++++++++++++------
 src/mesa/drivers/dri/i965/brw_context.h        |   3 +
 src/mesa/drivers/dri/i965/brw_draw.c           |  36 +--
 src/mesa/drivers/dri/i965/brw_pipe_control.c   |  91 +++++++
 src/mesa/drivers/dri/i965/genX_blorp_exec.c    |  11 -
 src/mesa/drivers/dri/i965/intel_blit.c         |  16 +-
 src/mesa/drivers/dri/i965/intel_copy_image.c   |  10 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  |  25 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |   2 +-
 src/mesa/drivers/dri/i965/intel_pixel.c        |   4 +
 src/mesa/drivers/dri/i965/intel_pixel_bitmap.c |   5 +-
 src/mesa/drivers/dri/i965/intel_pixel_read.c   |   7 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c    |  10 +-
 src/mesa/drivers/dri/i965/intel_tex_subimage.c |  11 +-
 16 files changed, 557 insertions(+), 196 deletions(-)

