[Mesa-dev] [PATCH 00/14] radeonsi: Offchip tessellation

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Tue May 10 10:52:51 UTC 2016

This patchset implements offchip tessellation, which finally lets us process
more than one patch per wave without decreasing TessMark scores.
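A rough lane count shows why one patch per wave wastes most of a wave: a
triangle patch with three control points keeps only 3 of 64 lanes busy. The
sketch below assumes a 64-lane wave, one lane per control point (taking the
larger of the input and output counts) and an optional per-patch LDS cost; the
function and its LDS budget parameter are hypothetical and not the limits the
driver actually applies.

```c
#include <assert.h>

/* Hypothetical estimate of how many patches fit in one 64-lane wave.
 * Each patch is assumed to need max(in_cp, out_cp) lanes and, if
 * lds_bytes_per_patch is nonzero, that many bytes of the LDS budget. */
static unsigned patches_per_wave(unsigned in_cp, unsigned out_cp,
                                 unsigned lds_bytes_per_patch,
                                 unsigned lds_budget_bytes)
{
    const unsigned wave_size = 64; /* GCN wave64 */
    unsigned lanes_per_patch = in_cp > out_cp ? in_cp : out_cp;

    assert(lanes_per_patch > 0 && lanes_per_patch <= wave_size);

    /* Limit from lane count alone. */
    unsigned by_lanes = wave_size / lanes_per_patch;
    if (lds_bytes_per_patch == 0)
        return by_lanes;

    /* Limit from the (hypothetical) LDS budget; take the smaller one. */
    unsigned by_lds = lds_budget_bytes / lds_bytes_per_patch;
    return by_lanes < by_lds ? by_lanes : by_lds;
}
```

For example, `patches_per_wave(3, 3, 0, 0)` returns 21, while a 192-byte
per-patch LDS cost against a 1024-byte budget caps it at 5.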

For TessMark this improves performance by ~20% in the x32 case and ~80% in the
x64 case; x8 and x16 perform roughly the same as before. Unigine Heaven gets
43 fps compared to 28 before (roughly +50%), while amdgpu-pro gets 44 fps in
Heaven. In Shadow of Mordor, performance goes from 28 fps to 40 fps
(roughly +40%).
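The rounded gains quoted above follow from the usual relative-change formula;
the small helper below only re-derives them from the quoted frame rates.

```c
#include <assert.h>

/* Relative improvement, in percent, going from fps_before to fps_after. */
static double relative_gain_percent(double fps_before, double fps_after)
{
    assert(fps_before > 0.0);
    return (fps_after - fps_before) / fps_before * 100.0;
}
```

For Heaven, 28 to 43 fps is about +53.6%; for Shadow of Mordor, 28 to 40 fps
is about +42.9%.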

Remaining ideas for improvement are:

  - Don't store TCS outputs to LDS and don't unnecessarily allocate LDS. This
    has practically no measurable effect in the games I tried.

  - Only store TCS outputs to memory when the tess factors exceed a threshold. I
    haven't been able to get the LDS case working with dynamic HS enabled, but
    the decompiled amdgpu-pro shaders strongly hint that this is possible.
    However, amdgpu-pro sets the threshold to -1, so as far as I can see it
    pretty much always stores to memory too. Maybe the LDS path does not work
    on VI, or there is some interaction with the VI-only distribution modes and
    those were considered more profitable.

  - Hardware-swizzled buffers. The manual swizzling I use results in extra VALU
    instructions, and it would be nice not to need them. However, my attempts
    have not resulted in a performance improvement yet.
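The threshold idea amounts to a per-patch choice between keeping TCS outputs
in LDS and writing them to the offchip buffer. The sketch below is
hypothetical: the function name, and the assumption that the test is "largest
tess factor greater than the threshold", are illustrations rather than the
amdgpu-pro or radeonsi logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-patch decision: keep TCS outputs in LDS unless the
 * largest tessellation factor of the patch exceeds the threshold. */
static bool store_tcs_outputs_offchip(const float *tess_factors, size_t count,
                                      float threshold)
{
    assert(tess_factors != NULL && count > 0);

    /* Find the largest factor of the patch. */
    float max_factor = tess_factors[0];
    for (size_t i = 1; i < count; i++)
        if (tess_factors[i] > max_factor)
            max_factor = tess_factors[i];

    return max_factor > threshold;
}
```

With a threshold of -1, any patch with a positive tess factor takes the memory
path, consistent with the observation that amdgpu-pro pretty much always
stores to memory.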

I have run the piglit gpu suite and found no regressions on a Tonga card.

Bas Nieuwenhuizen (14):
  radeonsi: Add buffer for offchip storage between TCS and TES.
  radeonsi: Add offchip tessellation parameters.
  radeonsi: Define build_tbuffer_store_dwords earlier to support new
  radeonsi: Add buffer load functions.
  radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
  radeonsi: Add user SGPR for the layout of the offchip buffer.
  radeonsi: Add offchip buffer address calculation.
  radeonsi: Store inputs to memory when not using a TCS.
  radeonsi: Use buffer loads and stores for passing data from TCS to
  radeonsi: Remove LDS layout user SGPR's from TES.
  radeonsi: Enable dynamic HS.
  radeonsi: Use barrier instructions for TCS barriers.
  radeonsi: Process multiple patches per threadgroup.
  radeonsi: Allow TES distribution between shader engines.
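The commits that add a user SGPR for the offchip buffer layout and the offchip
address calculation, together with the hand swizzling mentioned earlier, can
be pictured with the sketch below. Everything in it is an assumption for
illustration: the field names and bit widths of the packed layout word, and an
attribute-major ordering in which the same vec4 attribute of consecutive
vertices (or patches) occupies consecutive 16-byte slots, so neighbouring
lanes touch neighbouring addresses. The multiply-adds are the kind of
per-access VALU work a hardware-swizzled buffer could absorb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of the offchip layout into one 32-bit user SGPR. */
enum {
    NUM_PATCHES_SHIFT = 0,  NUM_PATCHES_BITS = 9,
    VERTS_SHIFT       = 9,  VERTS_BITS       = 6,
    PATCH_DATA_SHIFT  = 16, PATCH_DATA_BITS  = 16, /* in 16-byte slots */
};

/* Extract a bitfield, as a shader would do on the SGPR value. */
static uint32_t layout_field(uint32_t word, unsigned shift, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (word >> shift) & ((1u << bits) - 1u);
}

static uint32_t pack_offchip_layout(uint32_t num_patches, uint32_t verts,
                                    uint32_t patch_data_slot)
{
    assert(num_patches < (1u << NUM_PATCHES_BITS));
    assert(verts < (1u << VERTS_BITS));
    assert(patch_data_slot < (1u << PATCH_DATA_BITS));
    return (num_patches << NUM_PATCHES_SHIFT) | (verts << VERTS_SHIFT) |
           (patch_data_slot << PATCH_DATA_SHIFT);
}

/* Byte offset of a vec4 output in the offchip buffer. Pass vertex < 0 for
 * a per-patch output, which lives after all per-vertex data. */
static uint32_t offchip_offset(uint32_t layout, uint32_t rel_patch_id,
                               int vertex, uint32_t attrib)
{
    uint32_t num_patches =
        layout_field(layout, NUM_PATCHES_SHIFT, NUM_PATCHES_BITS);
    uint32_t verts = layout_field(layout, VERTS_SHIFT, VERTS_BITS);
    uint32_t slot;

    assert(rel_patch_id < num_patches);
    if (vertex >= 0) {
        assert((uint32_t)vertex < verts);
        /* Attribute-major: all vertices of attribute 0, then attribute 1... */
        slot = attrib * (num_patches * verts) + rel_patch_id * verts +
               (uint32_t)vertex;
    } else {
        /* Per-patch data, again grouped by attribute. */
        slot = layout_field(layout, PATCH_DATA_SHIFT, PATCH_DATA_BITS) +
               attrib * num_patches + rel_patch_id;
    }
    return slot * 16u; /* one vec4 = 16 bytes */
}
```

For 4 patches of 3 vertices with two per-vertex attributes, per-patch data
would start at slot 24, so `pack_offchip_layout(4, 3, 24)` describes that
region.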

 src/gallium/drivers/radeonsi/si_pipe.c          |   1 +
 src/gallium/drivers/radeonsi/si_pipe.h          |   1 +
 src/gallium/drivers/radeonsi/si_shader.c        | 567 ++++++++++++++++++------
 src/gallium/drivers/radeonsi/si_shader.h        |  32 +-
 src/gallium/drivers/radeonsi/si_state.c         |   5 +
 src/gallium/drivers/radeonsi/si_state.h         |   1 +
 src/gallium/drivers/radeonsi/si_state_draw.c    |  59 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  67 ++-
 8 files changed, 560 insertions(+), 173 deletions(-)

