[Mesa-dev] [PATCH 00/14] radeonsi: Offchip tessellation
Bas Nieuwenhuizen
bas at basnieuwenhuizen.nl
Tue May 10 10:52:51 UTC 2016
This patchset implements offchip tessellation, after which we can finally
process more than one patch per wave without decreasing tessmark scores.
For tessmark this improves performance by ~20% for the x32 case and ~80% for the
x64 case. x8 and x16 have roughly the same performance as before. Unigine Heaven
gets 43 fps compared to 28 before (roughly +50%). For comparison, amdgpu-pro
gets 44 fps in Heaven. For Shadow of Mordor the performance changes from 28 fps
to 40 fps (roughly +40%).
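As a quick check of the rounded percentages above, the relative gain is
(after - before) / before. The helper name below is made up for this
illustration and is not part of the patchset. With the quoted frame rates it
gives about 54% for Heaven and about 43% for Shadow of Mordor, consistent with
the "roughly" figures.

```c
#include <assert.h>

/* Hypothetical helper for illustration only: relative improvement in
 * percent when a frame rate goes from `before` to `after`. */
double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, percent_gain(28.0, 43.0) covers the Heaven numbers and
percent_gain(28.0, 40.0) the Shadow of Mordor numbers.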
Remaining ideas for improvement are:
- Don't store TCS outputs to LDS and don't unnecessarily allocate LDS. This
has pretty much no measurable effect in the games I tried.
- Only store TCS outputs to memory when the tess factors exceed a threshold. I
haven't been able to get the LDS case working with dynamic HS enabled, but
the decompiled amdgpu-pro shaders give a very strong hint that this is
possible. However, amdgpu-pro sets the threshold to -1, so it pretty much
always stores to memory too, as far as I can see. Maybe it does not work on VI,
or there is some interaction with the VI-only distribution modes and these
were considered more profitable.
- Hardware swizzled buffers. The swizzling by hand I use results in extra VALU
instructions and it would be nice if we did not need to have them. However,
my attempts have not resulted in a performance improvement yet.
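To make the threshold idea from the second item concrete, here is a purely
conceptual sketch. Everything in it is an assumption for illustration: the
function name is invented, the real decision would be made by the hardware or
the shader rather than by host code, and "any factor exceeds the threshold" is
only one possible reading of the condition. It does show why a threshold of -1
effectively means "always store to memory" for patches with non-negative tess
factors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration, not radeonsi or amdgpu-pro code.
 * Returns true when at least one tessellation factor of the patch is
 * strictly greater than `threshold`, i.e. when the patch would take the
 * offchip (memory) path in this sketch instead of staying in LDS. */
bool patch_needs_offchip_store(const float *tess_factors, size_t count,
                               float threshold)
{
    assert(tess_factors != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (tess_factors[i] > threshold)
            return true;
    }
    return false;
}
```

With threshold = -1.0f, a patch whose factors are all 1.0 already takes the
memory path, whereas with a threshold of 4.0 the same patch would stay in LDS.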
I have run the piglit gpu suite and found no regressions on a Tonga card.
Bas Nieuwenhuizen (14):
radeonsi: Add buffer for offchip storage between TCS and TES.
radeonsi: Add offchip tessellation parameters.
radeonsi: Define build_tbuffer_store_dwords earlier to support new users.
radeonsi: Add buffer load functions.
radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
radeonsi: Add user SGPR for the layout of the offchip buffer.
radeonsi: Add offchip buffer address calculation.
radeonsi: Store inputs to memory when not using a TCS.
radeonsi: Use buffer loads and stores for passing data from TCS to TES.
radeonsi: Remove LDS layout user SGPR's from TES.
radeonsi: Enable dynamic HS.
radeonsi: Use barrier instructions for TCS barriers.
radeonsi: Process multiple patches per threadgroup.
radeonsi: Allow TES distribution between shader engines.
src/gallium/drivers/radeonsi/si_pipe.c | 1 +
src/gallium/drivers/radeonsi/si_pipe.h | 1 +
src/gallium/drivers/radeonsi/si_shader.c | 567 ++++++++++++++++++------
src/gallium/drivers/radeonsi/si_shader.h | 32 +-
src/gallium/drivers/radeonsi/si_state.c | 5 +
src/gallium/drivers/radeonsi/si_state.h | 1 +
src/gallium/drivers/radeonsi/si_state_draw.c | 59 ++-
src/gallium/drivers/radeonsi/si_state_shaders.c | 67 ++-
8 files changed, 560 insertions(+), 173 deletions(-)
--
2.8.2