<div dir="ltr"><div><div><div>Hi,<br><br></div>nice! The Windows driver has an option to limit the tessellation factor to x16 etc. Would it be possible to implement something similar in radeonsi? Of course that's kind of the opposite of actually making x32 and x64 faster like you're doing here. :-)<br><br></div>Regards<br></div>//Ernst<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-05-10 12:52 GMT+02:00 Bas Nieuwenhuizen <span dir="ltr"><<a href="mailto:bas@basnieuwenhuizen.nl" target="_blank">bas@basnieuwenhuizen.nl</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This patchset implements offchip tessellation, after which we can finally process<br>
more than one patch per wave without decreasing tessmark scores.<br>
<br>
For tessmark this improves performance by ~20% for the x32 case and ~80% for the<br>
x64 case. x8 and x16 have roughly the same performance as before. Unigine Heaven<br>
gets 43 fps compared to 28 before (roughly +50%). Amdgpu-pro gets 44 fps for<br>
Heaven. For Shadow of Mordor the performance changes from 28 fps to 40 fps<br>
(roughly +40%).<br>
<br>
Remaining ideas for improvement are:<br>
<br>
- Don't store TCS outputs to LDS and don't unnecessarily allocate LDS. This<br>
has pretty much no measurable effect in the games I tried.<br>
<br>
- Only store TCS outputs to memory when the tess factors exceed a threshold. I<br>
haven't been able to get the LDS case working with dynamic HS enabled, but<br>
the decompiled amdgpu-pro shaders give a very strong hint that this is<br>
possible. However, amdgpu-pro sets the threshold to -1, so it pretty much always<br>
stores to memory too as far as I can see. Maybe it does not work on VI,<br>
or there is some interaction with the VI-only distribution modes and these<br>
were considered more profitable.<br>
<br>
- Hardware swizzled buffers. The swizzling by hand I use results in extra VALU<br>
instructions and it would be nice if we did not need to have them. However,<br>
my attempts have not resulted in a performance improvement yet.<br>
<br>
I have run the piglit gpu suite and found no regressions on a Tonga card.<br>
<br>
Bas Nieuwenhuizen (14):<br>
radeonsi: Add buffer for offchip storage between TCS and TES.<br>
radeonsi: Add offchip tessellation parameters.<br>
radeonsi: Define build_tbuffer_store_dwords earlier to support new<br>
users.<br>
radeonsi: Add buffer load functions.<br>
radeonsi: Use correct parameter index for LS_OUT_LAYOUT.<br>
radeonsi: Add user SGPR for the layout of the offchip buffer.<br>
radeonsi: Add offchip buffer address calculation.<br>
radeonsi: Store inputs to memory when not using a TCS.<br>
radeonsi: Use buffer loads and stores for passing data from TCS to<br>
TES.<br>
radeonsi: Remove LDS layout user SGPR's from TES.<br>
radeonsi: Enable dynamic HS.<br>
radeonsi: Use barrier instructions for TCS barriers.<br>
radeonsi: Process multiple patches per threadgroup.<br>
radeonsi: Allow TES distribution between shader engines.<br>
<br>
src/gallium/drivers/radeonsi/si_pipe.c | 1 +<br>
src/gallium/drivers/radeonsi/si_pipe.h | 1 +<br>
src/gallium/drivers/radeonsi/si_shader.c | 567 ++++++++++++++++++------<br>
src/gallium/drivers/radeonsi/si_shader.h | 32 +-<br>
src/gallium/drivers/radeonsi/si_state.c | 5 +<br>
src/gallium/drivers/radeonsi/si_state.h | 1 +<br>
src/gallium/drivers/radeonsi/si_state_draw.c | 59 ++-<br>
src/gallium/drivers/radeonsi/si_state_shaders.c | 67 ++-<br>
8 files changed, 560 insertions(+), 173 deletions(-)<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
2.8.2<br>
<br>
</font></span></blockquote></div><br></div>