<div dir="auto"><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Jun 4, 2017 4:04 AM, "Dieter Nützel" <<a href="mailto:Dieter@nuetzel-hh.de">Dieter@nuetzel-hh.de</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It improves Unigine Heaven performance here (RX 580, 8 GB), too.<br>
<br>
without Tessellation<br>
<br>
Before:<br>
FPS: 79.0<br>
Score: 1991<br>
Min FPS: 20.8<br>
Max FPS: 189.8<br>
<br>
After:<br>
FPS: 79.1<br>
Score: 1993<br>
Min FPS: 19.6<br>
Max FPS: 185.9<br>
<br>
<br>
with Tessellation<br>
<br>
Before:<br>
FPS: 67.7<br>
Score: 1705<br>
Min FPS: 8.8<br>
Max FPS: 182.6<br>
<br>
After:<br>
FPS: 68.3<br>
Score: 1720<br>
Min FPS: 15.9<br>
Max FPS: 179.6<br>
<br>
<br>
System<br>
<br>
Platform: Linux 4.20.0-amd-staging-4.11-1.g726<wbr>2353-default+ x86_64<br>
CPU model: Intel(R) Xeon(R) CPU X3470 @ 2.93GHz (2925MHz) x8<br>
GPU model: Unknown GPU (256MB) x1<br>
<br>
Marek, is this the 'tessellation regression' you tried to solve?<br></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">No, the SI tessellation regression needs a different fix.</div><div dir="auto"><br></div><div dir="auto">Marek</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
For this series:<br>
Tested-by: Dieter Nützel <<a href="mailto:Dieter@nuetzel-hh.de" target="_blank">Dieter@nuetzel-hh.de</a>><font color="#888888"><br>
<br>
Dieter</font><div class="elided-text"><br>
<br>
On 03.06.2017 18:04, Marek Olšák wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank">marek.olsak@amd.com</a>><br>
<br>
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"<br>
for LS and HS. Note that 32K is the maximum size.<br>
<br>
Before:<br>
heaven_x64: ls=1f1 tcs=1f1, lds=32K<br>
heaven_x64: ls=31 tcs=31, lds=24K<br>
heaven_x64: ls=71 tcs=71, lds=28K<br>
<br>
After:<br>
heaven_x64: ls=3f tcs=3f, lds=24K<br>
heaven_x64: ls=7 tcs=7, lds=13K<br>
heaven_x64: ls=f tcs=f, lds=17K<br>
<br>
All other apps have a similar decrease in LDS usage, because<br>
the "outputs_written" masks are similar. Also, most apps don't write<br>
POSITION in these shader stages, so there is room for improvement.<br>
(tight per-component input/output packing might help even more)<br>
<br>
It's unknown whether this improves performance.<br>
---<br>
src/gallium/drivers/radeonsi/<wbr>si_shader.c | 18 +++++++++++-------<br>
src/gallium/drivers/radeonsi/<wbr>si_state_shaders.c | 4 +++-<br>
2 files changed, 14 insertions(+), 8 deletions(-)<br>
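<br>
As a rough illustration of why the masks above shrink, the following sketch rebuilds them under the old and new slot numbering. It assumes each Heaven mask is POSITION plus a run of consecutive GENERIC outputs, which is an inference from the hex values rather than something the message states. The names sem, old_index, new_index, written_mask and slots_spanned are hypothetical and are not driver functions.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two semantics assumed to appear in the
 * Heaven masks; these are not the driver's TGSI_SEMANTIC_* values. */
enum sem { SEM_POSITION, SEM_GENERIC };

/* Slot numbering before the patch: GENERIC starts at slot 4. */
static unsigned old_index(enum sem s, unsigned i)
{
	return s == SEM_POSITION ? 0 : 4 + i;
}

/* Slot numbering after the patch: GENERIC starts right after POSITION. */
static unsigned new_index(enum sem s, unsigned i)
{
	return s == SEM_POSITION ? 0 : 1 + i;
}

/* Build an "outputs_written"-style mask for POSITION plus the first
 * num_generic GENERIC outputs, using the given numbering. */
static uint64_t written_mask(unsigned (*index_of)(enum sem, unsigned),
			     unsigned num_generic)
{
	uint64_t mask = 1ull << index_of(SEM_POSITION, 0);

	for (unsigned i = 0; i < num_generic; i++)
		mask |= 1ull << index_of(SEM_GENERIC, i);
	return mask;
}

/* Number of slots spanned by a mask: position of the highest set bit + 1.
 * Holes below the highest bit still count, which is what the old
 * numbering paid for. */
static unsigned slots_spanned(uint64_t mask)
{
	unsigned n = 0;

	while (mask) {
		n++;
		mask >>= 1;
	}
	return n;
}
```

Under that assumption, five GENERIC outputs give 0x1f1 with the old numbering and 0x3f with the new one, so the span drops from 9 slots to 6. The sketch does not model how the driver turns a slot count into an LDS size.<br>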
<br>
diff --git a/src/gallium/drivers/radeonsi<wbr>/si_shader.c<br>
b/src/gallium/drivers/radeonsi<wbr>/si_shader.c<br>
index ddfaa3b..3a86c0b 100644<br>
--- a/src/gallium/drivers/radeonsi<wbr>/si_shader.c<br>
+++ b/src/gallium/drivers/radeonsi<wbr>/si_shader.c<br>
@@ -129,32 +129,36 @@ unsigned<br>
si_shader_io_get_unique_index_<wbr>patch(unsigned semantic_name, unsigned<br>
in<br>
/**<br>
* Returns a unique index for a semantic name and index. The index must be<br>
* less than 64, so that a 64-bit bitmask of used inputs or outputs can be<br>
* calculated.<br>
*/<br>
unsigned si_shader_io_get_unique_index(<wbr>unsigned semantic_name, unsigned index)<br>
{<br>
switch (semantic_name) {<br>
case TGSI_SEMANTIC_POSITION:<br>
return 0;<br>
- case TGSI_SEMANTIC_PSIZE:<br>
- return 1;<br>
- case TGSI_SEMANTIC_CLIPDIST:<br>
- assert(index <= 1);<br>
- return 2 + index;<br>
case TGSI_SEMANTIC_GENERIC:<br>
+ /* Some shader stages use the highest used IO index<br>
+ * to determine the size to allocate for inputs/outputs<br>
+ * (in LDS, tess and GS rings), so GENERIC should be placed right<br>
+ * after POSITION to make that size as small as possible.<br>
+ */<br>
if (index < SI_MAX_IO_GENERIC)<br>
- return 4 + index;<br>
+ return 1 + index;<br>
<br>
assert(!"invalid generic index");<br>
return 0;<br>
-<br>
+ case TGSI_SEMANTIC_PSIZE:<br>
+ return SI_MAX_IO_GENERIC + 1;<br>
+ case TGSI_SEMANTIC_CLIPDIST:<br>
+ assert(index <= 1);<br>
+ return SI_MAX_IO_GENERIC + 2 + index;<br>
case TGSI_SEMANTIC_FOG:<br>
return SI_MAX_IO_GENERIC + 4;<br>
case TGSI_SEMANTIC_LAYER:<br>
return SI_MAX_IO_GENERIC + 5;<br>
case TGSI_SEMANTIC_VIEWPORT_INDEX:<br>
return SI_MAX_IO_GENERIC + 6;<br>
case TGSI_SEMANTIC_PRIMID:<br>
return SI_MAX_IO_GENERIC + 7;<br>
case TGSI_SEMANTIC_COLOR: /* these alias */<br>
case TGSI_SEMANTIC_BCOLOR:<br>
diff --git a/src/gallium/drivers/radeonsi<wbr>/si_state_shaders.c<br>
b/src/gallium/drivers/radeonsi<wbr>/si_state_shaders.c<br>
index 8ac4309..f36997b 100644<br>
--- a/src/gallium/drivers/radeonsi<wbr>/si_state_shaders.c<br>
+++ b/src/gallium/drivers/radeonsi<wbr>/si_state_shaders.c<br>
@@ -1226,21 +1226,23 @@ static void<br>
si_shader_selector_key_hw_vs(s<wbr>truct si_context *sctx,<br>
ps_disabled = sctx->queued.named.rasterizer-<wbr>>rasterizer_discard ||<br>
(!ps_colormask &&<br>
!ps_modifies_zs &&<br>
!ps->info.writes_memory);<br>
}<br>
<br>
/* Find out which VS outputs aren't used by the PS. */<br>
uint64_t outputs_written = vs->outputs_written;<br>
uint64_t inputs_read = 0;<br>
<br>
- outputs_written &= ~0x3; /* ignore POSITION, PSIZE */<br>
+ /* ignore POSITION, PSIZE */<br>
+ outputs_written &= ~((1ull <<<br>
si_shader_io_get_unique_index(<wbr>TGSI_SEMANTIC_POSITION, 0) |<br>
+ (1ull << si_shader_io_get_unique_index(<wbr>TGSI_SEMANTIC_PSIZE, 0))));<br>
<br>
if (!ps_disabled) {<br>
inputs_read = ps->inputs_read;<br>
}<br>
<br>
uint64_t linked = outputs_written & inputs_read;<br>
<br>
key->opt.hw_vs.kill_outputs = ~linked & outputs_written;<br>
}<br>
</blockquote>
</div></blockquote></div><br></div></div></div>
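<br>
The si_state_shaders.c hunk is needed because the hard-coded ~0x3 only matches POSITION and PSIZE while they occupy slots 0 and 1. A minimal sketch of the difference follows. MAX_IO_GENERIC is a placeholder set to 46 purely as an assumption (the real SI_MAX_IO_GENERIC is defined in the driver and is not shown in this patch), and the helper names are hypothetical.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder; the driver's SI_MAX_IO_GENERIC is not shown in the
 * patch. Any value that keeps all indices below 64 works for this sketch. */
#define MAX_IO_GENERIC 46

/* Slot indices after the patch, mirroring the switch in si_shader.c. */
static unsigned position_index(void) { return 0; }
static unsigned generic_index(unsigned i) { return 1 + i; }
static unsigned psize_index(void) { return MAX_IO_GENERIC + 1; }

/* New behaviour: derive the ignore mask from the slot indices. */
static uint64_t strip_position_psize(uint64_t outputs_written)
{
	uint64_t ignore = (1ull << position_index()) |
			  (1ull << psize_index());

	return outputs_written & ~ignore;
}

/* Old behaviour: assumes POSITION and PSIZE live in bits 0 and 1. */
static uint64_t strip_low_two_bits(uint64_t outputs_written)
{
	return outputs_written & ~0x3ull;
}
```

With a shader that writes POSITION, GENERIC 0 and PSIZE, the index-based mask leaves only the GENERIC 0 bit, whereas the old constant would clear GENERIC 0 and leave PSIZE set.<br>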