<div dir="auto"><div><div class="gmail_extra"><div class="gmail_quote">On Dec 5, 2016 12:14 PM, "Connor Abbott" <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm a little worried about this since it seems like the<br>
load/store_scratch intrinsics are basically doing the same thing as<br>
registers were originally intended to do. Either we should use the<br>
existing register lowering, and make it conditional on the size like<br>
you've done here, or we should just gut larger-than-vec4 registers<br>
entirely and go with this instead. TBH, I'm kinda leaning towards the<br>
latter, since I know Rob has expressed some interest in using<br>
something like this instead of registers, and it seems like nobody<br>
really wants the ability to indirectly address stuff inside, say, an<br>
add instruction anyways.</blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">The vec4 backend does use indirects on registers today. Another option in this series would be to put the heuristic in lower_indirect_derefs instead and let the larger indirected things turn into registers and do it that way. But I do like having separate instructions rather than those weird indirect sources.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="elided-text">
On Mon, Dec 5, 2016 at 2:59 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
> This little series implements lowering of indirectly accessed local<br>
> variables larger than some threshold (8 floats?) to scratch space. This<br>
> improves the performance of the CSDof synmark test by about 45% because it<br>
> uses a large temporary array which we currently lower to if-ladders and then to piles<br>
> of scratch.<br>
><br>
> The approach I've taken here is to add a new set of NIR intrinsics for<br>
> reading and writing scratch. It's treated like any other form of IO with a<br>
> new nir_lower_vars_to_scratch pass that lowers everything over a given size<br>
> threshold to scratch space. Why do this in NIR? The primary reason is<br>
> that this lets us lower to scratch *before* we do nir_lower_indirect_derefs<br>
> so we can still use registers for small indirects where an if-ladder is<br>
> more efficient than scratch space. Also, after giving it a try, I really<br>
> liked how those intrinsics turned out.<br>
><br>
> This series is marked RFC because it's still a bit sketchy at the moment.<br>
> There are a few things that would need to be finished before it's ready for<br>
> landing:<br>
><br>
> 1) I should probably run it through piglit.<br>
> 2) The back-end portion doesn't yet handle doubles.<br>
> 3) We should use send-from-GRF for non-spill direct scratch reads/writes.<br>
> Right now, it's still using MRFs which isn't great.<br>
><br>
> If people like where this series is going, I can probably find some time to<br>
> polish it to the point of being mergeable.<br>
><br>
> Jason Ekstrand (6):<br>
> nir: Add load/store_scratch intrinsics<br>
> nir: Add a pass for selectively lowering variables to scratch space<br>
> i965/fs: Add a CHANNEL_IDS opcode<br>
> i965/fs: Add DWord scattered read/write opcodes<br>
> i965/fs: Implement the new nir_scratch_load/store opcodes<br>
> i965: Lower large local arrays to scratch<br>
><br>
> Timothy Arceri (1):<br>
> i965: use nir_lower_indirect_derefs() for GLSL<br>
><br>
> src/compiler/Makefile.sources | 1 +<br>
> src/compiler/nir/nir.h | 8 +-<br>
> src/compiler/nir/nir_clone.c | 1 +<br>
> src/compiler/nir/nir_<wbr>intrinsics.h | 6 +-<br>
> src/compiler/nir/nir_lower_<wbr>scratch.c | 258 ++++++++++++++++++++++<br>
> src/intel/vulkan/anv_pipeline.<wbr>c | 10 -<br>
> src/mesa/drivers/dri/i965/brw_<wbr>defines.h | 10 +<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs.cpp | 113 ++++++++++<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs.h | 6 +<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs_cse.cpp | 1 +<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs_generator.cpp | 170 ++++++++++++++<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs_nir.cpp | 42 +++-<br>
> src/mesa/drivers/dri/i965/brw_<wbr>fs_reg_allocate.cpp | 4 +-<br>
> src/mesa/drivers/dri/i965/brw_<wbr>link.cpp | 13 --<br>
> src/mesa/drivers/dri/i965/brw_<wbr>nir.c | 13 ++<br>
> src/mesa/drivers/dri/i965/brw_<wbr>shader.cpp | 12 +<br>
> 16 files changed, 631 insertions(+), 37 deletions(-)<br>
> create mode 100644 src/compiler/nir/nir_lower_<wbr>scratch.c<br>
><br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</div>
</blockquote></div><br></div></div></div>