<div dir="auto"><div><div class="gmail_extra"><div class="gmail_quote">On Dec 5, 2016 12:14 PM, "Connor Abbott" <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm a little worried about this since it seems like the<br> load/store_scratch intrinsics are basically doing the same thing as<br> registers were originally intended to do. Either we should use the<br> existing register lowering, and make it conditional on the size like<br> you've done here, or we should just gut larger-than-vec4 registers<br> entirely and go with this instead. TBH, I'm kinda leaning towards the<br> latter, since I know Rob has expressed some interest in using<br> something like this instead of registers, and it seems like nobody<br> really wants the ability to indirectly address stuff inside, say, an<br> add instruction anyways.</blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">The vec4 backend does use indirects on registers today. Another option in this series would be to put the heuristic in lower_indirect_derefs instead and let the larger indirected things turn into registers and do it that way. But I do like having separate instructions rather than those weird indirect sources.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="elided-text"> On Mon, Dec 5, 2016 at 2:59 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br> > This little series implements lowering of indirectly accessed local<br> > variables larger than some threshold (8 floats?) to scratch space. This<br> > improves the performance of the CSDof synmark test by about 45% because it<br> > uses a large temporary array which we lower to if-ladders and then to piles<br> > of scratch.<br> ><br> > The approach I've taken here is to add a new set of NIR intrinsics for<br> > reading and writing scratch. It's treated like any other form of IO with a<br> > new nir_lower_vars_to_scratch pass that lowers everything over a given size<br> > threshold to scratch space. Why do this in NIR? The primary reason is<br> > that this lets us lower to scratch *before* we do nir_lower_indirect_derefs<br> > so we can still use registers for small indirects where an if-ladder is<br> > more efficient than scratch space. Also, after gaving it a try, I really<br> > liked how those intrinsics turned out.<br> ><br> > This series is marked RFC because it's still a bit sketchy at the moment.<br> > There are a few things that would need to be finished before it's ready for<br> > landing:<br> ><br> > 1) I should probably run it through piglit.<br> > 2) The back-end portion doesn't yet handle doubles<br> > 3) We should use send-from-GRF for non-spill direct scratch reads/writes.<br> > Right now, it's still using MRFs which isn't great.<br> ><br> > If people like where this series is going, I can probably find some time to<br> > polish it to the point of mergeable.<br> ><br> > Jason Ekstrand (6):<br> > nir: Add load/store_scratch intrinsics<br> > nir: Add a pass for selectively lowering variables to scratch space<br> > i965/fs: Add a CHANNEL_IDS opcode<br> > i965/fs: Add DWord scattered read/write opcodes<br> > i965/fs: Implement the new nir_scratch_load/store opcodes<br> > i965: Lower large local arrays to scratch<br> ><br> > Timothy Arceri (1):<br> > i965: use nir_lower_indirect_derefs() for GLSL<br> ><br> > src/compiler/Makefile.sources | 1 +<br> > src/compiler/nir/nir.h | 8 +-<br> > src/compiler/nir/nir_clone.c | 1 +<br> > src/compiler/nir/nir_<wbr>intrinsics.h | 6 +-<br> > src/compiler/nir/nir_lower_<wbr>scratch.c | 258 ++++++++++++++++++++++<br> > src/intel/vulkan/anv_pipeline.<wbr>c | 10 -<br> > src/mesa/drivers/dri/i965/brw_<wbr>defines.h | 10 +<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs.cpp | 113 ++++++++++<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs.h | 6 +<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs_cse.cpp | 1 +<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs_generator.cpp | 170 ++++++++++++++<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs_nir.cpp | 42 +++-<br> > src/mesa/drivers/dri/i965/brw_<wbr>fs_reg_allocate.cpp | 4 +-<br> > src/mesa/drivers/dri/i965/brw_<wbr>link.cpp | 13 --<br> > src/mesa/drivers/dri/i965/brw_<wbr>nir.c | 13 ++<br> > src/mesa/drivers/dri/i965/brw_<wbr>shader.cpp | 12 +<br> > 16 files changed, 631 insertions(+), 37 deletions(-)<br> > create mode 100644 src/compiler/nir/nir_lower_<wbr>scratch.c<br> ><br> > --<br> > 2.5.0.400.gff86faf<br> ><br> </div>> ______________________________<wbr>_________________<br> > mesa-dev mailing list<br> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br> > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br> </blockquote></div><br></div></div></div>