<div dir="ltr">One more quick note. If you find it nicer, the whole thing can be found here: <a href="http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1">http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1</a> </div><div class="gmail_extra"> <div class="gmail_quote">On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This series does a bunch of refactoring of the i965 fs backend IR to add concepts of register width and instruction execution size. There's more to be done yet, but this gets us most of the way there. It also removes the assumption that scalar values are always 1 register in SIMD8 and 2 registers in SIMD16. In particular, we get the following:<br>
<br>
1) No more assumption about everything being 1 register. This allows us to allocate odd numbers of registers in SIMD16, which is needed for some payloads. Also, it should make implementing fp64 much easier because we can now sanely allocate registers of size 2 in SIMD8 and size 4 in SIMD16. There's a little more work to be done there, but this should take care of a lot of it.<br>
<br>
2) We can now do other instruction widths with relative ease. The compiler now detects, based on register widths, the execution size of the instruction and passes it down to the generator. One example of this is the patches in this series for UNTYPED_ATOMIC and UNTYPED_SURFACE_READ, where part of setting up the payload is to do an 8-wide move to fill a register with 0 and then a 1-wide move to set one particular component. We can now simply do this at the fs level and it will get translated down to the correct assembly and be properly handled by the compiler optimizations.
There is more work to be done here at the generator level, but this series is already long enough.<br>
<br>
3) Thanks to the above-mentioned things, we can easily do send from GRF for FB writes. One of the major blockers here before was that the beginning of the FB write message could be anywhere between 0 and 4 registers, regardless of whether you are in SIMD8 or SIMD16. Due to the implicit register doubling in SIMD16, it would have been a real pain to implement this properly. Now, it's trivial.<br>
<br>
I could go on about other changes, but those are the major ones.<br>
<br>
The requisite Shader DB results:<br>
<br>
total instructions in shared programs: 4999994 -> 4971746 (-0.56%)<br>
instructions in affected programs: 959392 -> 931144 (-2.94%)<br>
GAINED: 138<br>
LOST: 71<br>
<br>
There are some shaders that are hurt by 1 or 2 instructions. It could simply be send-from-grf, but prior to this last rebase, I don't remember there being any hurt programs. I'm going to look into it.<br>
<br>
Regarding Piglit:<br>
<br>
* On HSW, every commit except for the ones immediately followed by something labeled SQUASH passes (except for glsl-routing and timestamp-get, which are flaky).<br>
* On SNB and Gen4, the end of the series along with important intermediate points, such as changing GEN5 texturing or varying pull constant loads, pass.<br>
<br>
I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4 code, which I have since removed. I'll try to get that working and added back in later. That said, that's an optimization and not required, so we can leave it for now.<br>
<br>
Happy Reviewing!<br>
--Jason Ekstrand<br>
<br>
Jason Ekstrand (41):<br>
i965/brw_reg: Add a firsthalf function and use it in the generator<br>
i965/fs: A little harmless refactoring of register_coalesce<br>
i965/fs: Add a concept of a width to fs_reg<br>
i965/fs: Make half() divide the register width by 2 and use it more<br>
i965/fs: Handle printing of registers better.<br>
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode<br>
SQUASH: i965/fs: Use the register width when applying offsets<br>
SQUASH: i965/fs: Change regs_read to be in hardware registers<br>
SQUASH: i965/fs: Change regs_written to be actual hardware registers<br>
SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD<br>
SQUASH: i965/fs: Handle register widths in demote_pull_constants<br>
SQUASH: i965/fs: Get rid of implicit register doubling in the allocator<br>
SQUASH: i965/fs: Reserve enough registers for PLN instructions<br>
SQUASH: i965/fs: Make sources and destinations interfere in 16-wide<br>
SQUASH: i965/fs: Properly handle register widths in CSE<br>
SQUASH: i965/fs: Properly handle register widths in register_coalesce<br>
SQUASH: i965/fs: Properly handle widths in copy propagation<br>
SQUASH: i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD<br>
SQUASH: i965/fs: Properly handle register widths and odd register sizes in spilling<br>
SQUASH: i965/fs: Don't waste a register on texture lookups for gen >= 7<br>
i965/fs: Rework GEN5 texturing code to use fs_reg and offset()<br>
i965/fs: Fix a bug in register coalesce<br>
i965/fs: Determine partial writes based on the destination width<br>
i965/fs: Add an exec_size field to fs_inst<br>
SQUASH: i965/fs: Explicitly set instruction execute size a couple of places<br>
SQUASH: i965/blorp: Explicitly set instruction execute sizes<br>
i965/fs: Better guess the width of LOAD_PAYLOAD<br>
i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*<br>
i965/fs: Derive force_uncompressed from instruction exec_size<br>
i965/fs: Remove unneeded uses of force_uncompressed<br>
i965/fs: Use instruction execution sizes to set compression state<br>
i965/fs: Use instruction execution sizes instead of heuristics<br>
i965/fs: Use exec_size instead of force_uncompressed in dump_instruction<br>
i965/fs: Use the instruction execution size directly for texture generation<br>
i966/fs: Add a function for getting a component of a 8 or 16-wide register<br>
i965/fs: Use the GRF for UNTYPED_ATOMIC
instructions<br>
i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions<br>
i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instruction<br>
i965/fs: Add split_virtual_grfs and compute_to_mrf after lower_load_payload<br>
i965/fs: Use the GRF for FB writes on gen >= 7<br>
SQUASH: i965/fs: Force a high register for the final FB write<br>
<br>
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-<br>
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h | 36 +--<br>
src/mesa/drivers/dri/i965/brw_eu.h | 6 +-<br>
src/mesa/drivers/dri/i965/brw_eu_emit.c | 16 +-<br>
src/mesa/drivers/dri/i965/brw_fs.cpp | 355 +++++++++++++++-----<br>
src/mesa/drivers/dri/i965/brw_fs.h | 98 +++++-<br>
.../drivers/dri/i965/brw_fs_copy_propagation.cpp | 14 +-<br>
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 22 +-<br>
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 169 +++++-----<br>
.../drivers/dri/i965/brw_fs_live_variables.cpp | 10 +-<br>
src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 160 ++++++---<br>
.../drivers/dri/i965/brw_fs_register_coalesce.cpp | 50 ++-<br>
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 356 +++++++++++++--------<br>
src/mesa/drivers/dri/i965/brw_reg.h | 6 +<br>
.../drivers/dri/i965/brw_schedule_instructions.cpp | 15 +-<br>
src/mesa/drivers/dri/i965/brw_shader.cpp | 1 +<br>
src/mesa/drivers/dri/i965/intel_screen.h | 5 +<br>
17 files changed, 904 insertions(+), 417 deletions(-)<br>
<br>
-- <br>
2.1.0 </blockquote></div> </div>
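For readers following point 1 of the quoted cover letter, here is a minimal sketch of the register-size arithmetic it describes, assuming a 32-byte hardware register. The names GRF_BYTES, regs_for_value and regs_implicit are hypothetical and are not taken from the series; the real fs_reg code may well compute this differently.

```cpp
#include <cassert>

// Hypothetical names for illustration only; none of these come from the Mesa tree.
// Assumption: one hardware register (GRF) holds 32 bytes.
constexpr unsigned GRF_BYTES = 32;

// Whole hardware registers needed for `width` channels of `type_bytes`
// bytes each, rounded up to a full register.
constexpr unsigned regs_for_value(unsigned width, unsigned type_bytes)
{
   return (width * type_bytes + GRF_BYTES - 1) / GRF_BYTES;
}

// The old implicit rule described in the letter: every value is
// 1 register in SIMD8 and 2 registers in SIMD16, whatever its type.
constexpr unsigned regs_implicit(unsigned dispatch_width)
{
   return dispatch_width / 8;
}

// 32-bit values agree with the old rule...
static_assert(regs_for_value(8, 4) == regs_implicit(8), "SIMD8, 32-bit");
static_assert(regs_for_value(16, 4) == regs_implicit(16), "SIMD16, 32-bit");

// ...but 64-bit values need twice as many registers as the old rule allows.
static_assert(regs_for_value(8, 8) == 2, "SIMD8, 64-bit");
static_assert(regs_for_value(16, 8) == 4, "SIMD16, 64-bit");
```

With these definitions, regs_for_value(16, 8) evaluates to 4, matching the fp64 SIMD16 case mentioned in the letter, whereas the implicit rule regs_implicit(16) can only ever give 2.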