<div dir="ltr">One more quick note. If you find it nicer, the whole thing can be found here:<br><br><a href="http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1">http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1</a><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This series does a bunch of refactoring of the i965 fs backend IR to add<br>
concepts of register width and instruction execution size. There's more to<br>
be done yet, but this gets us most of the way there. It also removes the<br>
assumption that scalar values are always 1 register in SIMD8 and 2<br>
registers in SIMD16. In particular, we get the following:<br>
<br>
1) No more assumption about everything being 1 register. This allows us<br>
to allocate odd numbers of registers in SIMD16 which is needed for some<br>
payloads. Also, it should make implementing fp64 much easier because<br>
we can now sanely allocate registers of size 2 in SIMD8 and size 4 in SIMD16.<br>
There's a little more work to be done there, but this should take care<br>
of a lot of it.<br>
<br>
2) We can now do other instruction widths with relative ease. The<br>
compiler now detects, based on register widths, the execution size of<br>
the instruction and passes it down to the generator. One example of<br>
this is the patches in this series for UNTYPED_ATOMIC and<br>
UNTYPED_SURFACE_READ where part of setting up the payload is to do an<br>
8-wide move to fill a register with 0 and then a 1-wide move to set one<br>
particular component. We can now simply do this at the fs level and it<br>
will get translated down to the correct assembly and properly<br>
handled by the compiler optimizations. There is more work to be done<br>
here at the generator level, but this series is already long enough.<br>
<br>
3) Thanks to the above-mentioned things, we can easily do send from GRF<br>
for FB writes. One of the major blockers here before was that the<br>
beginning of the FB write message was anywhere between 0 and 4<br>
registers regardless of whether you are in SIMD8 or SIMD16. Due to the<br>
implicit register doubling in SIMD16, it would have been a real pain to<br>
implement this properly. Now, it's trivial.<br>
<br>
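To make the register arithmetic in the points above concrete, here is a rough sketch (not code from the series). It assumes a 32-byte hardware register and an FB write payload of four 32-bit color components; the helper names are made up for illustration.<br>
<pre>
```python
GRF_BYTES = 32  # assumption: one hardware register (GRF) holds 32 bytes


def regs_for_value(width, type_bytes):
    """Whole hardware registers needed for `width` channels of `type_bytes` each."""
    # Ceiling division, so a partially filled register still counts as one.
    return -(-(width * type_bytes) // GRF_BYTES)


def fb_write_payload_regs(header_regs, width, components=4, type_bytes=4):
    """Hypothetical FB write payload size: header plus per-component color data.

    The header is 0 to 4 registers regardless of SIMD width; only the
    color data scales with the width.
    """
    if header_regs not in range(5):
        raise ValueError("header must be between 0 and 4 registers")
    return header_regs + components * regs_for_value(width, type_bytes)
```
</pre>
With these assumptions, a 32-bit value takes 1 register in SIMD8 and 2 in SIMD16, a 64-bit value takes 2 and 4, and a SIMD16 FB write with a 1-register header comes to 9 registers, which is the kind of odd-sized payload that implicit register doubling could not express.<br>
<br>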
I could go on about other changes, but those are the major ones.<br>
<br>
The requisite Shader DB results:<br>
<br>
total instructions in shared programs: 4999994 -> 4971746 (-0.56%)<br>
instructions in affected programs: 959392 -> 931144 (-2.94%)<br>
GAINED: 138<br>
LOST: 71<br>
<br>
There are some shaders that are hurt by 1 or 2 instructions. It could<br>
simply be send-from-grf, but prior to this last rebase, I don't remember<br>
there being any hurt programs. I'm going to look into it.<br>
<br>
Regarding Piglit:<br>
<br>
* On HSW, every commit except for ones immediately followed by something<br>
labeled SQUASH passes (except for glsl-routing and timestamp-get, which<br>
are flaky).<br>
<br>
* On SNB and Gen4, the end of the series along with important<br>
intermediate points, such as changing GEN5 texturing or varying pull<br>
constant loads, pass.<br>
<br>
I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4<br>
code which I have since removed. I'll try to get that working and added<br>
back in later. That said, that's an optimization and not required, so we<br>
can leave it for now.<br>
<br>
Happy Reviewing!<br>
--Jason Ekstrand<br>
<br>
Jason Ekstrand (41):<br>
i965/brw_reg: Add a firsthalf function and use it in the generator<br>
i965/fs: A little harmless refactoring of register_coalesce<br>
i965/fs: Add a concept of a width to fs_reg<br>
i965/fs: Make half() divide the register width by 2 and use it more<br>
i965/fs: Handle printing of registers better.<br>
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode<br>
SQUASH: i965/fs: Use the register width when applying offsets<br>
SQUASH: i965/fs: Change regs_read to be in hardware registers<br>
SQUASH: i965/fs: Change regs_written to be actual hardware registers<br>
SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD<br>
SQUASH: i965/fs: Handle register widths in demote_pull_constants<br>
SQUASH: i965/fs: Get rid of implicit register doubling in the<br>
allocator<br>
SQUASH: i965/fs: Reserve enough registers for PLN instructions<br>
SQUASH: i965/fs: Make sources and destinations interfere in 16-wide<br>
SQUASH: i965/fs: Properly handle register widths in CSE<br>
SQUASH: i965/fs: Properly handle register widths in register_coalesce<br>
SQUASH: i965/fs: Properly handle widths in copy propagation<br>
SQUASH: i965/fs: Properly handle register widths in<br>
VARYING_PULL_CONSTANT_LOAD<br>
SQUASH: i965/fs: Properly handle register widths and odd register<br>
sizes in spilling<br>
SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=<br>
7<br>
i965/fs: Rework GEN5 texturing code to use fs_reg and offset()<br>
i965/fs: Fix a bug in register coalesce<br>
i965/fs: Determine partial writes based on the destination width<br>
i965/fs: Add an exec_size field to fs_inst<br>
SQUASH: i965/fs: Explicitly set instruction execute size a couple of<br>
places<br>
SQUASH: i965/blorp: Explicitly set instruction execute sizes<br>
i965/fs: Better guess the width of LOAD_PAYLOAD<br>
i965/fs: Make fs_reg::effective_width take fs_inst* instead of<br>
fs_visitor*<br>
i965/fs: Derive force_uncompressed from instruction exec_size<br>
i965/fs: Remove unneeded uses of force_uncompressed<br>
i965/fs: Use instruction execution sizes to set compression state<br>
i965/fs: Use instruction execution sizes instead of heuristics<br>
i965/fs: Use exec_size instead of force_uncompressed in<br>
dump_instruction<br>
i965/fs: Use the instruction execution size directly for texture<br>
generation<br>
i966/fs: Add a function for getting a component of a 8 or 16-wide<br>
register<br>
i965/fs: Use the GRF for UNTYPED_ATOMIC instructions<br>
i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions<br>
i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE<br>
instruction<br>
i965/fs: Add split_virtual_grfs and compute_to_mrf after<br>
lower_load_payload<br>
i965/fs: Use the GRF for FB writes on gen >= 7<br>
SQUASH: i965/fs: Force a high register for the final FB write<br>
<br>
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-<br>
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h | 36 +--<br>
src/mesa/drivers/dri/i965/brw_eu.h | 6 +-<br>
src/mesa/drivers/dri/i965/brw_eu_emit.c | 16 +-<br>
src/mesa/drivers/dri/i965/brw_fs.cpp | 355 +++++++++++++++-----<br>
src/mesa/drivers/dri/i965/brw_fs.h | 98 +++++-<br>
.../drivers/dri/i965/brw_fs_copy_propagation.cpp | 14 +-<br>
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 22 +-<br>
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 169 +++++-----<br>
.../drivers/dri/i965/brw_fs_live_variables.cpp | 10 +-<br>
src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 160 ++++++---<br>
.../drivers/dri/i965/brw_fs_register_coalesce.cpp | 50 ++-<br>
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 356 +++++++++++++--------<br>
src/mesa/drivers/dri/i965/brw_reg.h | 6 +<br>
.../drivers/dri/i965/brw_schedule_instructions.cpp | 15 +-<br>
src/mesa/drivers/dri/i965/brw_shader.cpp | 1 +<br>
src/mesa/drivers/dri/i965/intel_screen.h | 5 +<br>
17 files changed, 904 insertions(+), 417 deletions(-)<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
2.1.0<br>
<br>
</font></span></blockquote></div><br></div>