[Mesa-dev] [PATCH 00/41] Welcome back Matt!
Jason Ekstrand
jason at jlekstrand.net
Sat Sep 20 10:22:49 PDT 2014
This series does a bunch of refactoring of the i965 fs backend IR to add
concepts of register width and instruction execution size. There's more to
be done yet, but this gets us most of the way there. It also removes the
assumption that scalar values are always 1 register in SIMD8 and 2
registers in SIMD16. In particular, we get the following:
1) No more assumption about everything being 1 register. This allows us
to allocate odd numbers of registers in SIMD16 which is needed for some
payloads. Also, it should make implementing fp64 much easier because
we can now sanely allocate registers of size 2 in SIMD8 and size 4 in
SIMD16. There's a little more work to be done there, but this should take care
of a lot of it.
2) We can now do other instruction widths with relative ease. The
compiler now detects, based on register widths, the execution size of
the instruction and passes it down to the generator. One example of
this is the patches in this series for UNTYPED_ATOMIC and
UNTYPED_SURFACE_READ where part of setting up the payload is to do an
8-wide move to fill a register with 0 and then a 1-wide move to set one
particular component. We can now simply do this at the fs level and it
will get translated down to the correct assembly and properly
handled by the compiler optimizations. There is more work to be done
here at the generator level, but this series is already long enough.
3) Thanks to the above-mentioned things, we can easily do send from GRF
for FB writes. One of the major blockers here before was that the
beginning of the FB write message was anywhere between 0 and 4
registers regardless of whether you are in SIMD8 or SIMD16. Due to the
implicit register doubling in SIMD16, it would have been a real pain to
implement this properly. Now, it's trivial.
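To make the register arithmetic in the three points above concrete, here
is a minimal standalone sketch. It is not code from the series: Reg,
regs_for, exec_size and fb_write_regs are hypothetical names, and the
"widest operand wins" rule for the execution size is an assumption made
for illustration. The only hardware fact it relies on is that a GRF holds
32 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// A GRF holds 32 bytes (8 dwords).
constexpr unsigned kRegBytes = 32;

// Hypothetical stand-in for a register with an explicit width.
// This is NOT the real fs_reg class from brw_fs.h.
struct Reg {
    unsigned width;      // channels: 1, 8 or 16
    unsigned type_size;  // bytes per channel: 4 for float, 8 for fp64
};

// Whole hardware registers needed to hold `components` values in `r`.
unsigned regs_for(const Reg &r, unsigned components = 1) {
    unsigned bytes = r.width * r.type_size * components;
    return (bytes + kRegBytes - 1) / kRegBytes;  // round up
}

// Assumed rule: an instruction executes as wide as its widest operand.
unsigned exec_size(const Reg &dst, std::initializer_list<Reg> srcs) {
    unsigned w = dst.width;
    for (const Reg &s : srcs)
        w = std::max(w, s.width);
    return w;
}

// FB write payload size: the header registers are counted as-is and are
// not doubled in SIMD16; only the color data scales with the width.
unsigned fb_write_regs(unsigned header_regs, const Reg &color,
                       unsigned components) {
    return header_regs + regs_for(color, components);
}

// Checks the numbers quoted in the cover letter.
void check_examples() {
    const Reg f8{8, 4}, f16{16, 4}, d8{8, 8}, d16{16, 8}, scalar{1, 4};

    assert(regs_for(f8) == 1);   // float, SIMD8
    assert(regs_for(f16) == 2);  // float, SIMD16
    assert(regs_for(d8) == 2);   // fp64, SIMD8
    assert(regs_for(d16) == 4);  // fp64, SIMD16

    // 8-wide move to zero a register, then a 1-wide move into one
    // component. The immediate source is modelled as 1-wide here.
    assert(exec_size(f8, {scalar}) == 8);
    assert(exec_size(scalar, {scalar}) == 1);

    // One header register plus four color components in SIMD16 gives an
    // odd payload size, which implicit doubling could not express.
    assert(fb_write_regs(1, f16, 4) == 9);
}
```

Calling check_examples() from a small main() is enough to run it. The
last case is the interesting one: an odd total in SIMD16 is exactly what
the old "everything is 2 registers" assumption could not represent.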
I could go on about other changes, but those are the major ones.
The requisite Shader DB results:
total instructions in shared programs: 4999994 -> 4971746 (-0.56%)
instructions in affected programs: 959392 -> 931144 (-2.94%)
GAINED: 138
LOST: 71
There are some shaders that are hurt by 1 or 2 instructions. It could
simply be send-from-grf, but prior to this last rebase, I don't remember
there being any hurt programs. I'm going to look into it.
Regarding Piglit:
* On HSW, every commit except for ones immediately followed by something
labeled SQUASH passes (except for glsl-routing and timestamp-get, which
are flaky).
* On SNB and Gen4, the end of the series along with important
intermediate points, such as changing GEN5 texturing or varying pull
constant loads, pass.
I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4
code which I have since removed. I'll try to get that working and added
back in later. That said, that's an optimization and not required, so we
can leave it for now.
Happy Reviewing!
--Jason Ekstrand
Jason Ekstrand (41):
i965/brw_reg: Add a firsthalf function and use it in the generator
i965/fs: A little harmless refactoring of register_coalesce
i965/fs: Add a concept of a width to fs_reg
i965/fs: Make half() divide the register width by 2 and use it more
i965/fs: Handle printing of registers better.
i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
SQUASH: i965/fs: Use the register width when applying offsets
SQUASH: i965/fs: Change regs_read to be in hardware registers
SQUASH: i965/fs: Change regs_written to be actual hardware registers
SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD
SQUASH: i965/fs: Handle register widths in demote_pull_constants
SQUASH: i965/fs: Get rid of implicit register doubling in the
allocator
SQUASH: i965/fs: Reserve enough registers for PLN instructions
SQUASH: i965/fs: Make sources and destinations interfere in 16-wide
SQUASH: i965/fs: Properly handle register widths in CSE
SQUASH: i965/fs: Properly handle register widths in register_coalesce
SQUASH: i965/fs: Properly handle widths in copy propagation
SQUASH: i965/fs: Properly handle register widths in
VARYING_PULL_CONSTANT_LOAD
SQUASH: i965/fs: Properly handle register widths and odd register
sizes in spilling
SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=
7
i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
i965/fs: Fix a bug in register coalesce
i965/fs: Determine partial writes based on the destination width
i965/fs: Add an exec_size field to fs_inst
SQUASH: i965/fs: Explicitly set instruction execute size a couple of
places
SQUASH: i965/blorp: Explicitly set instruction execute sizes
i965/fs: Better guess the width of LOAD_PAYLOAD
i965/fs: Make fs_reg::effective_width take fs_inst* instead of
fs_visitor*
i965/fs: Derive force_uncompressed from instruction exec_size
i965/fs: Remove unneeded uses of force_uncompressed
i965/fs: Use instruction execution sizes to set compression state
i965/fs: Use instruction execution sizes instead of heuristics
i965/fs: Use exec_size instead of force_uncompressed in
dump_instruction
i965/fs: Use the instruction execution size directly for texture
generation
i965/fs: Add a function for getting a component of a 8 or 16-wide
register
i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
i965/fs: Add an optional source to the FS_OPCODE_FB_WRITE
instruction
i965/fs: Add split_virtual_grfs and compute_to_mrf after
lower_load_payload
i965/fs: Use the GRF for FB writes on gen >= 7
SQUASH: i965/fs: Force a high register for the final FB write
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-
src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h | 36 +--
src/mesa/drivers/dri/i965/brw_eu.h | 6 +-
src/mesa/drivers/dri/i965/brw_eu_emit.c | 16 +-
src/mesa/drivers/dri/i965/brw_fs.cpp | 355 +++++++++++++++-----
src/mesa/drivers/dri/i965/brw_fs.h | 98 +++++-
.../drivers/dri/i965/brw_fs_copy_propagation.cpp | 14 +-
src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 22 +-
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 169 +++++-----
.../drivers/dri/i965/brw_fs_live_variables.cpp | 10 +-
src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 160 ++++++---
.../drivers/dri/i965/brw_fs_register_coalesce.cpp | 50 ++-
src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 356 +++++++++++++--------
src/mesa/drivers/dri/i965/brw_reg.h | 6 +
.../drivers/dri/i965/brw_schedule_instructions.cpp | 15 +-
src/mesa/drivers/dri/i965/brw_shader.cpp | 1 +
src/mesa/drivers/dri/i965/intel_screen.h | 5 +
17 files changed, 904 insertions(+), 417 deletions(-)
--
2.1.0