[Mesa-dev] [PATCH 00/41] Welcome back Matt!

Sat Sep 20 10:22:49 PDT 2014

This series does a bunch of refactoring of the i965 fs backend IR to add
concepts of register width and instruction execution size.  There's more to
be done yet, but this gets us most of the way there.  It also removes the
assumption that scalar values are always 1 register in SIMD8 and 2
registers in SIMD16.  In particular, we get the following:

 1) No more assumption about everything being 1 register.  This allows us
    to allocate odd numbers of registers in SIMD16 which is needed for some
    payloads.  Also, it should make implementing fp64 much easier because
    we can now sanely use registers of size 2 in SIMD8 and size 4 in SIMD16.
    There's a little more work to be done there, but this should take care
    of a lot of it.

 2) We can now do other instruction widths with relative ease.  The
    compiler now detects, based on register widths, the execution size of
    the instruction and passes it down to the generator.  One example is
    the pair of patches in this series for UNTYPED_ATOMIC and
    UNTYPED_SURFACE_READ, where part of setting up the payload is an
    8-wide move to fill a register with 0 followed by a 1-wide move to set
    one particular component.  We can now do this directly at the fs level,
    and it will get translated down to the correct assembly and properly
    handled by the compiler optimizations.  There is more work to be done
    here at the generator level, but this series is already long enough.
 3) Thanks to the above, we can easily do send-from-GRF for FB writes.
    One of the major blockers here before was that the beginning of the FB
    write message could be anywhere from 0 to 4 registers long, regardless
    of whether you are in SIMD8 or SIMD16.  Due to the implicit register
    doubling in SIMD16, it would have been a real pain to implement this
    properly.  Now, it's trivial.

I could go on about other changes, but those are the major ones.

The requisite Shader DB results:

total instructions in shared programs: 4999994 -> 4971746 (-0.56%)
instructions in affected programs:     959392 -> 931144 (-2.94%)
GAINED:                                138
LOST:                                  71

There are some shaders that are hurt by 1 or 2 instructions.  It could
simply be send-from-grf, but prior to this last rebase, I don't remember
there being any hurt programs.  I'm going to look into it.

Regarding Piglit:

 * On HSW, every commit passes, except for ones immediately followed by
   something labeled SQUASH (and except for glsl-routing and
   timestamp-get, which are flaky).

 * On SNB and Gen4, the end of the series along with important
   intermediate points, such as changing GEN5 texturing or varying pull
   constant loads, pass.  

I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4
code which I have since removed.  I'll try to get that working and added
back in later.  That said, that's an optimization and not required, so we
can leave it for now.

Happy Reviewing!
--Jason Ekstrand

Jason Ekstrand (41):
  i965/brw_reg: Add a firsthalf function and use it in the generator
  i965/fs: A little harmless refactoring of register_coalesce
  i965/fs: Add a concept of a width to fs_reg
  i965/fs: Make half() divide the register width by 2 and use it more
  i965/fs: Handle printing of registers better.
  i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
  SQUASH: i965/fs: Use the register width when applying offsets
  SQUASH: i965/fs: Change regs_read to be in hardware registers
  SQUASH: i965/fs: Change regs_written to be actual hardware registers
  SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD
  SQUASH: i965/fs: Handle register widths in demote_pull_constants
  SQUASH: i965/fs: Get rid of implicit register doubling in the
    allocator
  SQUASH: i965/fs: Reserve enough registers for PLN instructions
  SQUASH: i965/fs: Make sources and destinations interfere in 16-wide
  SQUASH: i965/fs: Properly handle register widths in CSE
  SQUASH: i965/fs: Properly handle register widths in register_coalesce
  SQUASH: i965/fs: Properly handle widths in copy propagation
  SQUASH: i965/fs: Properly handle register widths in
    VARYING_PULL_CONSTANT_LOAD
  SQUASH: i965/fs: Properly handle register widths and odd register
    sizes in spilling
  SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=
    7
  i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
  i965/fs: Fix a bug in register coalesce
  i965/fs: Determine partial writes based on the destination width
  i965/fs: Add an exec_size field to fs_inst
  SQUASH: i965/fs: Explicitly set instruction execute size a couple of
    places
  SQUASH: i965/blorp: Explicitly set instruction execute sizes
  i965/fs: Better guess the width of LOAD_PAYLOAD
  i965/fs: Make fs_reg::effective_width take fs_inst* instead of
    fs_visitor*
  i965/fs: Derive force_uncompressed from instruction exec_size
  i965/fs: Remove unneeded uses of force_uncompressed
  i965/fs: Use instruction execution sizes to set compression state
  i965/fs: Use instruction execution sizes instead of heuristics
  i965/fs: Use exec_size instead of force_uncompressed in
    dump_instruction
  i965/fs: Use the instruction execution size directly for texture
    generation
  i965/fs: Add a function for getting a component of an 8 or 16-wide
    register
  i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
  i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
  i965/fs: Add an optional source to the FS_OPCODE_FB_WRITE
    instruction
  i965/fs: Add split_virtual_grfs and compute_to_mrf after
    lower_load_payload
  i965/fs: Use the GRF for FB writes on gen >= 7
  SQUASH: i965/fs: Force a high register for the final FB write

 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp    |   2 +-
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h      |  36 +--
 src/mesa/drivers/dri/i965/brw_eu.h                 |   6 +-
 src/mesa/drivers/dri/i965/brw_eu_emit.c            |  16 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 355 +++++++++++++++-----
 src/mesa/drivers/dri/i965/brw_fs.h                 |  98 +++++-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  14 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |  22 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 169 +++++-----
 .../drivers/dri/i965/brw_fs_live_variables.cpp     |  10 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 160 ++++++---
 .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  50 ++-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 356 +++++++++++++--------
 src/mesa/drivers/dri/i965/brw_reg.h                |   6 +
 .../drivers/dri/i965/brw_schedule_instructions.cpp |  15 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp           |   1 +
 src/mesa/drivers/dri/i965/intel_screen.h           |   5 +
 17 files changed, 904 insertions(+), 417 deletions(-)
