[Mesa-dev] [PATCH 00/41] Welcome back Matt!

Sat Sep 20 10:25:55 PDT 2014

One more quick note.  If you find it nicer, the whole thing can be found
here:

http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1

On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

> This series does a bunch of refactoring of the i965 fs backend IR to add
> concepts of register width and instruction execution size.  There's more to
> be done yet, but this gets us most of the way there.  It also removes the
> assumption that scalar values are always 1 register in SIMD8 and 2
> registers in SIMD16.  In particular, we get the following:
>
>  1) No more assumption about everything being 1 register.  This allows us
>     to allocate odd numbers of registers in SIMD16 which is needed for some
>     payloads.  Also, it should make implementing fp64 much easier because
>     we can now sanely allocate registers of size 2 in SIMD8 and size 4 in SIMD16.
>     There's a little more work to be done there, but this should take care
>     of a lot of it.
>
>  2) We can now do other instruction widths with relative ease.  The
>     compiler now detects, based on register widths, the execution size of
>     the instruction and passes it down to the generator.  One example of
>     this is the patches in this series for UNTYPED_ATOMIC and
>     UNTYPED_SURFACE_READ where part of setting up the payload is to do an
>     8-wide move to fill a register with 0 and then a 1-wide move to set one
>     particular component.  We can now simply do this at the fs level and it
>     will get translated down to the correct assembly and properly
>     handled by the compiler optimizations.  There is more work to be done
>     here at the generator level, but this series is already long enough.
>
>  3) Thanks to the above-mentioned things, we can easily do send from GRF
>     for FB writes.  One of the major blockers here before was that the
>     beginning of the FB write message was anywhere between 0 and 4
>     registers regardless of whether you are in SIMD8 or SIMD16.  Due to the
>     implicit register doubling in SIMD16, it would have been a real pain to
>     implement this properly.  Now, it's trivial.
>
> I could go on about other changes, but those are the major ones.
>
> The requisite Shader DB results:
>
> total instructions in shared programs: 4999994 -> 4971746 (-0.56%)
> instructions in affected programs:     959392 -> 931144 (-2.94%)
> GAINED:                                138
> LOST:                                  71
>
> There are some shaders that are hurt by 1 or 2 instructions.  It could
> simply be send-from-grf, but prior to this last rebase, I don't remember
> there being any hurt programs.  I'm going to look into it.
>
> Regarding Piglit:
>
>  * On HSW, every commit except for ones immediately followed by something
>    labeled SQUASH passes.  (Except for glsl-routing and timestamp-get, which
>    are flaky.)
>
>  * On SNB and Gen4, the end of the series along with important
>    intermediate points, such as changing GEN5 texturing or varying pull
>    constant loads, pass.
>
> I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4
> code which I have since removed.  I'll try to get that working and added
> back in later.  That said, that's an optimization and not required, so we
> can leave it for now.
>
> Happy Reviewing!
> --Jason Ekstrand
>
> Jason Ekstrand (41):
>   i965/brw_reg: Add a firsthalf function and use it in the generator
>   i965/fs: A little harmless refactoring of register_coalesce
>   i965/fs: Add a concept of a width to fs_reg
>   i965/fs: Make half() divide the register width by 2 and use it more
>   i965/fs: Handle printing of registers better.
>   i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
>   SQUASH: i965/fs: Use the register width when applying offsets
>   SQUASH: i965/fs: Change regs_read to be in hardware registers
>   SQUASH: i965/fs: Change regs_written to be actual hardware registers
>   SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD
>   SQUASH: i965/fs: Handle register widths in demote_pull_constants
>   SQUASH: i965/fs: Get rid of implicit register doubling in the
>     allocator
>   SQUASH: i965/fs: Reserve enough registers for PLN instructions
>   SQUASH: i965/fs: Make sources and destinations interfere in 16-wide
>   SQUASH: i965/fs: Properly handle register widths in CSE
>   SQUASH: i965/fs: Properly handle register widths in register_coalesce
>   SQUASH: i965/fs: Properly handle widths in copy propagation
>   SQUASH: i965/fs: Properly handle register widths in
>     VARYING_PULL_CONSTANT_LOAD
>   SQUASH: i965/fs: Properly handle register widths and odd register
>     sizes in spilling
>   SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=
>     7
>   i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
>   i965/fs: Fix a bug in register coalesce
>   i965/fs: Determine partial writes based on the destination width
>   i965/fs: Add an exec_size field to fs_inst
>   SQUASH: i965/fs: Explicitly set instruction execute size a couple of
>     places
>   SQUASH: i965/blorp: Explicitly set instruction execute sizes
>   i965/fs: Better guess the width of LOAD_PAYLOAD
>   i965/fs: Make fs_reg::effective_width take fs_inst* instead of
>     fs_visitor*
>   i965/fs: Derive force_uncompressed from instruction exec_size
>   i965/fs: Remove unneeded uses of force_uncompressed
>   i965/fs: Use instruction execution sizes to set compression state
>   i965/fs: Use instruction execution sizes instead of heuristics
>   i965/fs: Use exec_size instead of force_uncompressed in
>     dump_instruction
>   i965/fs: Use the instruction execution size directly for texture
>     generation
>   i965/fs: Add a function for getting a component of an 8 or 16-wide
>     register
>   i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
>   i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
>   i965/fs: Add an optional source to the FS_OPCODE_FB_WRITE
>     instruction
>   i965/fs: Add split_virtual_grfs and compute_to_mrf after
>     lower_load_payload
>   i965/fs: Use the GRF for FB writes on gen >= 7
>   SQUASH: i965/fs: Force a high register for the final FB write
>
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp    |   2 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h      |  36 +--
>  src/mesa/drivers/dri/i965/brw_eu.h                 |   6 +-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c            |  16 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp               | 355 +++++++++++++++-----
>  src/mesa/drivers/dri/i965/brw_fs.h                 |  98 +++++-
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  14 +-
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |  22 +-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 169 +++++-----
>  .../drivers/dri/i965/brw_fs_live_variables.cpp     |  10 +-
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 160 ++++++---
>  .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  50 ++-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 356 +++++++++++++--------
>  src/mesa/drivers/dri/i965/brw_reg.h                |   6 +
>  .../drivers/dri/i965/brw_schedule_instructions.cpp |  15 +-
>  src/mesa/drivers/dri/i965/brw_shader.cpp           |   1 +
>  src/mesa/drivers/dri/i965/intel_screen.h           |   5 +
>  17 files changed, 904 insertions(+), 417 deletions(-)
>
> --
> 2.1.0
>
>
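
For anyone following along with item 1 of the quoted cover letter, the
register-size arithmetic can be sketched roughly as below.  This is only an
illustration, not code from the series: regs_needed() and REG_SIZE_BYTES are
hypothetical names, and the only hardware fact assumed is that one GRF holds
32 bytes (8 channels of 32 bits).

```cpp
#include <cassert>

// Size of one i965 general register (GRF) in bytes: 8 channels x 32 bits.
constexpr unsigned REG_SIZE_BYTES = 32;

// Hypothetical helper (not a Mesa function): number of whole hardware
// registers needed to hold one value of `type_size` bytes per channel
// across `width` channels.
unsigned regs_needed(unsigned width, unsigned type_size)
{
   assert(width > 0 && type_size > 0);
   unsigned bytes = width * type_size;
   // Round up so that partially filled registers still count as one.
   return (bytes + REG_SIZE_BYTES - 1) / REG_SIZE_BYTES;
}
```

With this, regs_needed(8, 4) and regs_needed(16, 4) give the old "1 register
in SIMD8, 2 in SIMD16" rule for 32-bit values, while regs_needed(8, 8) and
regs_needed(16, 8) give the 2 and 4 registers mentioned for fp64.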