[Mesa-dev] [PATCH 00/41] Welcome back Matt!
Jason Ekstrand
jason at jlekstrand.net
Sat Sep 20 10:25:55 PDT 2014
One more quick note. If you find it nicer, the whole thing can be found
here:
http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1
On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> This series does a bunch of refactoring of the i965 fs backend IR to add
> concepts of register width and instruction execution size. There's more to
> be done yet, but this gets us most of the way there. It also removes the
> assumption that scalar values are always 1 register in SIMD8 and 2
> registers in SIMD16. In particular, we get the following:
>
> 1) No more assumption about everything being 1 register. This allows us
> to allocate odd numbers of registers in SIMD16 which is needed for some
> payloads. Also, it should make implementing fp64 much easier because
> we can now sanely allocate registers of size 2 in SIMD8 and size 4 in SIMD16.
> There's a little more work to be done there, but this should take care
> of a lot of it.
>
> 2) We can now do other instruction widths with relative ease. The
> compiler now detects, based on register widths, the execution size of
> the instruction and passes it down to the generator. One example of
> this is the patches in this series for UNTYPED_ATOMIC and
> UNTYPED_SURFACE_READ where part of setting up the payload is to do an
> 8-wide move to fill a register with 0 and then a 1-wide move to set one
> particular component. We can now simply do this at the fs level and it
> will get translated down to the correct assembly and properly
> handled by the compiler optimizations. There is more work to be done
> here at the generator level, but this series is already long enough.
>
> 3) Thanks to the above-mentioned things, we can easily do send from GRF
> for FB writes. One of the major blockers here before was that the
> beginning of the FB write message was anywhere between 0 and 4
> registers regardless of whether you are in SIMD8 or SIMD16. Due to the
> implicit register doubling in SIMD16, it would have been a real pain to
> implement this properly. Now, it's trivial.
>
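The register-size arithmetic behind point 1 can be sketched roughly as follows. This is only an illustration, not code from the series: the helper name regs_needed and the constant GRF_SIZE_BYTES are hypothetical, and it assumes a hardware register (GRF) holds 32 bytes and that a value needs one channel of the given type per SIMD lane.

```cpp
// Hypothetical sketch, not taken from the patches: number of hardware
// registers a value occupies for a given SIMD width and per-channel size.
// Assumption: one GRF is 32 bytes (8 channels of 32 bits each).
constexpr unsigned GRF_SIZE_BYTES = 32;

constexpr unsigned regs_needed(unsigned width, unsigned type_size_bytes)
{
   // Round up so that partially filled registers still count as one.
   return (width * type_size_bytes + GRF_SIZE_BYTES - 1) / GRF_SIZE_BYTES;
}

// 32-bit scalar values: the old "1 register in SIMD8, 2 in SIMD16" rule.
static_assert(regs_needed(8, 4) == 1, "SIMD8 float fits in one register");
static_assert(regs_needed(16, 4) == 2, "SIMD16 float needs two registers");

// 64-bit values: size 2 in SIMD8 and size 4 in SIMD16, as described above.
static_assert(regs_needed(8, 8) == 2, "SIMD8 double needs two registers");
static_assert(regs_needed(16, 8) == 4, "SIMD16 double needs four registers");
```

Deriving the size from the width instead of doubling implicitly in SIMD16 is what lets odd register counts and the fp64 sizes fall out of the same formula.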
> I could go on about other changes, but those are the major ones.
>
> The requisite Shader DB results:
>
> total instructions in shared programs: 4999994 -> 4971746 (-0.56%)
> instructions in affected programs: 959392 -> 931144 (-2.94%)
> GAINED: 138
> LOST: 71
>
> There are some shaders that are hurt by 1 or 2 instructions. It could
> simply be send-from-grf, but prior to this last rebase, I don't remember
> there being any hurt programs. I'm going to look into it.
>
> Regarding Piglit:
>
> * On HSW, every commit except for ones immediately followed by something
> labeled SQUASH passes (except for glsl-routing and timestamp-get, which
> are flaky).
>
> * On SNB and Gen4, the end of the series along with important
> intermediate points, such as changing GEN5 texturing or varying pull
> constant loads, pass.
>
> I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4
> code which I have since removed. I'll try to get that working and added
> back in later. That said, that's an optimization and not required, so we
> can leave it for now.
>
> Happy Reviewing!
> --Jason Ekstrand
>
> Jason Ekstrand (41):
> i965/brw_reg: Add a firsthalf function and use it in the generator
> i965/fs: A little harmless refactoring of register_coalesce
> i965/fs: Add a concept of a width to fs_reg
> i965/fs: Make half() divide the register width by 2 and use it more
> i965/fs: Handle printing of registers better.
> i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
> SQUASH: i965/fs: Use the register width when applying offsets
> SQUASH: i965/fs: Change regs_read to be in hardware registers
> SQUASH: i965/fs: Change regs_written to be actual hardware registers
> SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD
> SQUASH: i965/fs: Handle register widths in demote_pull_constants
> SQUASH: i965/fs: Get rid of implicit register doubling in the
> allocator
> SQUASH: i965/fs: Reserve enough registers for PLN instructions
> SQUASH: i965/fs: Make sources and destinations interfere in 16-wide
> SQUASH: i965/fs: Properly handle register widths in CSE
> SQUASH: i965/fs: Properly handle register widths in register_coalesce
> SQUASH: i965/fs: Properly handle widths in copy propagation
> SQUASH: i965/fs: Properly handle register widths in
> VARYING_PULL_CONSTANT_LOAD
> SQUASH: i965/fs: Properly handle register widths and odd register
> sizes in spilling
> SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=
> 7
> i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
> i965/fs: Fix a bug in register coalesce
> i965/fs: Determine partial writes based on the destination width
> i965/fs: Add an exec_size field to fs_inst
> SQUASH: i965/fs: Explicitly set instruction execute size a couple of
> places
> SQUASH: i965/blorp: Explicitly set instruction execute sizes
> i965/fs: Better guess the width of LOAD_PAYLOAD
> i965/fs: Make fs_reg::effective_width take fs_inst* instead of
> fs_visitor*
> i965/fs: Derive force_uncompressed from instruction exec_size
> i965/fs: Remove unneeded uses of force_uncompressed
> i965/fs: Use instruction execution sizes to set compression state
> i965/fs: Use instruction execution sizes instead of heuristics
> i965/fs: Use exec_size instead of force_uncompressed in
> dump_instruction
> i965/fs: Use the instruction execution size directly for texture
> generation
> i966/fs: Add a function for getting a component of a 8 or 16-wide
> register
> i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
> i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
> i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE
> instruction
> i965/fs: Add split_virtual_grfs and compute_to_mrf after
> lower_load_payload
> i965/fs: Use the GRF for FB writes on gen >= 7
> SQUASH: i965/fs: Force a high register for the final FB write
>
> src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-
> src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h | 36 +--
> src/mesa/drivers/dri/i965/brw_eu.h | 6 +-
> src/mesa/drivers/dri/i965/brw_eu_emit.c | 16 +-
> src/mesa/drivers/dri/i965/brw_fs.cpp | 355 +++++++++++++++-----
> src/mesa/drivers/dri/i965/brw_fs.h | 98 +++++-
> .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 14 +-
> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 22 +-
> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 169 +++++-----
> .../drivers/dri/i965/brw_fs_live_variables.cpp | 10 +-
> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 160 ++++++---
> .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 50 ++-
> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 356 +++++++++++++--------
> src/mesa/drivers/dri/i965/brw_reg.h | 6 +
> .../drivers/dri/i965/brw_schedule_instructions.cpp | 15 +-
> src/mesa/drivers/dri/i965/brw_shader.cpp | 1 +
> src/mesa/drivers/dri/i965/intel_screen.h | 5 +
> 17 files changed, 904 insertions(+), 417 deletions(-)
>
> --
> 2.1.0
>
>