[Mesa-dev] [PATCH 00/25] i965: Scalar back-end support for SIMD32, part 4.

Sat May 28 02:05:41 UTC 2016

This fixes the few code quality regressions from the previous series
enabling SIMD32 CS codegen in the back-end -- AFAICT by the end of the
series we can finally enable GL 4.3 on all Gen8+ hardware.

Patches 1-8 delay the SIMD lowering pass until after the bulk of the
optimization passes have been run.  This should decrease compilation
time, mainly for SIMD32 shaders, and improve code quality both for
SIMD32 shaders on all generations and for shaders of any dispatch
width on older generations (up to and including IVB), which use SIMD
lowering more intensively to implement various workarounds.

Patches 9-14 rework the SIMD lowering pass to avoid emitting the copy
instructions used to zip and unzip register regions where possible,
since the register coalesce and copy propagation passes seem to
perform rather poorly at getting rid of them in some cases.  In the
long term we'll likely want to improve the register coalesce pass
irrespective of these changes.
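
For readers unfamiliar with the terminology, here is a toy sketch of
what the zip and unzip copies do when a wide instruction is split into
narrower ones.  It is not code from the series: Region, unzip(), zip()
and lowered_add() are hypothetical names, and a region is modelled as
a plain vector with one value per channel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model: one int per SIMD channel.
using Region = std::vector<int>;

// "Unzip": copy the channels belonging to one lowered group of a
// full-width source into a temporary of the lowered width.
Region unzip(const Region &src, std::size_t group, std::size_t width)
{
    assert((group + 1) * width <= src.size());
    return Region(src.begin() + group * width,
                  src.begin() + (group + 1) * width);
}

// "Zip": copy the result of one lowered group back into its slot of
// the full-width destination.
void zip(Region &dst, const Region &part, std::size_t group)
{
    assert((group + 1) * part.size() <= dst.size());
    std::copy(part.begin(), part.end(),
              dst.begin() + group * part.size());
}

// A full-width add executed as several narrower adds, with an unzip
// copy per source and a zip copy per destination for every group.
Region lowered_add(const Region &a, const Region &b,
                   std::size_t lowered_width)
{
    assert(a.size() == b.size() && a.size() % lowered_width == 0);
    Region dst(a.size());
    for (std::size_t g = 0; g < a.size() / lowered_width; g++) {
        const Region ta = unzip(a, g, lowered_width);
        const Region tb = unzip(b, g, lowered_width);
        Region tmp(lowered_width);
        for (std::size_t i = 0; i < lowered_width; i++)
            tmp[i] = ta[i] + tb[i];
        zip(dst, tmp, g);
    }
    return dst;
}
```

In this model every group pays for three extra copies.  When the
lowered instruction can address the relevant part of the original
region directly, those copies are redundant, which is the case the
rework tries to exploit.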

Patches 15-20 improve the compute-to-mrf pass used on Gen4-6 to handle
cases where the source of a VGRF-to-MRF copy is initialized by the
shader using multiple single-GRF writes, which becomes far more common
with the additional SIMD lowering going on after this series.
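
As a rough illustration of the kind of condition involved (a
simplified sketch under my own assumptions, not the logic of
compute_to_mrf() itself): the copy source can only be considered if
the single-GRF writes seen so far cover the whole VGRF.  GrfWrite and
fully_covered() are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of a write covering exactly one GRF of a VGRF.
struct GrfWrite {
    unsigned vgrf;        // which virtual register is written
    unsigned grf_offset;  // which GRF-sized slot within it
};

// Returns true if every GRF-sized slot of the given VGRF is written
// by at least one of the recorded single-GRF writes.
bool fully_covered(const std::vector<GrfWrite> &writes,
                   unsigned vgrf, unsigned size_in_grfs)
{
    assert(size_in_grfs <= 32);  // the bitmask below holds 32 slots
    unsigned mask = 0;
    for (const GrfWrite &w : writes) {
        if (w.vgrf == vgrf && w.grf_offset < size_in_grfs)
            mask |= 1u << w.grf_offset;
    }
    const unsigned full =
        size_in_grfs == 32 ? ~0u : (1u << size_in_grfs) - 1;
    return mask == full;
}
```

A two-GRF VGRF written by two SIMD8 halves passes this check, while
the same writes leave a three-GRF VGRF only partially initialized.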

Patches 21-24 are some other assorted changes improving code quality
on older gens.

I wanted to provide more detailed (e.g. per commit) shader-db stats
with this series, but kind of ran out of time.  Let me know if you
would like to see more evidence that any of the changes below is
improving code quality in case it's not clear from the commit alone.

[PATCH 01/25] i965/fs: Let CSE handle logical sampler sends as expressions.
[PATCH 02/25] i965/fs: Allow constant propagation into logical send sources.
[PATCH 03/25] i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().
[PATCH 04/25] i965/fs: Run SIMD and logical send lowering after the optimization loop.
[PATCH 05/25] i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.
[PATCH 06/25] i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().
[PATCH 07/25] i965/fs: Implement opt_sampler_eot() in terms of logical sends.
[PATCH 08/25] SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().
[PATCH 09/25] i965/fs: Refactor offset() into a separate function taking the width as argument.
[PATCH 10/25] i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.
[PATCH 11/25] i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.
[PATCH 12/25] i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.
[PATCH 13/25] i965/fs: Skip SIMD lowering destination zipping if possible.
[PATCH 14/25] i965/fs: Reindent emit_zip().
[PATCH 15/25] i965/fs: Teach regions_overlap() about COMPR4 MRF regions.
[PATCH 16/25] i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().
[PATCH 17/25] i965/fs: Fix compute-to-mrf VGRF region coverage condition.
[PATCH 18/25] i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.
[PATCH 19/25] i965/fs: Teach compute_to_mrf about the COMPR4 address transformation.
[PATCH 20/25] i965/fs: Extend compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.
[PATCH 21/25] i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.
[PATCH 22/25] i965/fs: Fix constant combining for instructions that cannot accept source mods.
[PATCH 23/25] i965/fs: Allow scalar source regions on SNB math instructions.
[PATCH 24/25] i965/fs: Skip gen4 pre/post-send dependency workarounds for the first/last block.
[PATCH 25/25] i965: Expose GL 4.3 on Gen8+.