[Mesa-dev] [PATCH 00/24] i965: Scalar back-end support for SIMD32, part 3.
currojerez at riseup.net
Fri May 27 03:46:05 UTC 2016
Even though my plan was to send the remaining changes for SIMD32 as a
single last series, I'm feeling too sleep-deprived to finish cleaning
up the rest of the series today, so I'll send them in another series.
The patches I've left out for part 4 are not strictly necessary for
correctness but they address some shader-db regressions caused by my
previous changes, so they are strongly recommended. Still, this series
should be sufficient to get SIMD32 compute shaders to the same level
of conformance as in SIMD8 or SIMD16 mode, which I've tested running
Piglit/dEQP/CTS with the INTEL_DEBUG=do32 option introduced in patch
23 to force the back-end to generate 32-wide code regardless of the
I've set up the following git branch with series 1-3:
Patches 1-4 of this series fix flag register dataflow analysis for
SIMD32 mode, patches 5-11 include fixes for some common FS IR
infrastructure and NIR translation code, patches 12-18 fix bugs in
several optimization and lowering passes that cause (at least) SIMD32
programs to be misoptimized in some cases, patches 19-20 get the
register allocator working in SIMD32 mode, and patches 21-24 make some
finishing changes to the back-end infrastructure to get SIMD32
compilation hooked up for compute shaders and expose the desktop
ARB_compute_shader extension on all Gen7+ hardware that currently can't
support it due to the SIMD16 workgroup size limit.
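
To illustrate the workgroup size limit mentioned above, here is a minimal
sketch rather than the actual driver code. It assumes the largest workgroup
the hardware can run is the number of threads available to one workgroup
times the dispatch width. The function names and the thread count of 36 in
the usage note are hypothetical, picked only to make the arithmetic visible.

```cpp
#include <cassert>

// Hypothetical helper (not a Mesa function): largest number of invocations
// that fit in one workgroup when each hardware thread executes `simd_width`
// invocations in lock-step.
static unsigned max_workgroup_invocations(unsigned max_threads,
                                          unsigned simd_width)
{
    return max_threads * simd_width;
}

// Hypothetical helper: narrowest dispatch width (8, 16 or 32) able to cover
// a workgroup of `local_size` invocations with at most `max_threads`
// threads. Returns 0 if even SIMD32 is not wide enough.
static unsigned min_dispatch_width(unsigned local_size, unsigned max_threads)
{
    const unsigned widths[] = { 8, 16, 32 };

    for (unsigned width : widths) {
        if (max_workgroup_invocations(max_threads, width) >= local_size)
            return width;
    }

    return 0;
}
```

With an assumed 36 threads per workgroup, SIMD16 tops out at 576
invocations while SIMD32 reaches 1152. As far as I know the desktop
extension requires a maximum of at least 1024 invocations per workgroup,
which is why such a part would need 32-wide code to expose it.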
[PATCH 01/24] i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
[PATCH 02/24] i965/fs: Track flag register liveness with byte granularity.
[PATCH 03/24] i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
[PATCH 04/24] i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
[PATCH 05/24] i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
[PATCH 06/24] i965/fs: Fix half() to handle more exotic register files.
[PATCH 07/24] i965/fs: Emit fixed-width null register regardless of the dispatch width.
[PATCH 08/24] i965/fs: Return 32 bit mask from fs_builder::sample_mask().
[PATCH 09/24] i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
[PATCH 10/24] i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
[PATCH 11/24] i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
[PATCH 12/24] i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
[PATCH 13/24] i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
[PATCH 14/24] i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
[PATCH 15/24] i965/fs: Estimate number of registers written correctly in opt_register_renaming.
[PATCH 16/24] i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
[PATCH 17/24] i965/fs: Fix multiple ACP interference during copy propagation.
[PATCH 18/24] i965/fs: Don't mutate multi-component arguments in sampler payload set-up.
[PATCH 19/24] i965/fs: Remove pre-Gen7 register allocation class micro-optimization.
[PATCH 20/24] i965/fs: Implement SIMD32 register allocation support.
[PATCH 21/24] i965/fs: Extend back-end interface for limiting the shader dispatch width.
[PATCH 22/24] i965/fs: Build 32-wide compute shader when needed.
[PATCH 23/24] i965: Add do32 debug option.
[PATCH 24/24] i965: Update compute workgroup size limit calculation for SIMD32.
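
As a rough illustration of what byte-granular flag tracking (patches 1-3)
buys in SIMD32 mode, here is a sketch that is not the code from the
patches. It assumes one flag bit per channel and 16-bit flag subregisters,
so each byte of the flag file covers 8 channels. The names flag_byte_mask
and flags_interfere are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: return a mask with one bit per byte of the flag
// register file touched by an instruction that starts at `first_channel`,
// executes `exec_size` channels (assumed non-zero) and uses the flag
// subregister with index `flag_subreg` (f0.0 = 0, f0.1 = 1, f1.0 = 2, ...).
static uint32_t flag_byte_mask(unsigned flag_subreg, unsigned first_channel,
                               unsigned exec_size)
{
    const unsigned start = flag_subreg * 16 + first_channel; // first bit used
    const unsigned end = start + exec_size;                  // one past last
    const unsigned first_byte = start / 8;                   // round down
    const unsigned last_byte = (end + 7) / 8;                // round up

    // Set bits [first_byte, last_byte).
    return ((1u << last_byte) - 1u) & ~((1u << first_byte) - 1u);
}

// Two flag accesses depend on each other only if their byte masks overlap.
static bool flags_interfere(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

Under these assumptions a SIMD32 instruction using f0.0 covers four bytes
and therefore also overlaps a SIMD16 access to f0.1, which a plain
"reads or writes the flag" boolean per subregister could not express.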