[Mesa-dev] [PATCH 00/24] i965: Scalar back-end support for SIMD32, part 3.
Francisco Jerez
currojerez at riseup.net
Sat May 28 05:56:20 UTC 2016
Jason Ekstrand <jason at jlekstrand.net> writes:
> On Thu, May 26, 2016 at 8:46 PM, Francisco Jerez <currojerez at riseup.net>
> wrote:
>
>> Even though my plan was to send the remaining changes for SIMD32 as a
>> single last series, I'm feeling too sleep-deprived to finish cleaning
>> up the rest of the series today so I'll send them in another series
>> tomorrow.
>>
>> The patches I've left out for part 4 are not strictly necessary for
>> correctness, but they address some shader-db regressions caused by my
>> previous changes, so they are strongly recommended. Still, this series
>> should be sufficient to get SIMD32 compute shaders to the same level
>> of conformance as in SIMD8 or SIMD16 mode, which I've tested running
>> Piglit/dEQP/CTS with the INTEL_DEBUG=do32 option introduced in patch
>> 23 to force the back-end to generate 32-wide code regardless of the
>> workgroup size.
>>
>> I've set up the following git branch with series 1-3:
>>
>> https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-simd32-cs
>>
>> Patches 1-4 of this series fix flag register dataflow analysis for
>> SIMD32 mode, patches 5-11 include fixes for some common FS IR
>> infrastructure and NIR translation code, patches 12-18 fix bugs in
>> several optimization and lowering passes that cause (at least) SIMD32
>> programs to be misoptimized in some cases, patches 19-20 get the
>> register allocator working in SIMD32 mode, and patches 21-24 make some
>> finishing changes to the back-end infrastructure to get SIMD32
>> compilation hooked up for compute shaders and expose the desktop
>> ARB_compute_shader extension on all Gen7+ hardware that currently can't
>> support it because of the SIMD16 workgroup size limit.
>>
>
> Overall, this looks fantastic. I made a few cosmetic comments, which you
> are free to ignore or do later. 1-21 and 24 are
>
I think I've addressed most of your comments locally, except your
suggestion to fix the CMP cmod propagation bug on the VEC4 back-end,
which I'd rather look into next week and fix as a follow-up.
> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
>
Thanks!
> I'd like to see 22 reworked (details on that patch) and I don't think we
> need 23 if we do the reworks suggested for 22.
> --Jason
>
>
>>
>> [PATCH 01/24] i965/fs: Define methods to calculate the flag subset read or
>> written by an fs_inst.
>> [PATCH 02/24] i965/fs: Track flag register liveness with byte granularity.
>> [PATCH 03/24] i965/fs: Keep track of flag dependencies with byte
>> granularity during scheduling.
>> [PATCH 04/24] i965/fs: Clean up remaining uses of fs_inst::reads_flag and
>> ::writes_flag.
>> [PATCH 05/24] i965/fs: Fix horiz_offset() to handle ARF and HW GRF
>> register files.
>> [PATCH 06/24] i965/fs: Fix half() to handle more exotic register files.
>> [PATCH 07/24] i965/fs: Emit fixed-width null register regardless of the
>> dispatch width.
>> [PATCH 08/24] i965/fs: Return 32 bit mask from fs_builder::sample_mask().
>> [PATCH 09/24] i965/fs: Emit fixed width memory fence opcode regardless of
>> the dispatch width.
>> [PATCH 10/24] i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE
>> instruction unnecessarily.
>> [PATCH 11/24] i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless
>> of the dispatch width.
>> [PATCH 12/24] i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32
>> runs.
>> [PATCH 13/24] i965/fs: Reset reg_offset of the original destination to
>> zero in compute_to_mrf().
>> [PATCH 14/24] i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
>> [PATCH 15/24] i965/fs: Estimate number of registers written correctly in
>> opt_register_renaming.
>> [PATCH 16/24] i965/fs: Fix cmod propagation not to propagate non-identity
>> cmod into CMP(N).
>> [PATCH 17/24] i965/fs: Fix multiple ACP interference during copy
>> propagation.
>> [PATCH 18/24] i965/fs: Don't mutate multi-component arguments in sampler
>> payload set-up.
>> [PATCH 19/24] i965/fs: Remove pre-Gen7 register allocation class
>> micro-optimization.
>> [PATCH 20/24] i965/fs: Implement SIMD32 register allocation support.
>> [PATCH 21/24] i965/fs: Extend back-end interface for limiting the shader
>> dispatch width.
>> [PATCH 22/24] i965/fs: Build 32-wide compute shader when needed.
>> [PATCH 23/24] i965: Add do32 debug option.
>> [PATCH 24/24] i965: Update compute workgroup size limit calculation for
>> SIMD32.
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>