[Mesa-dev] [PATCH 00/25] i965: Scalar back-end support for SIMD32, part 4.

Jason Ekstrand jason at jlekstrand.net
Mon May 30 17:22:27 UTC 2016


On Sun, May 29, 2016 at 12:01 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Francisco Jerez <currojerez at riseup.net> writes:
>
> > Jason Ekstrand <jason at jlekstrand.net> writes:
> >
> >> On Fri, May 27, 2016 at 7:05 PM, Francisco Jerez <currojerez at riseup.net
> >
> >> wrote:
> >>
> >>> This fixes the few code quality regressions from the previous series
> >>> enabling SIMD32 CS codegen in the back-end -- AFAICT by the end of the
> >>> series we can finally enable GL 4.3 on all Gen8+ hardware.
> >>>
> >>> Patches 1-8 delay the SIMD lowering pass after the bulk of
> >>> optimization passes have been run, which should decrease the
> >>> compilation time of mainly SIMD32 shaders and improve the code quality
> >>> of SIMD32 shaders on all generations and shaders of any dispatch width
> >>> on older generations (up to and including IVB) that use SIMD lowering
> >>> more intensively to implement various workarounds.
> >>>
> >>> Patches 9-14 rework the SIMD lowering pass to avoid emitting the copy
> >>> instructions used to zip and unzip register regions where possible,
> >>> since the register coalesce and copy propagation passes seem to
> >>> perform rather poorly at getting rid of them in some cases.  In the
> >>> long term we'll likely want to improve the register coalesce pass
> >>> irrespective of these changes.
> >>>
> >>> Patches 15-20 improve the compute-to-mrf pass used on Gen4-6 to handle
> >>> cases where the source of a VGRF-to-MRF copy is initialized by the
> >>> shader using multiple single-GRF writes, which becomes far more common
> >>> with the additional SIMD lowering going on after this series.
> >>>
> >>> Patches 21-24 are some other assorted changes improving code quality
> >>> on older gens.
> >>>
> >>> I wanted to provide more detailed (e.g. per commit) shader-db stats
> >>> with this series, but kind of ran out of time.  Let me know if you
> >>> would like to see more evidence that any of the changes below is
> >>> improving code quality in case it's not clear from the commit alone.
> >>>
> >>> [PATCH 01/25] i965/fs: Let CSE handle logical sampler sends as
> expressions.
> >>> [PATCH 02/25] i965/fs: Allow constant propagation into logical send
> >>> sources.
> >>> [PATCH 03/25] i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to
> >>> has_side_effects().
> >>> [PATCH 04/25] i965/fs: Run SIMD and logical send lowering after the
> >>> optimization loop.
> >>> [PATCH 05/25] i965/fs: Take opt_redundant_discard_jumps out of the
> >>> optimization loop.
> >>> [PATCH 06/25] i965/fs: Fix UB list sentinel dereference in
> >>> opt_sampler_eot().
> >>> [PATCH 07/25] i965/fs: Implement opt_sampler_eot() in terms of logical
> >>> sends.
> >>
> >> [PATCH 08/25] SQUASH: i965/fs: Add basic dataflow check to
> >>> opt_sampler_eot().
> >>>
> >>
> >>
> >>> [PATCH 09/25] i965/fs: Refactor offset() into a separate function
> taking
> >>> the width as argument.
> >>> [PATCH 10/25] i965/fs: Generalize regions_overlap() from copy
> propagation
> >>> to handle non-VGRF files.
> >>> [PATCH 11/25] i965/fs: Factor out region zipping and unzipping from the
> >>> SIMD lowering pass.
> >>> [PATCH 12/25] i965/fs: Skip SIMD lowering source unzipping for regular
> >>> scalar regions.
> >>> [PATCH 13/25] i965/fs: Skip SIMD lowering destination zipping if
> possible.
> >>> [PATCH 14/25] i965/fs: Reindent emit_zip().
> >>>
> >>
> >> 9-14 Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> >>
> >>
> >>> [PATCH 15/25] i965/fs: Teach regions_overlap() about COMPR4 MRF
> regions.
> >>> [PATCH 16/25] i965/fs: Simplify and improve accuracy of
> compute_to_mrf()
> >>> by using regions_overlap().
> >>> [PATCH 17/25] i965/fs: Fix compute-to-mrf VGRF region coverage
> condition.
> >>> [PATCH 18/25] i965/fs: Refactor compute_to_mrf() to split search and
> >>> rewrite into separate loops.
> >>> [PATCH 19/25] i965/fs: Teach compute_to_mrf about the COMPR4 address
> >>> transformation.
> >>> [PATCH 20/25] i965/fs: Extend compute_to_mrf() to coalesce VGRFs
> >>> initialized by multiple single-GRF writes.
> >>> [PATCH 21/25] i965/fs: Extend remove_duplicate_mrf_writes() to handle
> >>> non-VGRF to MRF copies.
> >>>
> >>
> >> 15-21 scare me.  A lot.  They even make me think that forking the
> compiler
> >> between SNB and IVB may be a good idea. :-/  MRFs are annoying, but
> COMPR4
> >> is such a gross hack that I really want to teach as little of the
> compiler
> >> about it as possible.
> >>
> > Heh, my impression (from writing the patches) is that COMPR4 was the
> > easiest part of it, it's not entirely unlike the usual Align1 source
> > regions -- with hard-wired width and vstride.  I agree it's kind of a
> > gross hardware feature but all it took to handle it in compute-to-mrf
> > was some simple arithmetic [it would get slightly more complex if
> > compute-to-mrf attempted to handle COMPR4 LOAD_PAYLOAD instructions, but
> > I don't think it would necessarily be an insane idea].
> >
> > Honestly, what made patches 16-20 really hard to write (and maybe also
> > hard to review, I don't know), was disentangling the highly
> > special-cased and dubious dataflow logic of the compute-to-mrf pass
> > (patches 16 and 17 could qualify as bugfixes...), and that had little to
>
> Uhm, I just noticed that patch 20 is effectively a bugfix too, because
> right now the compute-to-mrf pass only looks at and rewrites the *first*
> def it encounters of the VGRF it's coalescing away, so it will end up
> corrupting the program if there was more than one def of the VGRF region
> read by the copy instruction, because the copy instruction is removed.
>
> I don't see any reason why this bug wouldn't be possible on master
> already, but it definitely causes piglit regressions on Gen4-6 in
> combination with patch 11 just because the ordering in which the zipping
> instructions are emitted happens to change.  I'm afraid we need to land
> patches 16-20 before patch 11...
>

Ok, I think the conclusion of the matter is that I don't like
compute_to_mrf...  The patches look correct and I *think* they are strictly
an improvement but compute_to_mrf looks like it will probably just fall
over if it encounters a partial write.  Your plan of making
register_coalesce handle MRFs is probably the best long-term plan.  For now,

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

I just hope we aren't opening a giant can of worms...  (The can was already
open)


> > do with COMPR4 or even with MRFs.  Compute-to-mrf is a crappy register
> > coalescing pass that can only handle a small subset of the cases where
> > one could possibly elide VGRF-to-MRF copies.
> >
> > My suspicion is that the situation would improve (both in terms of code
> > generation quality on MRF platforms and codebase maintainability) if
> > instead of forking the driver we got rid of the compute-to-mrf pass
> > altogether and used the normal register coalesce pass to coalesce VGRFs
> > into MRFs (since the one difficult part about coalescing registers is
> > working out the dataflow and variable interference, COMPR4 is just the
> > surface).
> >
> > BTW, do you have any objection against PATCH 21?  It should be fully
> > independent from the other MRF-related ones.
> >
> >> So here's the million dollar question: Do we need them? and, more
> >> importantly, do we need them now?  I didn't see anything wrong in my
> brief
> >> skimming but don't call that a review.
> >>
> > *Shrug*, nothing in this series is strictly necessary [heh, except of
> > course PATCH 25 ;)], but if we don't include them the overall shader-db
> > balance from the whole series (including parts 1-4 except for patches
> > 16-20 from this series) on SNB would be:
> >
> >    total instructions in shared programs: 5721434 -> 5755717 (0.60%)
> >    instructions in affected programs: 2402892 -> 2437175 (1.43%)
> >    helped: 1284
> >    HURT: 12148
> >
> > instead of the following (not excluding any changes):
> >
> >    total instructions in shared programs: 5721284 -> 5708369 (-0.23%)
> >    instructions in affected programs: 489340 -> 476425 (-2.64%)
> >    helped: 1398
> >    HURT: 21
> >
> >>
> >>> [PATCH 22/25] i965/fs: Fix constant combining for instructions that
> cannot
> >>> accept source mods.
> >>> [PATCH 23/25] i965/fs: Allow scalar source regions on SNB math
> >>> instructions.
> >>> [PATCH 24/25] i965/fs: Skip gen4 pre/post-send dependency workaronds
> for
> >>> the first/last block.
> >>> [PATCH 25/25] i965: Expose GL 4.3 on Gen8+.
> >>>
> >>
> >> 22-25 are Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> >>
> >
> > Thanks.
> >
> >>
> >>> _______________________________________________
> >>> mesa-dev mailing list
> >>> mesa-dev at lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160530/5351a6ec/attachment-0001.html>


More information about the mesa-dev mailing list