[Mesa-dev] [PATCH 2/2] i965: Make DCE set null destinations on messages with side effects.

Kenneth Graunke kenneth at whitecape.org
Tue Jan 17 20:40:24 UTC 2017


On Tuesday, January 17, 2017 11:40:33 AM PST Francisco Jerez wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
> 
> > (Co-authored by Matt Turner.)
> >
> > Image atomics, for example, return a value - but the shader may not
> > want to use it.  We assigned a useless VGRF destination.  This seemed
> > harmless, but it can actually be quite harmful.  The register allocator
> > has to assign that VGRF to a real register.  It may assign the same
> > actual GRF to the destination of an instruction that follows soon after.
> >
> > This results in a write-after-write (WAW) dependency, and stall.
> >
> > A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
> > don't use the return value.  Several of these were hitting WAW stalls
> > for nearly 14,000 (poorly estimated) cycles a pop.  Making dead code
> > elimination null out the destination avoids this issue.
> >
> > This patch cuts one shader's estimated cycles by -98.39%!  Removing the
> > message response should also help with data cluster bandwidth.
> >
> > On Skylake:
> >
> > total instructions in shared programs: 13366907 -> 13363051 (-0.03%)
> > instructions in affected programs: 49635 -> 45779 (-7.77%)
> > helped: 133
> > HURT: 0
> >
> > total cycles in shared programs: 255433388 -> 248081818 (-2.88%)
> > cycles in affected programs: 12370702 -> 5019132 (-59.43%)
> > helped: 100
> > HURT: 24
> >
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  .../dri/i965/brw_fs_dead_code_eliminate.cpp        | 56 ++++++++++++++++------
> >  1 file changed, 41 insertions(+), 15 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > index 930dc733b45..885ae2638a8 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > @@ -34,6 +34,43 @@
> >   * yet in the tail end of this block.
> >   */
> >  
> > +static bool
> > +can_eliminate(const fs_inst *inst, BITSET_WORD *flag_live)
> > +{
> > +    return inst->opcode != BRW_OPCODE_IF &&
> > +           inst->opcode != BRW_OPCODE_WHILE &&
> 
> Isn't this missing a bunch of other control flow instructions that
> shouldn't be eliminated?  What's going on here, are they also being
> DCE'ed?  Shouldn't this be !inst->is_control_flow()?

That would make more sense.  Maybe Matt knows if there's a reason.
I'm just moving code around here.  I can write a patch to change that
first, though...

> > +           !inst->has_side_effects() &&
> > +           !(flag_live[0] & inst->flags_written()) &&
> > +           !inst->writes_accumulator;
> > +}
> > +
> > +static bool
> > +can_omit_write(const fs_inst *inst, BITSET_WORD *flag_live)
> > +{
> > +   switch (inst->opcode) {
> > +   case SHADER_OPCODE_UNTYPED_ATOMIC:
> > +   case SHADER_OPCODE_UNTYPED_ATOMIC_LOGICAL:
> > +   case SHADER_OPCODE_TYPED_ATOMIC:
> > +   case SHADER_OPCODE_TYPED_ATOMIC_LOGICAL:
> > +      return true;
> > +   default:
> > +      /* If we're going to eliminate the instruction entirely, omitting the
> > +       * write is always safe.
> > +       */
> > +      if (can_eliminate(inst, flag_live))
> > +         return true;
> 
> I think it would be cleaner to make this a predicate of the instruction
> alone meaning 'can the instruction accept a null register as
> destination?' (which is independent of the context it occurs in, and
> can_eliminate() doesn't have any influence on, it's just that we're okay
> with the pass below putting invalid instructions in the program as long
> as we have the guarantee that they're going to be removed later on), and
> do 'can_omit_write(inst) || can_eliminate(inst, flag_live)' below.

Yeah, that does seem nicer.  I'll make that change.

> > +
> > +      /* We can eliminate the destination write for ordinary instructions,
> > +       * but not most SENDs.
> > +       */
> > +      if (inst->opcode < 128 && inst->mlen == 0)
> > +         return true;
> > +
> 
> This is going to miss ALU-like virtual instructions, but I guess they
> could be enumerated in the 'return true' case above together with
> surface atomics. [Not saying that you necessarily need to do that
> now ;)]

Right.  I figured this was a good start, and we can enumerate our own
crazy opcodes above as we see fit.  If you've got a better idea...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170117/66c13ae7/attachment.sig>


More information about the mesa-dev mailing list