[Mesa-dev] [PATCH 2/2] i965: Make DCE set null destinations on messages with side effects.
Kenneth Graunke
kenneth at whitecape.org
Tue Jan 17 20:40:24 UTC 2017
On Tuesday, January 17, 2017 11:40:33 AM PST Francisco Jerez wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
>
> > (Co-authored by Matt Turner.)
> >
> > Image atomics, for example, return a value - but the shader may not
> > want to use it. We assigned a useless VGRF destination. This seemed
> > harmless, but it can actually be quite harmful. The register allocator
> > has to assign that VGRF to a real register. It may assign the same
> > actual GRF to the destination of an instruction that follows soon after.
> >
> > This results in a write-after-write (WAW) dependency, and stall.
> >
> > A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
> > don't use the return value. Several of these were hitting WAW stalls
> > for nearly 14,000 (poorly estimated) cycles a pop. Making dead code
> > elimination null out the destination avoids this issue.
> >
> > This patch cuts one shader's estimated cycles by -98.39%! Removing the
> > message response should also help with data cluster bandwidth.
> >
> > On Skylake:
> >
> > total instructions in shared programs: 13366907 -> 13363051 (-0.03%)
> > instructions in affected programs: 49635 -> 45779 (-7.77%)
> > helped: 133
> > HURT: 0
> >
> > total cycles in shared programs: 255433388 -> 248081818 (-2.88%)
> > cycles in affected programs: 12370702 -> 5019132 (-59.43%)
> > helped: 100
> > HURT: 24
> >
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> > .../dri/i965/brw_fs_dead_code_eliminate.cpp | 56 ++++++++++++++++------
> > 1 file changed, 41 insertions(+), 15 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > index 930dc733b45..885ae2638a8 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> > @@ -34,6 +34,43 @@
> > * yet in the tail end of this block.
> > */
> >
> > +static bool
> > +can_eliminate(const fs_inst *inst, BITSET_WORD *flag_live)
> > +{
> > + return inst->opcode != BRW_OPCODE_IF &&
> > + inst->opcode != BRW_OPCODE_WHILE &&
>
> Isn't this missing a bunch of other control flow instructions that
> shouldn't be eliminated? What's going on here, are they also being
> DCE'ed? Shouldn't this be !inst->is_control_flow()?
That would make more sense. Maybe Matt knows if there's a reason.
I'm just moving code around here. I can write a patch to change that
first, though...
> > + !inst->has_side_effects() &&
> > + !(flag_live[0] & inst->flags_written()) &&
> > + !inst->writes_accumulator;
> > +}
> > +
> > +static bool
> > +can_omit_write(const fs_inst *inst, BITSET_WORD *flag_live)
> > +{
> > + switch (inst->opcode) {
> > + case SHADER_OPCODE_UNTYPED_ATOMIC:
> > + case SHADER_OPCODE_UNTYPED_ATOMIC_LOGICAL:
> > + case SHADER_OPCODE_TYPED_ATOMIC:
> > + case SHADER_OPCODE_TYPED_ATOMIC_LOGICAL:
> > + return true;
> > + default:
> > + /* If we're going to eliminate the instruction entirely, omitting the
> > + * write is always safe.
> > + */
> > + if (can_eliminate(inst, flag_live))
> > + return true;
>
> I think it would be cleaner to make this a predicate of the instruction
> alone meaning 'can the instruction accept a null register as
> destination?' (which is independent of the context it occurs in, and
> can_eliminate() doesn't have any influence on, it's just that we're okay
> with the pass below putting invalid instructions in the program as long
> as we have the guarantee that they're going to be removed later on), and
> do 'can_omit_write(inst) || can_eliminate(inst, flag_live)' below.
Yeah, that does seem nicer. I'll make that change.
> > +
> > + /* We can eliminate the destination write for ordinary instructions,
> > + * but not most SENDs.
> > + */
> > + if (inst->opcode < 128 && inst->mlen == 0)
> > + return true;
> > +
>
> This is going to miss ALU-like virtual instructions, but I guess they
> could be enumerated in the 'return true' case above together with
> surface atomics. [Not saying that you necessarily need to do that
> now ;)]
Right. I figured this was a good start, and we can enumerate our own
crazy opcodes above as we see fit. If you've got a better idea...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170117/66c13ae7/attachment.sig>
More information about the mesa-dev
mailing list