[Mesa-dev] [PATCH 1/2] i965: Make DCE set null destinations on messages with side effects.
Francisco Jerez
currojerez at riseup.net
Tue Dec 13 20:39:21 UTC 2016
Kenneth Graunke <kenneth at whitecape.org> writes:
> (Co-authored by Matt Turner.)
>
> Image atomics, for example, return a value - but the shader may not
> want to use it. We assigned a useless VGRF destination. This seemed
> harmless, but it can actually be quite harmful. The register allocator
> has to assign that VGRF to a real register. It may assign the same
> actual GRF to the destination of an instruction that follows soon after.
>
> This results in a write-after-write (WAW) dependency, and stall.
>
> A number of "Deus Ex: Mankind Divided" shaders use image atomics, but
> don't use the return value. Several of these were hitting WAW stalls
> for nearly 14,000 cycles a pop. This patch cuts one shader's cycles
> by -98.35%!
>
This will also help reduce the amount of traffic between the EU and HDC,
because [as ISTR we discussed yesterday in the office] the generator
asks the HDC not to send any return payload when it sees a null
destination region. OTOH the 14 kcycle estimate is likely to be overly
pessimistic, especially for more recent hardware, the actual number of
cycles spent in the stall could easily be two orders of magnitude lower
than the estimate -- Did you happen to test this change on the real
workload?
> Making dead code elimination null out the destination avoids this issue.
> We can drop the redundant NOP handling as well - a following hunk
> already turns instructions with null destinations and no effects into
> NOPs.
>
The code supposed to do that seems rather buggy... The code you removed
below would only NOP an instruction if it didn't write the accumulator
or flag register, but the following hunk would then go and NOP it if the
flag register wasn't live, regardless of whether the accumulator result
From the instruction was useful, and regardless of whether the
instruction had any side effects.
> We do need to special case memory barriers as they need an actual
> VGRF to function correctly, even though that registers serves only
> to create register dependencies.
>
I don't think memory barriers are the only case this could break. AFAIK
a SEND message with null destination register needs to specify a return
payload length of zero, which may or may not be valid depending on the
message... The only reason this optimization works as-is for surface
atomic messages is that the FS generator already special-cases null
destination registers [basically in order to make this optimization
possible, this codepath is currently unused] and then asks brw_eu_emit
for a surface atomic SEND message without return payload, which requires
the "return data expected" control bit to be set up accordingly in the
SFID-specific message control fields. I'm not aware of any send-like
virtual instruction currently supported by the back-end which is able to
do the same thing other than SHADER_OPCODE_TYPED_ATOMIC and
SHADER_OPCODE_UNTYPED_ATOMIC.
I suggest you define a 'can_omit_write()' predicate (either as a static
function or as a backend_instruction method) which returns true except
for non-surface atomic send-from-grf instructions, and use it below
instead of the SHADER_OPCODE_MEMORY_FENCE special casing.
> On Skylake:
>
> total instructions in shared programs: 13700090 -> 13699698 (-0.00%)
> instructions in affected programs: 13514 -> 13122 (-2.90%)
> helped: 2
> HURT: 0
>
> total cycles in shared programs: 288879704 -> 278651496 (-3.54%)
> cycles in affected programs: 15842112 -> 5613904 (-64.56%)
> helped: 19
> HURT: 3
>
> total spills in shared programs: 15040 -> 14990 (-0.33%)
> spills in affected programs: 140 -> 90 (-35.71%)
> helped: 2
> HURT: 0
>
> total fills in shared programs: 17328 -> 17264 (-0.37%)
> fills in affected programs: 228 -> 164 (-28.07%)
> helped: 2
> HURT: 0
>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
> src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> index 8a0469a..4c524a4 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> @@ -52,7 +52,8 @@ fs_visitor::dead_code_eliminate()
> sizeof(BITSET_WORD));
>
> foreach_inst_in_block_reverse_safe(fs_inst, inst, block) {
> - if (inst->dst.file == VGRF && !inst->has_side_effects()) {
> + if (inst->dst.file == VGRF &&
> + inst->opcode != SHADER_OPCODE_MEMORY_FENCE) {
> const unsigned var = live_intervals->var_from_reg(inst->dst);
> bool result_live = false;
>
> @@ -60,13 +61,8 @@ fs_visitor::dead_code_eliminate()
> result_live |= BITSET_TEST(live, var + i);
>
> if (!result_live) {
> + inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
> progress = true;
> -
> - if (inst->writes_accumulator || inst->flags_written()) {
> - inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
> - } else {
> - inst->opcode = BRW_OPCODE_NOP;
> - }
> }
> }
>
> --
> 2.10.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161213/bbe01f6e/attachment.sig>
More information about the mesa-dev
mailing list