[Mesa-dev] [PATCH 37/59] i965/fs: generalize SIMD16 interference workaround

Wed May 4 08:16:00 UTC 2016

Connor Abbott <cwabbott0 at gmail.com> writes:

> On Tue, May 3, 2016 at 6:16 PM, Francisco Jerez <currojerez at riseup.net> wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>>
>>> From: Connor Abbott <connor.w.abbott at intel.com>
>>>
>>> Work based on registers read/written instead of dispatch_width, so that
>>> the interferences are added for 64-bit sources/destinations as well.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 67 ++++++++++++++++-------
>>>  1 file changed, 48 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>>> index 2347cd5..f0e96f9 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>>> @@ -541,6 +541,34 @@ setup_mrf_hack_interference(fs_visitor *v, struct ra_graph *g,
>>>     }
>>>  }
>>>
>>> +static unsigned
>>> +get_reg_node(const fs_reg& reg, unsigned first_payload_node)
>>> +{
>>> +   switch (reg.file) {
>>> +   case VGRF:
>>> +      return reg.nr;
>>> +   case ARF:
>>> +   case FIXED_GRF:
>>> +      return first_payload_node + reg.nr;
>>> +   case MRF:
>>> +   default:
>>> +      unreachable("unhandled register file");
>>> +   }
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +static void
>>> +add_reg_interference(struct ra_graph *g, const fs_reg& reg1,
>>> +                     const fs_reg& reg2, unsigned first_payload_node)
>>> +{
>>> +   if ((reg1.file == VGRF || reg1.file == ARF || reg1.file == FIXED_GRF) &&
>>> +       (reg2.file == VGRF || reg2.file == ARF || reg2.file == FIXED_GRF)) {
>>> +      ra_add_node_interference(g, get_reg_node(reg1, first_payload_node),
>>> +                               get_reg_node(reg2, first_payload_node));
>>> +   }
>>> +}
>>> +
>>>  bool
>>>  fs_visitor::assign_regs(bool allow_spilling)
>>>  {
>>> @@ -643,26 +671,27 @@ fs_visitor::assign_regs(bool allow_spilling)
>>>        }
>>>     }
>>>
>>> -   if (dispatch_width > 8) {
>>> -      /* In 16-wide dispatch we have an issue where a compressed
>>> -       * instruction is actually two instructions executed simultaneiously.
>>> -       * It's actually ok to have the source and destination registers be
>>> -       * the same.  In this case, each instruction over-writes its own
>>> -       * source and there's no problem.  The real problem here is if the
>>> -       * source and destination registers are off by one.  Then you can end
>>> -       * up in a scenario where the first instruction over-writes the
>>> -       * source of the second instruction.  Since the compiler doesn't know
>>> -       * about this level of granularity, we simply make the source and
>>> -       * destination interfere.
>>> -       */
>>> -      foreach_block_and_inst(block, fs_inst, inst, cfg) {
>>> -         if (inst->dst.file != VGRF)
>>> -            continue;
>>> +   /* When instructions both read/write more than a single SIMD8 register, we
>>> +    * have an issue where an instruction is actually two instructions executed
>>> +    * simultaneiously.  It's actually ok to have the source and destination
>>> +    * registers be the same.  In this case, each instruction over-writes its
>>> +    * own source and there's no problem.  The real problem here is if the
>>> +    * source and destination registers are off by one.  Then you can end up in
>>> +    * a scenario where the first instruction over-writes the source of the
>>> +    * second instruction.  Since the compiler doesn't know about this level of
>>> +    * granularity, we simply make the source and destination interfere.
>>> +    */
>>> +   foreach_block_and_inst(block, fs_inst, inst, cfg) {
>>> +      if (inst->dst.file != VGRF &&
>>> +          inst->dst.file != ARF && inst->dst.file != FIXED_GRF)
>>> +         continue;
>>>
>>> -         for (int i = 0; i < inst->sources; ++i) {
>>> -            if (inst->src[i].file == VGRF) {
>>> -               ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr);
>>> -            }
>>> +      if (inst->regs_written <= 1)
>>> +         continue;
>>> +
>>> +      for (int i = 0; i < inst->sources; ++i) {
>>> +         if (inst->regs_read(i) > 1) {
>>
>> What's the point of checking that regs_read(i) > 1?  Isn't this
>> supposedly a problem in particular when a source register is fully
>> contained inside the first (one register) half of the destination?
>
> No, it's not. It's a problem when the first half of the destination
> overlaps with the second register read by source. For example,
> something like:
>
>   add(8) g11:DF g10:DF g20:DF
>
> will be broken into:
>
>    add(4) g11:DF g10:DF g20:DF
>    add(4) g12:DF g11:DF g21:DF
>
> which obviously doesn't do what you want. Note that something with a
> source stride of 2 and a destination stride of 1, like:
>
>    mov(8) g11:UD g10<2>:UD
>
> has the same problem, so checking that regs_read(i) > 1 is both a
> necessary and sufficient condition for the problem to hold for some
> choice of physical registers.
>
It doesn't look like you're answering the question? :P Let me rephrase
it slightly: Under the assumption that this is indeed a problem for
regs_read(i) > 1, why would it *not* be a problem for regs_read(i) == 1
if there still is overlap between the destination of the second
compressed half and the single-register source?

>>
>> Either way I have some evidence suggesting that the EU works around this
>> problem in hardware on at least some generations by buffering the result
>> From the first half of any compressed instruction while the second half
>> executes -- Can you come up with a test case where this fixes a problem?
>> On what hardware?
>
> I'd be very surprised if that were the case, since to the part of the
> pipeline that handles register fetching and writeback, an instruction
> like
>
>   add(8) g11:DF g10:DF g20:DF
>
> isn't really that different from
>
>   add(16) g11:F g10:F g20:F
>
Oh?

> and both the existing workaround in this patch and a separate
> workaround in liveness analysis for uniforms already exist to prevent
> something like the second instruction from getting generated. Also, I
> wouldn't have written this patch if it hadn't fixed something :) I can
> try running piglit with this patch reverted tomorrow, if no one beats
> me to it.
>
Another explanation would be that the original fix was working around a
problem in the generator with some virtual instruction that wouldn't
handle source/destination overlap correctly -- Or that the original
problem was generation-specific.  It would definitely be good to have
some test-case reproducing the problem.

>>
>>> +            add_reg_interference(g, inst->dst, inst->src[i], first_payload_node);
>>>           }
>>>        }
>>>     }
>>> --
>>> 2.5.0
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160504/31eb104b/attachment.sig>