[Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

Francisco Jerez currojerez at riseup.net
Wed May 11 18:51:26 UTC 2016


Iago Toral <itoral at igalia.com> writes:

> On Tue, 2016-05-10 at 16:53 -0700, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>> > From: Iago Toral Quiroga <itoral at igalia.com>
>> >
>> > We were not accounting for reg_suboffset in the check for the start
>> > of the region. This meant that would allow copy-propagation even if
>> > the dst wrote to sub_regoffset 4 and our source read from
>> > sub_regoffset 0, which is not correct. This was observed in fp64 code,
>> > since there we use reg_suboffset to select the high 32-bit of a double.
>> >
>> I don't think this paragraph is accurate, copy instructions with
>> non-zero destination subreg offset are currently considered partial
>> writes and should never have been added to the ACP hash table in the
>> first place.
>
> Right, I think I wrote this patch before the one where I fixed
> is_partial_write() to consider any write to subreg_offset > 0 partial. 
>
In practice they would be considered partial writes already, but the
reason is somewhat obscure -- Writes with subreg_offset != 0 would
necessarily fall into three categories:
 - Strided writes (which are considered partial explicitly).
 - Contiguous writes which write less than one GRF worth of data (which
   are considered partial explicitly)
 - Contiguous writes which straddle two registers (which have been
   severely limited by the hardware historically).

Even though it's unlikely to have fixed pre-existing bugs, for the sake
of sanity I'm glad that you fixed is_partial_write() to handle the
latter case explicitly (thanks!), especially since the hardware has
slowly been lifting restrictions on the ways you can straddle multiple
registers: On Gen7 the stupid decompression behaviour that simply shifts
the second decompressed portion of the instruction by one GRF would kill
you.  Gen8 is slightly less stupid and shifts the second decompressed
portion by ExecSize/2 components which allows it to handle a subset of
straddled writes (only the cases where the written components are
balanced between the two registers).  AFAIK Gen9 should be the first to
support fully unrestricted unbalanced writes.

>> > Also, fs_reg::regs_read() already takes the stride into account, so we
>> > should not multiply its result by the stride again. This was making
>> > copy-propagation fail to copy-propagate cases that would otherwise be
>> > safe to copy-propagate. Again, this was observed in fp64 code, since
>> > there we use stride > 1 often.
>> >
>> > Incidentally, these fixes open up more possibilities for copy propagation
>> > which uncovered new bugs in copy-propagation. The folowing patches address
>> > each of these new issues.
>> 
>> Oh man, that sucks...
>> 
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp    | 21 +++++++++++++--------
>> >  1 file changed, 13 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index 5fae10f..23df877 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
>> >     return true;
>> >  }
>> >  
>> > +static inline bool
>> > +region_match(fs_reg src, unsigned regs_read,
>> > +             fs_reg dst, unsigned regs_written)
>> 
>> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
>> personally wouldn't mind 'region_match' but
>> 'write_region_contains_read_region' sounds a bit too long for my taste).
>> 
>> > +{
>> > +   return src.reg_offset >= dst.reg_offset &&
>> > +          (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
>> > +          src.subreg_offset >= dst.subreg_offset;
>> 
>> This works under the assumption that src.subreg_offset is strictly less
>> than the reg_offset unit -- Which *should* be the case unless we've
>> messed up that restriction in some place (we have in the past :P).  To
>> be on the safe side you could do something like following, if you like:
>> 
>> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
>> |           dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
>> |          src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> I understand that even if we discard writes with dst.subreg_offset > 0,
> you still want the subreg_offset check here to be safe exactly in that
> scenario (since we would not need this for the case I originally wrote
> it for).
>
>> With the above taken into account:
>> 
>> Reviewed-by: Francisco Jerez <currojerez at riseup.net>
>
> Thanks!
>
>> > +}
>> > +
>> >  bool
>> >  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >  {
>> > @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> >     /* Bail if inst is reading a range that isn't contained in the range
>> >      * that entry is writing.
>> >      */
>> > -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
>> > -       (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
>> > -        inst->regs_read(arg) * inst->src[arg].stride * 32) >
>> > -       (entry->dst.reg_offset + entry->regs_written) * 32)
>> > +   if (!region_match(inst->src[arg], inst->regs_read(arg),
>> > +                     entry->dst, entry->regs_written))
>> >        return false;
>> >  
>> >     /* we can't generally copy-propagate UD negations because we
>> > @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
>> >        /* Bail if inst is reading a range that isn't contained in the range
>> >         * that entry is writing.
>> >         */
>> > -      if (inst->src[i].reg_offset < entry->dst.reg_offset ||
>> > -          (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
>> > -           inst->regs_read(i) * inst->src[i].stride * 32) >
>> > -          (entry->dst.reg_offset + entry->regs_written) * 32)
>> > +      if (!region_match(inst->src[i], inst->regs_read(i),
>> > +                        entry->dst, entry->regs_written))
>> >           continue;
>> >  
>> >        fs_reg val = entry->src;
>> > -- 
>> > 2.5.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160511/f5197082/attachment.sig>


More information about the mesa-dev mailing list