[Mesa-dev] [PATCH 5/5] i965/vec4: Propagate swizzles correctly during copy propagation.

Matt Turner mattst88 at gmail.com
Mon Feb 29 23:19:09 UTC 2016


On Fri, Feb 26, 2016 at 10:02 PM, Francisco Jerez <currojerez at riseup.net> wrote:
> This simplifies the code that iterates over the per-component values
> found in the matching copy_entry struct and checks whether the
> register regions that were copied to each component are similar enough
> to be treated as a single (reswizzled) value which can be propagated
> into the current instruction.
>
> Aside from being scattered between opt_copy_propagation(),
> try_copy_propagate(), and try_constant_propagate(), what I found
> terribly confusing about the preexisting logic was that
> opt_copy_propagation() tried to reorder the array of values according
> to the swizzle of the instruction source, which meant one would have
> had to invert the reordering applied at the top level in order to find
> out which component to take from each value (we were just taking the
> i-th component from the i-th value, which is not correct in general).
> The saturate mask was also being swizzled incorrectly.
>
> This consolidates the logic for matching multiple components of a
> copy_entry into a single function which returns the result as a
> regular src_reg on success, as if the copy had been performed with a
> single MOV instruction copying all components of the src_reg into the
> destination.
>
> Fixes several ARB_vertex_program MOV test-cases from:
>  https://cgit.freedesktop.org/~kwg/piglit/log/?h=arb_program

Thanks a ton for fixing this. I tried a few times before to simply
understand how the code worked -- even before I knew there was a bug
-- and each time gave up.

I still don't fully understand the issue. Too much or too little
swizzling. Doesn't matter. Fixes the bug and is a net reduction in
LOC.

Nice work. A few code formatting comments below, but this patch gets an

Acked-by: Matt Turner <mattst88 at gmail.com>

> ---
>  .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 123 ++++++++++-----------
>  1 file changed, 57 insertions(+), 66 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index b4a150a..fc8b721 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -85,21 +85,65 @@ is_logic_op(enum opcode opcode)
>             opcode == BRW_OPCODE_NOT);
>  }
>
> +/**
> + * Get the origin of a copy as a single register if all components present in
> + * the given readmask originate from the same register and have compatible
> + * regions, otherwise return a BAD_FILE register.
> + */
> +static src_reg
> +get_copy_value(const copy_entry &entry, unsigned readmask)
> +{
> +   unsigned swz[4] = {};
> +   src_reg value;
> +
> +   for (unsigned i = 0; i < 4; i++) {
> +      if (readmask & (1 << i)) {
> +         if (entry.value[i]) {
> +            src_reg src = *entry.value[i];
> +
> +            if (src.file == IMM) {
> +               swz[i] = i;
> +            } else {
> +               swz[i] = BRW_GET_SWZ(src.swizzle, i);
> +               /* Overwrite the original swizzle so the src_reg::equals call
> +                * below doesn't care about it, the correct swizzle will be
> +                * calculated once the swizzles of all components are known.
> +                */
> +               src.swizzle = BRW_SWIZZLE_XYZW;
> +            }
> +
> +            if (value.file == BAD_FILE)
> +               value = src;
> +
> +            else if (!value.equals(src))
> +               return src_reg();

Parentheses required in nested if statements. With that, there's no
need for the blank lines after them.

> +
> +         } else {
> +            return src_reg();
> +         }
> +      }
> +   }
> +
> +   return swizzle(value, brw_compose_swizzle(
> +                     brw_swizzle_for_mask(readmask),
> +                     BRW_SWIZZLE4(swz[0], swz[1], swz[2], swz[3])));

Lets wrap this as

   return swizzle(value,
                  brw_compose_swizzle(brw_swizzle_for_mask(readmask),
                                      BRW_SWIZZLE4(swz[0], swz[1],
                                                   swz[2], swz[3])));

(GMail might mess that up, so just align each argument with the open
parenthesis)

> +}
> +
>  static bool
>  try_constant_propagate(const struct brw_device_info *devinfo,
>                         vec4_instruction *inst,
> -                       int arg, struct copy_entry *entry)
> +                       int arg, const copy_entry *entry)
>  {
>     /* For constant propagation, we only handle the same constant
>      * across all 4 channels.  Some day, we should handle the 8-bit
>      * float vector format, which would let us constant propagate
>      * vectors better.
> +    * We could be more aggressive here -- some channels might not get used
> +    * based on the destination writemask.
>      */
> -   src_reg value = *entry->value[0];
> -   for (int i = 1; i < 4; i++) {
> -      if (!value.equals(*entry->value[i]))
> -        return false;
> -   }
> +   src_reg value = get_copy_value(
> +      *entry,
> +      brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle, WRITEMASK_XYZW));

Similarly here:

   src_reg value =
      get_copy_value(*entry,
                     brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle,
                                                   WRITEMASK_XYZW));


>
>     if (value.file != IMM)
>        return false;
> @@ -238,38 +282,14 @@ try_constant_propagate(const struct brw_device_info *devinfo,
>  static bool
>  try_copy_propagate(const struct brw_device_info *devinfo,
>                     vec4_instruction *inst, int arg,
> -                   struct copy_entry *entry, int attributes_per_reg)
> +                   const copy_entry *entry, int attributes_per_reg)
>  {
>     /* Build up the value we are propagating as if it were the source of a
>      * single MOV
>      */
> -   /* For constant propagation, we only handle the same constant
> -    * across all 4 channels.  Some day, we should handle the 8-bit
> -    * float vector format, which would let us constant propagate
> -    * vectors better.
> -    */
> -   src_reg value = *entry->value[0];
> -   for (int i = 1; i < 4; i++) {
> -      /* This is equals() except we don't care about the swizzle. */
> -      if (value.file != entry->value[i]->file ||
> -          value.nr != entry->value[i]->nr ||
> -         value.reg_offset != entry->value[i]->reg_offset ||
> -         value.type != entry->value[i]->type ||
> -         value.negate != entry->value[i]->negate ||
> -         value.abs != entry->value[i]->abs) {
> -        return false;
> -      }
> -   }
> -
> -   /* Compute the swizzle of the original register by swizzling the
> -    * component loaded from each value according to the swizzle of
> -    * operand we're going to change.
> -    */
> -   int s[4];
> -   for (int i = 0; i < 4; i++) {
> -      s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i);
> -   }
> -   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
> +   src_reg value = get_copy_value(
> +      *entry,
> +      brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle, WRITEMASK_XYZW));

Same thing here.

>
>     /* Check that we can propagate that value */
>     if (value.file != UNIFORM &&
> @@ -418,38 +438,9 @@ vec4_visitor::opt_copy_propagation(bool do_constant_prop)
>           if (inst->regs_read(i) != 1)
>              continue;
>
> -         int reg = (alloc.offsets[inst->src[i].nr] +
> -                   inst->src[i].reg_offset);
> -
> -        /* Find the regs that each swizzle component came from.
> -         */
> -         struct copy_entry entry;
> -         memset(&entry, 0, sizeof(copy_entry));
> -        int c;
> -        for (c = 0; c < 4; c++) {
> -            int channel = BRW_GET_SWZ(inst->src[i].swizzle, c);
> -            entry.value[c] = entries[reg].value[channel];
> -
> -           /* If there's no available copy for this channel, bail.
> -            * We could be more aggressive here -- some channels might
> -            * not get used based on the destination writemask.
> -            */
> -           if (!entry.value[c])
> -              break;
> -
> -            entry.saturatemask |=
> -               (entries[reg].saturatemask & (1 << channel) ? 1 : 0) << c;
> -
> -           /* We'll only be able to copy propagate if the sources are
> -            * all from the same file -- there's no ability to swizzle
> -            * 0 or 1 constants in with source registers like in i915.
> -            */
> -           if (c > 0 && entry.value[c - 1]->file != entry.value[c]->file)
> -              break;
> -        }
> -
> -        if (c != 4)
> -           continue;
> +         const unsigned reg = (alloc.offsets[inst->src[i].nr] +
> +                                inst->src[i].reg_offset);
> +         const copy_entry &entry = entries[reg];
>
>           if (do_constant_prop && try_constant_propagate(devinfo, inst, i, &entry))
>              progress = true;
> --
> 2.7.0
>


More information about the mesa-dev mailing list