[Mesa-dev] [PATCH v3 07/24] i965/fs: generalize the legalization d2x pass

Francisco Jerez currojerez at riseup.net
Fri Mar 3 23:19:27 UTC 2017


Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> Add support to SEL instruction and add an assert to detect unsupported
> instructions than do d2x conversions.
>
> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> ---
>
> Curro, this patch legalizes SEL instruction too. If other optimizations
> modify later any SEL's (or any other instruction's) destination type
> (hence, producing a non-lowered d2x conversion), we can call it again
> around the end of fs_visitor::optimize(). Possibly together with
> lower_simd_width() just in case it was added later.
>

This sounds rather scary...  How do you make sure that this doesn't lead
to an infinite legalization-optimization loop in which copy propagation
reverses the effect of lower_d2x making double conversions illegal
again?  If you do already, why do you need to run lower_d2x multiple
times?  Wouldn't it be sufficient to run it once near the end of
optimize(), and then re-run copy propagation and possibly DCE?

> For that reason there is the inst->dst.stride > 1 condition in the
> test. This detects if either we emitted a strided destination in
> purpose or it was as a result of a previous lower_d2x run, we don't
> want to lowered it.
>
The problem with this is that if you ended up with dst.stride > 1 due to
different fields of the same scalar quantity being defined by two
separate instructions (e.g. by using subscript(dst, ..., i)), you *need*
to apply the lowering pass regardless, because otherwise the second
instruction will corrupt the data written by the first instruction.

> However, as I have not hit that case yet, I prefer to wait for your
> opinion. What do you think?
>
>
>  src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp | 57 ++++++++++++++++++--------
>  1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
> index a2db1154615..330f2552929 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_lower_d2x.cpp
> @@ -33,17 +33,9 @@ fs_visitor::lower_d2x()
>     bool progress = false;
>  
>     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> -      if (inst->opcode != BRW_OPCODE_MOV)
> -         continue;
> -
> -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
> -          inst->dst.type != BRW_REGISTER_TYPE_D &&
> -          inst->dst.type != BRW_REGISTER_TYPE_UD)
> -         continue;
> -
> -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
> -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
> -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
> +      if (get_exec_type_size(inst) != 8 ||
> +          type_sz(inst->dst.type) >= get_exec_type_size(inst) ||

Note that some type conversion restrictions apply even if the execution
type is single-precision, and even if the destination type size is not
less than the execution type, e.g. according to the hardware docs SEL
doesn't support F->UD or F->DF conversions which the condition above
would consider okay.

> +          inst->dst.stride > 1)
>           continue;
>  
>        assert(inst->dst.file == VGRF);
> @@ -61,13 +53,46 @@ fs_visitor::lower_d2x()
>         * So we need to allocate a temporary that's two registers, and then do
>         * a strided MOV to get the lower DWord of every Qword that has the
>         * result.
> +       *
> +       * This pass legalizes all the DF conversions to narrower types.
>         */
> -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
> -      fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
> -      ibld.MOV(strided_temp, inst->src[0]);
> -      ibld.MOV(dst, strided_temp);
> +      switch (inst->opcode) {

I suggest you refactor this into a helper function 'bool
supports_type_conversion(inst, dst_type, exec_type)' that returns false
for SEL and likely other things.  It might be a useful thing to have in
other places, e.g. for late optimization passes like copy propagation
where we need to make sure that no additional illegal conversions are
introduced.  If the value returned is false you'd do what you have below
for the SEL instruction, if it's true you'd do nothing unless the
instruction is double-precision and the destination type is smaller than
the execution type, in which case you'd do what you have below for
MOV/MOV_INDIRECT.

> +      case SHADER_OPCODE_MOV_INDIRECT:
> +      case BRW_OPCODE_MOV: {
> +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
> +         fs_reg strided_temp = subscript(temp, inst->dst.type, 0);
> +         /* We clone the original instruction as we are going to modify it
> +          * and emit another one after it.
> +          */
> +         fs_inst *strided_inst = new(ibld.shader->mem_ctx) fs_inst(*inst);

Why don't you just modify the original instruction instead of cloning
it, modifying the clone, and then removing the original?

> +         strided_inst->dst = strided_temp;
> +         /* As it is an strided destination, we write n-times more
> +          * being n the size difference between source and destination types.
> +          */
> +         strided_inst->size_written *= (type_sz(inst->src[0].type) / type_sz(inst->dst.type));
> +         ibld.emit(strided_inst);
> +         ibld.MOV(dst, strided_temp);
> +         /* Remove original instruction, now that is superseded. */
> +         inst->remove(block);
> +         break;
> +      }
> +      case BRW_OPCODE_SEL: {
> +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
> +         fs_reg strided_temp0 = subscript(temp0, inst->dst.type, 0);
> +         fs_reg temp1 = ibld.vgrf(inst->src[1].type, 1);
> +         fs_reg strided_temp1 = subscript(temp1, inst->dst.type, 0);
>  
> -      inst->remove(block);
> +         /* Let's convert the operands to the destination type first */
> +         ibld.MOV(strided_temp0, inst->src[0]);
> +         ibld.MOV(strided_temp1, inst->src[1]);
> +         inst->src[0] = strided_temp0;
> +         inst->src[1] = strided_temp1;

Why don't you change the destination type to match the execution type so
you only need to emit one additional converting move instead of two when
there is a type mismatch?

> +         break;
> +      }
> +      default:
> +         /* FIXME: If this is not a supported instruction, then we need to support it. */
> +         unreachable("Not supported instruction");
> +      }
>        progress = true;
>     }
>  
> -- 
> 2.11.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170303/b20c6e84/attachment-0001.sig>


More information about the mesa-dev mailing list