[Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

Thu Jul 5 22:47:43 UTC 2018

(I had to stop reading to go home last Tuesday, so here are the
remaining comments.)

On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote:
> NIR assumes that all booleans are 32-bit but Intel hardware produces
> booleans of the same size as the operands to the CMP instruction, so we
> can actually have 8-bit and 16-bit booleans. To work around this
> mismatch between NIR and the hardware, we emit boolean conversions to
> 32-bit right after emitting the CMP instruction during the NIR->FS
> pass, which makes interfacing with NIR a lot easier, but can leave
> unnecessary boolean conversions in the shader code.

Question: have you explored handling this at the NIR->FS conversion?
I.e. instead of generate the cmp + mov and then look for the cmp + mov
to fix up; when generating a cmp, perform those checks (at nir level)
and then pick the right bitsize.

> +/**
> + * Propagates the bit-size of the destination of a boolean instruction to
> + * all its consumers. If propagate_from_source is True, then the producer
> + * is a conversion MOV from a low bit-size boolean to 32-bit, and in that
> + * case the propagation happens from the source of the instruction instead
> + * of its destination.
> + */
> +static bool
> +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
> +{
> +   assert(!propagate_from_source || inst->opcode == BRW_OPCODE_MOV);
> +
> +   bool progress = false;
> +
> +   const unsigned bit_size = 8 * (propagate_from_source ?
> +      type_sz(inst->src[0].type) : type_sz(inst->dst.type));
> +
> +   /* Look for any follow-up instructions that sources from the boolean
> +    * result of the producer instruction and rewrite them to use the correct
> +    * bit-size.
> +    */
> +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) {
> +      if (!inst_supports_boolean(fixup_inst))
> +         continue;

Should we care about other instruction clobbering the contents of
inst->dst, or at this point of the optimization we can count on it not
being?

> +         /* If it is a plain boolean conversion to 32-bit, then look for any
> +          * follow-up instructions that source from the 32-bit boolean and
> +          * rewrite them to source from the output of the CMP (which is the
> +          * source of the conversion instruction) directly if possible.
> +          */
> +         progress = propagate_bool_bit_size(conv_inst, true) || progress;
> +      }
> +#if 0
> +       else if (inst_supports_boolean(inst) && inst->sources > 1) {

If you end up enabling this section, I suggest move the
inst_supports_boolean() check to the beginning of the for-loop, as an
early return. Makes the condition for the cases we are handling
cleaner.

> +         /* For all logical instructions that can take more than one operand
> +          * we need to ensure that all of them have matching bit-sizes. If they
> +          * don't, it means that the original shader code is operating boolean
> +          * expressions with different native bit-sizes and we need to choose
> +          * a canonical boolean form for all the operands, which requires to
> +          * inject conversions to temporaries. We choose the bit-size of the
> +          * destination as the canonical form (which must be a 32-bit boolean
> +          * since we only propagate smaller bit-sizes to the destination if we
> +          * managed to convert all the operands to the same bit-size) because
> +          * that way we don't need to worry about propagating the destination
> +          * bit-size down the line.
> +          */

To make this comment less heavy, I'd move the assumption about the
destination being 32-bit right above the assert, which is kind of an
explanation of the assertion.

> @@ -6240,6 +6471,7 @@ fs_visitor::optimize()
>  
>     OPT(opt_drop_redundant_mov_to_flags);
>     OPT(remove_extra_rounding_modes);
> +   OPT(opt_bool_bit_size);

It could be useful to have a short comment here about the importance
of calling opt_bool_bit_size at this point.

Thanks,
Caio