[Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

Fri Jul 6 06:25:25 UTC 2018

On Thu, 2018-07-05 at 15:47 -0700, Caio Marcelo de Oliveira Filho
wrote:
> (I had to stop reading to go home last Tuesday, so here are the
> remaining comments.)
> 
> 
> On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote:
> > NIR assumes that all booleans are 32-bit but Intel hardware
> > produces
> > booleans of the same size as the operands to the CMP instruction,
> > so we
> > can actually have 8-bit and 16-bit booleans. To work around this
> > mismatch between NIR and the hardware, we emit boolean conversions
> > to
> > 32-bit right after emitting the CMP instruction during the NIR->FS
> > pass, which makes interfacing with NIR a lot easier, but can leave
> > unnecessary boolean conversions in the shader code.
> 
> Question: have you explored handling this at the NIR->FS conversion?
> I.e. instead of generate the cmp + mov and then look for the cmp +
> mov
> to fix up; when generating a cmp, perform those checks (at nir level)
> and then pick the right bitsize.

It is not that easy. The problem is that NIR will continue to come at
us with 32-bit boolean instructions after the CMP+MOV, so instead of
prpagating forward for every conversion, now, for every bool we find in
IR we'd need to go back in the FS program to check if it is a real 32-
bit boolean or not to decide what to emit. I don't see any benefit,
plus we would be coupling all this with the translation implementation,
which I think is less nice than having it being a completely separate
thing.

Anyway, there is a major issue with the current patch that I have found
this week thanks to some new CTS tests: when we propagate the bitsize
of a logical instruction to its destination, that affects all its
consumers even outside the current block, so we need to handle
propagation across blocks, which adds a few more problems, so I still
need to figure out how to deal with that properly and whether that is
something we want to do (there is a reason why no other opt/lowering
passes do cross-block changes...).

Iago

> 
> > +/**
> > + * Propagates the bit-size of the destination of a boolean
> > instruction to
> > + * all its consumers. If propagate_from_source is True, then the
> > producer
> > + * is a conversion MOV from a low bit-size boolean to 32-bit, and
> > in that
> > + * case the propagation happens from the source of the instruction
> > instead
> > + * of its destination.
> > + */
> > +static bool
> > +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
> > +{
> > +   assert(!propagate_from_source || inst->opcode ==
> > BRW_OPCODE_MOV);
> > +
> > +   bool progress = false;
> > +
> > +   const unsigned bit_size = 8 * (propagate_from_source ?
> > +      type_sz(inst->src[0].type) : type_sz(inst->dst.type));
> > +
> > +   /* Look for any follow-up instructions that sources from the
> > boolean
> > +    * result of the producer instruction and rewrite them to use
> > the correct
> > +    * bit-size.
> > +    */
> > +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst)
> > {
> > +      if (!inst_supports_boolean(fixup_inst))
> > +         continue;
> 
> Should we care about other instruction clobbering the contents of
> inst->dst, or at this point of the optimization we can count on it
> not
> being?
> 
> 
> > +         /* If it is a plain boolean conversion to 32-bit, then
> > look for any
> > +          * follow-up instructions that source from the 32-bit
> > boolean and
> > +          * rewrite them to source from the output of the CMP
> > (which is the
> > +          * source of the conversion instruction) directly if
> > possible.
> > +          */
> > +         progress = propagate_bool_bit_size(conv_inst, true) ||
> > progress;
> > +      }
> > +#if 0
> > +       else if (inst_supports_boolean(inst) && inst->sources > 1)
> > {
> 
> If you end up enabling this section, I suggest move the
> inst_supports_boolean() check to the beginning of the for-loop, as an
> early return. Makes the condition for the cases we are handling
> cleaner.
> 
> 
> 
> > +         /* For all logical instructions that can take more than
> > one operand
> > +          * we need to ensure that all of them have matching bit-
> > sizes. If they
> > +          * don't, it means that the original shader code is
> > operating boolean
> > +          * expressions with different native bit-sizes and we
> > need to choose
> > +          * a canonical boolean form for all the operands, which
> > requires to
> > +          * inject conversions to temporaries. We choose the bit-
> > size of the
> > +          * destination as the canonical form (which must be a 32-
> > bit boolean
> > +          * since we only propagate smaller bit-sizes to the
> > destination if we
> > +          * managed to convert all the operands to the same bit-
> > size) because
> > +          * that way we don't need to worry about propagating the
> > destination
> > +          * bit-size down the line.
> > +          */
> 
> To make this comment less heavy, I'd move the assumption about the
> destination being 32-bit right above the assert, which is kind of an
> explanation of the assertion.
> 
> 
> > @@ -6240,6 +6471,7 @@ fs_visitor::optimize()
> >  
> >     OPT(opt_drop_redundant_mov_to_flags);
> >     OPT(remove_extra_rounding_modes);
> > +   OPT(opt_bool_bit_size);
> 
> It could be useful to have a short comment here about the importance
> of calling opt_bool_bit_size at this point.
> 
> 
> 
> Thanks,
> Caio
>