[Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

Tue Jun 5 08:51:34 UTC 2018

This series hasn't been reviewed yet. Any feedback?

Iago

On Tue, 2018-05-15 at 13:05 +0200, Iago Toral Quiroga wrote:
> NIR assumes that all booleans are 32-bit, so drivers need to produce 32-bit
> booleans even if they can produce native booleans of a different bit-size,
> like Intel does. This means that if we have a 16-bit CMP instruction, we
> generate a 16-bit boolean that we immediately convert to 32-bit, since that
> is the bit-size expected by NIR for all consumers of the boolean.
> 
> This backend optimization pass identifies these cases after we are done
> translating from NIR to FS IR, and propagates the lower bit-size booleans
> to allow DCE to remove the 32-bit conversions. The pass should run early
> after translating from NIR, since it assumes that boolean conversions to
> 32-bit take place immediately after the corresponding CMP instructions.
> 
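To make the idea above concrete, here is a minimal sketch on a toy,
straight-line IR. It is not the code from the series: Op, Reg, Inst,
propagate_native_bools() and remove_dead_movs() are hypothetical names made
up for illustration, the real pass works on the FS IR, and the toy DCE
assumes that no register is live past the end of the program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR for illustration only (not the real FS IR classes).
enum class Op { CMP, MOV, AND };

struct Reg {
    int nr;    // virtual register number, -1 means "unused source"
    int bits;  // bit-size of the value read or written
};

struct Inst {
    Op op;
    Reg dst;
    Reg src[2];
};

// Looks for a CMP that writes a boolean narrower than 32 bits and is
// immediately followed by a MOV widening that boolean to 32 bits, then
// rewrites later readers of the 32-bit copy to read the native boolean.
// Returns true if any source was rewritten.
static bool propagate_native_bools(std::vector<Inst> &prog)
{
    bool progress = false;
    for (std::size_t i = 0; i + 1 < prog.size(); i++) {
        const Inst &cmp = prog[i];
        const Inst &conv = prog[i + 1];
        if (cmp.op != Op::CMP || cmp.dst.bits >= 32)
            continue;
        if (conv.op != Op::MOV || conv.dst.bits != 32)
            continue;
        if (conv.src[0].nr != cmp.dst.nr || conv.src[0].bits != cmp.dst.bits)
            continue;

        const Reg narrow = cmp.dst;
        const Reg wide = conv.dst;
        assert(narrow.bits < wide.bits);

        for (std::size_t j = i + 2; j < prog.size(); j++) {
            // Sources are read before the destination is written, so it is
            // safe to rewrite them before checking for an overwrite.
            for (Reg &s : prog[j].src) {
                if (s.nr == wide.nr) {
                    s = narrow;
                    progress = true;
                }
            }
            // Stop once either register is overwritten: later reads would
            // no longer refer to the boolean produced by this CMP.
            if (prog[j].dst.nr == wide.nr || prog[j].dst.nr == narrow.nr)
                break;
        }
    }
    return progress;
}

// Toy DCE: removes MOVs whose destination is never read afterwards.
// Returns the number of instructions removed.
static int remove_dead_movs(std::vector<Inst> &prog)
{
    int removed = 0;
    for (std::size_t i = prog.size(); i-- > 0;) {
        if (prog[i].op != Op::MOV)
            continue;
        bool read = false;
        for (std::size_t j = i + 1; j < prog.size(); j++)
            for (const Reg &s : prog[j].src)
                if (s.nr == prog[i].dst.nr)
                    read = true;
        if (!read) {
            prog.erase(prog.begin() + static_cast<std::ptrdiff_t>(i));
            removed++;
        }
    }
    return removed;
}
```

Running propagate_native_bools() and then remove_dead_movs() on two 16-bit
CMPs, each followed by its 32-bit conversion and combined by an AND, should
leave only the two CMPs and an AND that reads the 16-bit booleans directly.
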
> This has been tested with existing and work-in-progress CTS tests as well
> as some ad-hoc VkRunner tests I wrote.
> 
> For more context you can read this discussion:
> https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html
> 
> One point raised by Jason during the discussion linked above was that we
> might need to canonicalize booleans of different native bit-sizes when they
> are combined in boolean expressions. However, as indicated in the commit
> log for the last patch in the series, my interpretation of the PRM is that
> the hardware can handle this situation without us having to do anything
> about it. The last patch contains canonicalization code under a disabled
> #if guard anyway, just in case reviewers think this is needed in the end
> and want to have a look at what it could look like.
> 
> As an alternative to what is being done here, we could also change the way
> we construct CMP instructions to take advantage of the PRM documentation,
> which says that since gen5 CMP instructions can mix and match *B, *W and *D
> for their source and destination arguments, to always produce canonical
> 32-bit bools like NIR expects. However, since all hardware gens still
> produce 16-bit booleans for half-float, we would still need to handle that
> case specially with a similar pass, so we would not be gaining much from
> that. Also, in that case we would always operate with 32-bit booleans,
> losing the possibility to emit native 16-bit boolean instructions where
> possible.
> 
> Iago Toral Quiroga (3):
>   intel/compiler: make brw_reg_type_from_bit_size usable from other
>     places
>   intel/compiler: add a region_match() helper
>   intel/compiler: add an optimization pass for booleans
> 
>  src/intel/compiler/brw_fs.cpp     | 291 ++++++++++++++++++++++++++++++++++++++
>  src/intel/compiler/brw_fs.h       |   5 +
>  src/intel/compiler/brw_fs_nir.cpp |  59 --------
>  src/intel/compiler/brw_ir_fs.h    |  13 ++
>  4 files changed, 309 insertions(+), 59 deletions(-)
>