[Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

Tue May 15 11:05:18 UTC 2018

NIR assumes that all booleans are 32-bit, so drivers need to produce 32-bit
booleans even if they can produce native booleans of a different bit-size, like
Intel does. This means that if we have a 16-bit CMP instruction, we generate a
16-bit boolean that we immediately convert to 32-bit, since that is the bit-size
expected by NIR for all consumers of the boolean.

This backend optimization pass identifies these cases after we are done
translating from NIR to FS IR, and propagates the lower bit-size booleans
to allow DCE to remove the 32-bit conversions. The pass should run right
after the translation from NIR, since it assumes that boolean conversions to
32-bit take place immediately after the corresponding CMP instructions.

This has been tested with existing and work-in-progress CTS tests as well
as some ad-hoc VkRunner tests I wrote.

For more context you can read this discussion:
https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html

One point raised by Jason during the discussion linked above was that we might
need to canonicalize booleans of different native bit-sizes when they are
combined in boolean expressions. However, as indicated in the commit log for the
last patch in the series, my interpretation of the PRM is that the hardware can
handle this situation without us having to do anything about it. Nevertheless,
the last patch includes canonicalization code under a disabled #if guard, in
case reviewers conclude that it is needed and want to see what it would look
like.

Alternatively, we could change the way we construct CMP instructions so that
they always produce canonical 32-bit booleans, as NIR expects. This would take
advantage of the PRM statement that, since gen5, CMP instructions can mix and
match *B, *W and *D types for their source and destination arguments. However,
all hardware gens still produce 16-bit booleans for half-float comparisons, so
we would still need a similar pass to handle that case specially, and we would
not gain much from the change. Also, we would then always operate with 32-bit
booleans, losing the possibility of emitting native 16-bit boolean
instructions where possible.

Iago Toral Quiroga (3):
  intel/compiler: make brw_reg_type_from_bit_size usable from other
    places
  intel/compiler: add a region_match() helper
  intel/compiler: add an optimization pass for booleans

 src/intel/compiler/brw_fs.cpp     | 291 ++++++++++++++++++++++++++++++++++++++
 src/intel/compiler/brw_fs.h       |   5 +
 src/intel/compiler/brw_fs_nir.cpp |  59 --------
 src/intel/compiler/brw_ir_fs.h    |  13 ++
 4 files changed, 309 insertions(+), 59 deletions(-)
