[Mesa-dev] 16-bit comparisons in NIR

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Sat Apr 21 00:32:04 UTC 2018


On Fri, Apr 20, 2018 at 5:16 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>>
>> On 20.04.2018 10:21, Iago Toral wrote:
>>>
>>> Hi,
>>>
>>> while developing support for Vulkan shaderInt16 on Anvil I came across
>>> a feature of NIR that was a bit inconvenient: bools are always 32-bit
>>> by design, but the Intel hardware produces 16-bit bool results for
>>> 16-bit comparisons, so that creates a problem that manifests like this:
>>>
>>> vec1 32 ssa_21 = fge ssa_20, ssa_16
>>> vec1 16 ssa_22 = b2f ssa_21
>
>
> I was thinking about this a bit this morning and it gets even more sticky.
> What happens if you have
>
> bool e = (a < b) && (c < d);
>
> where a and b are 16-bit and c and d are 32-bit?  In this case, one
> comparison has a 32-bit value and one has a 16-bit value and you have to pick
> one for the &&.
>
>>>
>>> Our CMP instruction will produce a 16-bit boolean result for the first
>>> NIR instruction (where NIR expects it to be 32-bit), so by the time we
>>> emit the second instruction in the driver the bit-size for the operand
>>> of b2f provided by NIR no longer matches the reality and we emit
>>> incorrect code.
>>>
>>> This seems to have been a conscious design choice in NIR, and while
>>> discussing this with Jason he was unsure how much we wanted to change
>>> this or how to do it, given how thoroughly 32-bit bools are baked into
>>> NIR and the complexities that modifying this would also bring to our
>>> bit-size validation code.
>>>
>>> I have been considering alternatives that didn't involve changing NIR
>>> to support multiple bit-sizes for booleans:
>>>
>>> 1) Drivers that need to emit smaller booleans could try to fix the
>>> generated NIR by correcting the expected bit-sizes for CMP
>>> instructions. This would be rather trivial to implement in drivers (and
>>> maybe we could even make a generic pass for other drivers to use if
>>> they need it) but this will make the validator complain because it
>>> won't recognize comparisons with 16-bit bool outputs as valid NIR
>>> opcodes. I also found instances where nir_search would complain about
>>> mismatching bit-sizes. I haven't looked any further into it yet though,
>>> so maybe we can reasonably work around these issues.
>>>
>>> 2) Drivers could handle this specially when they emit code from NIR.
>>> Specifically, when they see a 32-bit boolean source in an instruction,
>>> they would have to search for the instruction that produced that source
>>> value and check whether it is a 16-bit or a 32-bit boolean to emit
>>> proper code for the instruction.
>>>
>>> 3) Drivers can just convert the 16-bit bool result they generate for
>>> 16-bit cmp to the 32-bit bool that NIR expects, and then possibly run
>>> an optimization pass to eliminate these extra conversions and fix up
>>> the code accordingly.
>>
>>
>> radeonsi(NIR) and radv already use option 3, since GCN hardware really
>> wants to treat bools as 1-bit values, so that's what I'd suggest. The
>> optimizations that clean up the conversions happen in LLVM for us.
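
To make option 3 concrete, here is a minimal sketch in C. It is not driver
code: cmp_lt_16, cmp_lt_32, widen_bool16 and mixed_and are hypothetical
names, and it assumes a "true" result is all-ones at the width of the
compare, so that sign-extending a 16-bit bool gives the 32-bit value NIR
expects. It also covers the mixed-width && case Jason raised: once the
16-bit result is widened, both operands of the AND are 32-bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a hardware compare: the result is all-ones or
 * zero, at the same width as the operands (an assumption of this sketch). */
static int16_t cmp_lt_16(int16_t a, int16_t b) { return (a < b) ? -1 : 0; }
static int32_t cmp_lt_32(int32_t a, int32_t b) { return (a < b) ? -1 : 0; }

/* Option 3: widen the 16-bit bool by sign extension, so that
 * 0xFFFF becomes 0xFFFFFFFF and 0x0000 stays 0. */
static int32_t widen_bool16(int16_t b) { return (int32_t)b; }

/* bool e = (a < b) && (c < d); with 16-bit a, b and 32-bit c, d. */
static int32_t mixed_and(int16_t a, int16_t b, int32_t c, int32_t d)
{
    int32_t lhs = widen_bool16(cmp_lt_16(a, b));
    int32_t rhs = cmp_lt_32(c, d);

    /* After widening, both sides are canonical 32-bit bools. */
    assert(lhs == 0 || lhs == -1);
    assert(rhs == 0 || rhs == -1);
    return lhs & rhs;
}
```

The extra widen_bool16 step is the conversion that a later optimization
pass would try to remove when the consumer can take the 16-bit value
directly.
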
>
>
> Is this a GCN thing or an LLVM thing?  It would be neat if your hardware had
> 1-bit registers. :-)  We sort-of do but they're special flag registers and
> we have very few of them.

LLVM. For GCN HW we use a 64-bit register that is shared between
lanes (i.e. it has one bit for each lane).
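
As a rough illustration of the one-bit-per-lane layout, here is a small
sketch in plain C. It only models the idea on the CPU: WAVE_SIZE,
wave_cmp_lt_16 and lane_bool32 are hypothetical names, not anything from
LLVM or the drivers, and the all-ones encoding for a 32-bit "true" is an
assumption of the sketch.

```c
#include <stdint.h>

#define WAVE_SIZE 64 /* assumed number of lanes sharing one mask */

/* Compare 16-bit values lane by lane and pack the results into one
 * 64-bit mask: bit N is set when the comparison is true for lane N. */
static uint64_t wave_cmp_lt_16(const int16_t a[WAVE_SIZE],
                               const int16_t b[WAVE_SIZE])
{
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        if (a[lane] < b[lane])
            mask |= UINT64_C(1) << lane;
    }
    return mask;
}

/* Expand one lane's bit into a 32-bit bool (all-ones or zero). */
static uint32_t lane_bool32(uint64_t mask, unsigned lane)
{
    return ((mask >> lane) & 1) ? UINT32_MAX : 0;
}
```

In this model the width of the compared values never shows up in the
result, which is why the 16-bit versus 32-bit question does not come up
in the same way.
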
>
> --Jason
>