[Mesa-dev] [PATCH 8/9] nir: Add code to fixup bitfield_insert/extract.

Mon Jan 11 22:04:37 PST 2016

On Tue, Jan 12, 2016 at 12:54 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Jan 11, 2016 at 6:58 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>
>> On Jan 11, 2016 2:47 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
>>>
>>> The OpenGL specifications for bitfieldInsert() and bitfieldExtract() says:
>>>
>>>    The result will be undefined if <offset> or <bits> is negative, or if
>>>    the sum of <offset> and <bits> is greater than the number of bits
>>>    used to store the operand.
>>>
>>> Therefore passing bits=32, offset=0 is legal and defined in GLSL.
>>>
>>> But the earlier DX11/SM5 bfi/ibfe/ubfe opcodes are specified to accept a
>>> bitfield width ranging from 0-31. As such, Intel and AMD instructions
>>> read only the low 5 bits of the width operand, making them not able to
>>> implement the GLSL-specified behavior directly.
>>>
>>> This commit adds a pass that inserts code to implement the trivial cases
>>> of <bits> = 32 as
>>>
>>>    bitfieldInsert:
>>>       bits > 31 ? insert : bitfieldInsert(base, insert, offset, bits)
>>>
>>>    bitfieldExtract:
>>>       bits > 31 ? value : bitfieldExtract(value, offset, bits)
>>
>> Why not just add this to nir_opt_algebraic with a flag in
>> nir_compiler_options?  That's the way we've done a bunch of other lowering.
>
> Note that these are replacing the expression with an expression
> containing the original expression.
>
> I think there's a distinction to be made between lowering (decomposing
> an operation into a number of simpler operations) and what I've called
> a "fixup" (inserting some additional code needed for correctness or
> other reasons).
>
> I claim that doing this fixup once before optimizations is the best
> solution, with the alternatives suffering from problems:
>
>  - do it in nir_opt_algebraic. Problem: how to avoid adding the fixup
> code multiple times? The comparison could be optimized away...
>  - do it after optimizations. Problem: might have been able to
> optimize the comparison away...

The problem with doing it this way is that now bitfield_insert and
bitfield_extract now have two different semantics, which seems a
little ugly. For example, it's impossible to know in an optimization
pass whether it's legal to do the inverse transform: bits > 31 ? value
: bitfield_extract(value, offset, bits) -> bitfield_extract, which for
HW implementing the GL semantics, might be a useful thing. I think the
cleanest way to do it would be:

- introduce a new bitfield_extract opcode so that we have both
bitfield_extract and bfe
- clarify that bfe, bfi, and bfm have the D3D semantics, while
bitfield_extract and bitfield_insert are equivalent to the GLSL
builtins
- add a flag to lower the GLSL opcodes into the D3D opcodes, inserting
the fixup code along the way