[Mesa-dev] [PATCH 8/9] nir: Add code to fixup bitfield_insert/extract.

Tue Jan 12 09:26:46 PST 2016

On 01/11/2016 10:04 PM, Connor Abbott wrote:
> On Tue, Jan 12, 2016 at 12:54 AM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Jan 11, 2016 at 6:58 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>> On Jan 11, 2016 2:47 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
>>>>
>>>> The OpenGL specifications for bitfieldInsert() and bitfieldExtract() says:
>>>>
>>>>    The result will be undefined if <offset> or <bits> is negative, or if
>>>>    the sum of <offset> and <bits> is greater than the number of bits
>>>>    used to store the operand.
>>>>
>>>> Therefore passing bits=32, offset=0 is legal and defined in GLSL.
>>>>
>>>> But the earlier DX11/SM5 bfi/ibfe/ubfe opcodes are specified to accept a
>>>> bitfield width ranging from 0-31. As such, Intel and AMD instructions
>>>> read only the low 5 bits of the width operand, making them not able to
>>>> implement the GLSL-specified behavior directly.
>>>>
>>>> This commit adds a pass that inserts code to implement the trivial cases
>>>> of <bits> = 32 as
>>>>
>>>>    bitfieldInsert:
>>>>       bits > 31 ? insert : bitfieldInsert(base, insert, offset, bits)
>>>>
>>>>    bitfieldExtract:
>>>>       bits > 31 ? value : bitfieldExtract(value, offset, bits)
>>>
>>> Why not just add this to nir_opt_algebraic with a flag in
>>> nir_compiler_options?  That's the way we've done a bunch of other lowering.
>>
>> Note that these are replacing the expression with an expression
>> containing the original expression.
>>
>> I think there's a distinction to be made between lowering (decomposing
>> an operation into a number of simpler operations) and what I've called
>> a "fixup" (inserting some additional code needed for correctness or
>> other reasons).
>>
>> I claim that doing this fixup once before optimizations is the best
>> solution, with the alternatives suffering from problems:
>>
>>  - do it in nir_opt_algebraic. Problem: how to avoid adding the fixup
>> code multiple times? The comparison could be optimized away...
>>  - do it after optimizations. Problem: might have been able to
>> optimize the comparison away...
> 
> The problem with doing it this way is that now bitfield_insert and
> bitfield_extract now have two different semantics, which seems a
> little ugly. For example, it's impossible to know in an optimization
> pass whether it's legal to do the inverse transform: bits > 31 ? value
> : bitfield_extract(value, offset, bits) -> bitfield_extract, which for
> HW implementing the GL semantics, might be a useful thing. I think the
> cleanest way to do it would be:

When I first reviewed this series, I had a similar issue.  It seemed a
little confusing to have uncertainty about how a particular operation
would work.  At the very least that makes constant evaluation a little
sketchy.

It also seemed a bit icky, for lack of a better word, to have a fixup
pass that replaces an opcode with an expression involving itself.  I had
to look at the code pretty carefully to convince myself that it couldn't
get expanded multiple times.

I think I at least partially agree with Connor.  If there are two
slightly different semantics available, we should have two different
opcodes.  It's annoying, but that approach is less likely to spawn
additional bugs later.

It seems like we can get away with having a single semantic that matches
the DX semantic.  We know that AMD and Intel follow the DX semantic.  Do
we know about the other hardware supported by Mesa?

It looks like VC4 has BFE.  What is it's semantic?

Does Ardreno even have BFI or BFE instructions?

What is NVIDIA's semantic?  Looking at NV_gpu_program5, I think it may
have the GLSL semantic, so we may be out of luck.  The pseudocode there
matches the implementation of NIR's bfm without patch 7.

      Type BitfieldExtract(Type bits, Type offset, Type value)
      {
        if (bits < 0 || offset < 0 || offset >= TotalBits(Type) ||
            bits + offset > TotalBits(Type)) {
          /* result undefined */
        } else if (bits == 0) {
          return 0;
        } else {
          return (value << (TotalBits(Type) - (bits+offset))) >>
                   (TotalBits(type) - bits);
        }
      }

This particular issue reminds me of some conversations we had about
intrinsic operations and GLSL built-ins several years ago.  At the time
we floated the idea of having a mechanism by which a driver could
override the implementation of various intrinsics and built-in
functions.  It was long enough ago that I think I remember discussing it
with Paul.

The idea that I think a lot of people liked was to have pseudo-ops for
most or all built-in functions that exactly matched the semantic of the
built-in function.  In the IR, these would look like function calls.
During linking these pseudo-ops would get expanded to some set of
primitive operations.  There would be a default expansion for each, and
drivers would be able to overload the individual expansions.

Now that SPRI-V exists, this may be more appealing.  We could define our
built-in functions to generate the same kind of pseudo-instruction as
SPRI-V.  Then at translation from GLSL IR or SPIR-V to NIR, we could use
a possibly device-dependent conversion to primitive NIR instructions.

> - introduce a new bitfield_extract opcode so that we have both
> bitfield_extract and bfe
> - clarify that bfe, bfi, and bfm have the D3D semantics, while
> bitfield_extract and bitfield_insert are equivalent to the GLSL
> builtins
> - add a flag to lower the GLSL opcodes into the D3D opcodes, inserting
> the fixup code along the way
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev