[Mesa-dev] [PATCH 2/2] glsl: Handle bits=32 case in bitfieldInsert/bitfieldExtract.

Mon Jan 4 10:27:45 PST 2016

On Mon, Jan 4, 2016 at 12:52 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, Jan 4, 2016 at 12:44 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Wed, Dec 30, 2015 at 4:26 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> On Wed, Dec 30, 2015 at 3:26 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> The OpenGL specifications for these functions say:
>>>>
>>>>    The result will be undefined if <offset> or <bits> is negative, or if
>>>>    the sum of <offset> and <bits> is greater than the number of bits
>>>>    used to store the operand.
>>>>
>>>> Therefore passing bits=32, offset=0 is legal and defined in GLSL.
>>>>
>>>> But the earlier DX11/SM5 bfi/ibfe/ubfe opcodes are specified to accept a
>>>> bitfield width ranging from 0-31. As such, Intel and AMD instructions
>>>> read only the low 5 bits of the width operand, making them not compliant
>>>>    with the GLSL spec, so we have to special-case bits=32.
>>>>
>>>> Checking that offset=0 is not necessary, since for any other value,
>>>> <offset> + <bits> will be greater than 32, which is specified as
>>>> generating an undefined result.
>>>>
>>>> Fixes:
>>>>    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
>>>>    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
>>>>    ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
>>>> ---
>>>> Yuck. Suggestions welcome.
>>>
>>> Can you make a piglit test? Want to see if nvidia has the same
>>> problem. According to
>>> http://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-bfe,
>>> offset/bits can actually be up to 255 (although I can't fully imagine
>>> why one might want that). However perhaps the HW differs.
>>
>> I just sent: [PATCH] arb_gpu_shader5: Test corner cases of
>> bitfieldInsert/bitfieldExtract.
>>
>> It's not totally tested (as in, I haven't fixed i965 to make it pass
>> because I found out that the bfi2 instruction is also broken...) but I
>> am curious to see what the proprietary NVIDIA driver does.
>
> I'm curious too. On nvc0 the new bitfieldExtract tests still pass, but
> bitfieldInsert now fails.

FWIW, that's the same behavior I see with this patch on i965. I traced
it to the bfi2 instruction returning all zeros unexpectedly for the
bits=32, offset=0 case.

I would have expected the hardware to implement that operation as

bfi dst, mask, insert, base === dst := (insert & mask) | (base & ~mask)

but the PRM says it actually calculates the offset from the bitmask with

UD offset = LZD(reverse(src0.chan[n]))-1;

where LZD is "leading zero detect", and then uses that value as a shift
argument in a later computation. I could see LZD producing a bad result
when there are no zeros in the mask.

I'm trying to get i965 to pass the test at the moment.