[Mesa-dev] [PATCH 1/4] gallium: add new opcodes for ARB_gs5 bit manipulation support

Fri Apr 25 14:19:08 PDT 2014

On Fri, Apr 25, 2014 at 5:02 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 25.04.2014 19:41, schrieb Ilia Mirkin:
>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>> ---
>>  src/gallium/auxiliary/tgsi/tgsi_info.c     |  8 +++++
>>  src/gallium/docs/source/tgsi.rst           | 51 ++++++++++++++++++++++++++++++
>>  src/gallium/include/pipe/p_shader_tokens.h | 11 ++++++-
>>  3 files changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> index 5bcc3c9..d03a920 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>> @@ -223,6 +223,14 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
>>     { 1, 2, 0, 0, 0, 0, COMP, "UMUL_HI", TGSI_OPCODE_UMUL_HI },
>>     { 1, 3, 1, 0, 0, 0, OTHR, "TG4", TGSI_OPCODE_TG4 },
>>     { 1, 2, 1, 0, 0, 0, OTHR, "LODQ", TGSI_OPCODE_LODQ },
>> +   { 1, 3, 0, 0, 0, 0, COMP, "IBFE", TGSI_OPCODE_IBFE },
>> +   { 1, 3, 0, 0, 0, 0, COMP, "UBFE", TGSI_OPCODE_UBFE },
>> +   { 1, 4, 0, 0, 0, 0, COMP, "BFI", TGSI_OPCODE_BFI },
>> +   { 1, 1, 0, 0, 0, 0, COMP, "BREV", TGSI_OPCODE_BREV },
>> +   { 1, 1, 0, 0, 0, 0, COMP, "POPC", TGSI_OPCODE_POPC },
>> +   { 1, 1, 0, 0, 0, 0, COMP, "LSB", TGSI_OPCODE_LSB },
>> +   { 1, 1, 0, 0, 0, 0, COMP, "IMSB", TGSI_OPCODE_IMSB },
>> +   { 1, 1, 0, 0, 0, 0, COMP, "UMSB", TGSI_OPCODE_UMSB },
>>  };
>>
>>  const struct tgsi_opcode_info *
>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>> index 0ea0759..95b069f 100644
>> --- a/src/gallium/docs/source/tgsi.rst
>> +++ b/src/gallium/docs/source/tgsi.rst
>> @@ -1558,6 +1558,57 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
>>
>>    dst.w = |src.w|
>>
>> +Bitwise ISA
>> +^^^^^^^^^^^
>> +These opcodes are used for bit-level manipulation of integers.
>> +
>> +.. opcode:: IBFE - Signed Bitfield Extract
>> +
>> +.. math::
>> +
>> +  value = src0
>> +
>> +  offset = src1
>> +
>> +  bits = src2
>> +
>> +  dst = bitfield\_extract(value, offset, bits)
>> +
>> +.. opcode:: UBFE - Unsigned Bitfield Extract
>> +
>> +.. math::
>> +
>> +  value = src0
>> +
>> +  offset = src1
>> +
>> +  bits = src2
>> +
>> +  dst = bitfield\_extract(value, offset, bits)
> I think the description for these two leaves a bit to be desired (you'd
> even think they are the same).

They basically are the same, except that IBFE sign-extends the extracted
field while UBFE zero-extends it. What's the convention for operations
that don't map nicely onto "math"? Should I stick some pseudo-code in?
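As a possible pseudo-code description, here is a minimal C sketch of the three bitfield opcodes (IBFE, UBFE, BFI). It assumes 32-bit components and the GLSL bitfieldExtract()/bitfieldInsert() rules (offset and bits non-negative, offset + bits <= 32, anything else undefined). The helper names low_mask, ubfe, ibfe and bfi are hypothetical and not part of gallium.

```c
#include <stdint.h>

/* Hypothetical reference helpers, not gallium code. Assumes the GLSL
 * rules: offset + bits <= 32; other inputs are left undefined. */

/* Mask with the low `bits` bits set. bits == 32 is special-cased because
 * shifting a 32-bit value by 32 is undefined in C. */
static uint32_t low_mask(uint32_t bits)
{
   return bits >= 32 ? 0xffffffffu : (1u << bits) - 1u;
}

/* UBFE: extract `bits` bits starting at `offset`, zero-extended. */
static uint32_t ubfe(uint32_t value, uint32_t offset, uint32_t bits)
{
   if (bits == 0)
      return 0; /* also avoids a shift by 32 when offset == 32 */
   return (value >> offset) & low_mask(bits);
}

/* IBFE: same extraction, but the top bit of the field is replicated
 * into all higher bits (sign extension). */
static int32_t ibfe(int32_t value, uint32_t offset, uint32_t bits)
{
   uint32_t field;
   if (bits == 0)
      return 0;
   field = ubfe((uint32_t)value, offset, bits);
   if (bits < 32 && (field & (1u << (bits - 1))))
      field |= ~low_mask(bits);
   /* Assumes two's complement; converting values above INT32_MAX
    * to int32_t is implementation-defined in C. */
   return (int32_t)field;
}

/* BFI: replace `bits` bits of `base` starting at `offset` with the low
 * `bits` bits of `insert`; all other bits of `base` are kept. */
static uint32_t bfi(uint32_t base, uint32_t insert, uint32_t offset,
                    uint32_t bits)
{
   uint32_t mask;
   if (bits == 0)
      return base;
   mask = low_mask(bits) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}
```

For example, extracting 4 bits at offset 4 from 0xF0 gives 0xF with UBFE but -1 with IBFE, which is exactly the difference the two opcode descriptions need to convey.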

>
>> +
>> +.. opcode:: BFI - Bitfield Insert
>> +
>> +.. math::
>> +
>> +  base = src0
>> +
>> +  insert = src1
>> +
>> +  offset = src2
>> +
>> +  bits = src3
>> +
>> +  dst = bitfield\_insert(base, insert, offset, bits)
> Same as above.
>
>> +
>> +.. opcode:: BREV - Bitfield Reverse
> Could also be a bit more descriptive.
>
>> +
>> +.. opcode:: POPC - Population Count (Count Set Bits)
>> +
>> +.. opcode:: LSB - Index of lowest set bit
>> +
>> +.. opcode:: IMSB - Index of highest non-sign bit
> That looks very confusing to me, since it apparently is meant to give
> the highest set bit if the number is positive, and the highest cleared
> bit if the number is negative.

Right: if the sign bit is 1 (negative), it's the index of the highest
0 bit; if the sign bit is 0 (non-negative), it's the index of the
highest 1 bit; and it's -1 if all the bits are the same. None of these
map nicely onto a "math"-style description. Perhaps I should just put
in a paragraph for these?
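A minimal C sketch of those semantics, matching the behaviour described above and the GLSL findMSB()/findLSB() built-ins. IMSB falls out of UMSB by complementing negative inputs, so the "highest cleared bit" of a negative value becomes the "highest set bit" of its complement. The helper names umsb, imsb and lsb are hypothetical, not gallium code, and 32-bit components are assumed.

```c
#include <stdint.h>

/* UMSB: index of the highest set bit, or -1 if value == 0. */
static int umsb(uint32_t value)
{
   int i;
   for (i = 31; i >= 0; i--)
      if (value & (1u << i))
         return i;
   return -1;
}

/* IMSB: highest set bit for non-negative input, highest cleared bit for
 * negative input. Returns -1 when every bit equals the sign bit (0, -1). */
static int imsb(int32_t value)
{
   uint32_t u = (uint32_t)value;
   return umsb(value < 0 ? ~u : u);
}

/* LSB: index of the lowest set bit, or -1 if value == 0. */
static int lsb(uint32_t value)
{
   int i;
   for (i = 0; i < 32; i++)
      if (value & (1u << i))
         return i;
   return -1;
}
```

For instance, imsb(-2) is 0, since 0xFFFFFFFE has only bit 0 cleared.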

>
>> +
>> +.. opcode:: UMSB - Index of highest 1-bit
> highest set bit?

Sure.
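Along the same lines, BREV (which Roland asked to be described more fully) and POPC can each be stated in a few lines of C. This is a hypothetical reference sketch assuming 32-bit components; brev and popc are illustrative names, not gallium functions.

```c
#include <stdint.h>

/* BREV: bit i of the result is bit (31 - i) of the input. */
static uint32_t brev(uint32_t value)
{
   uint32_t result = 0;
   int i;
   for (i = 0; i < 32; i++) {
      result = (result << 1) | (value & 1u); /* shift LSB of input in */
      value >>= 1;
   }
   return result;
}

/* POPC: number of bits set to 1. */
static int popc(uint32_t value)
{
   int count = 0;
   while (value) {
      value &= value - 1u; /* clear the lowest set bit */
      count++;
   }
   return count;
}
```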

>
> Otherwise these look reasonable to me.
> As for the addc/subb I guess this is an area where just about everything
> you do won't really match hw in any case. A quick glance at radeonsi
> tells me that gcn actually _always_ sets the carry bit for normal int
> adds/subs but does so in the VCC reg - so if you'd want to get this to a
> "normal" register you'd have to do some other instruction (maybe
> conditional 0/1 move based on VCC). However, gcn actually has subb/addc
> instructions, these just do add/sub honoring that VCC bit (and again
> still outputting VCC bit themselves).
> But sm5 and glsl agree there - they both have addc/subb with just 2
> inputs (so no carry/borrow input) but an additional "normal" overflow
> output. Maybe this is easiest to transform into what hw will actually do
> usually.

I was hoping not to have to deal with carry/borrow at the TGSI level
at all, and instead have the GLSL compiler lower them to ADD + USLT or
similar. Then, for hw capable of dealing with it (not nvc0, or at least
the blob driver doesn't make use of a mechanism that'd enable it), a
peephole opt could convert the USLT into "recover wherever the flag is".
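A minimal C sketch of what that ADD + USLT lowering computes, assuming the GLSL uaddCarry()/usubBorrow() semantics (carry/borrow of 0 or 1). The helper names are hypothetical. One caveat: TGSI comparisons such as USLT write ~0 rather than 1 for true, so a real lowering would also need to mask or negate the result to get GLSL's 0/1 value.

```c
#include <stdint.h>

/* Hypothetical helpers, not gallium code. An unsigned add wrapped
 * modulo 2^32 exactly when the result is smaller than an operand. */
static uint32_t uadd_carry(uint32_t a, uint32_t b, uint32_t *carry)
{
   uint32_t sum = a + b;       /* ADD: wraps modulo 2^32 */
   *carry = sum < a ? 1u : 0u; /* USLT(sum, a): 1 iff it wrapped */
   return sum;
}

/* An unsigned subtract borrows exactly when a < b. */
static uint32_t usub_borrow(uint32_t a, uint32_t b, uint32_t *borrow)
{
   *borrow = a < b ? 1u : 0u;  /* USLT(a, b) */
   return a - b;               /* wraps modulo 2^32 */
}
```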

>
> Roland
>
>
>>
>>  Geometry ISA
>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
>> index b537166..d095bd3 100644
>> --- a/src/gallium/include/pipe/p_shader_tokens.h
>> +++ b/src/gallium/include/pipe/p_shader_tokens.h
>> @@ -462,7 +462,16 @@ struct tgsi_property_data {
>>
>>  #define TGSI_OPCODE_LODQ                183
>>
>> -#define TGSI_OPCODE_LAST                184
>> +#define TGSI_OPCODE_IBFE                184
>> +#define TGSI_OPCODE_UBFE                185
>> +#define TGSI_OPCODE_BFI                 186
>> +#define TGSI_OPCODE_BREV                187
>> +#define TGSI_OPCODE_POPC                188
>> +#define TGSI_OPCODE_LSB                 189
>> +#define TGSI_OPCODE_IMSB                190
>> +#define TGSI_OPCODE_UMSB                191
>> +
>> +#define TGSI_OPCODE_LAST                192
>>
>>  #define TGSI_SAT_NONE            0  /* do not saturate */
>>  #define TGSI_SAT_ZERO_ONE        1  /* clamp to [0,1] */
>>