[Mesa-dev] ARB_gs5 new instruction support in gallium

Mon Apr 21 10:35:48 PDT 2014

On Mon, Apr 21, 2014 at 10:20 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, Apr 21, 2014 at 12:56 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Mon, Apr 21, 2014 at 8:54 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>> Hello,
>>>
>>> I've been giving some thought to catching up with core mesa on ARB_gs5
>>> support. One of the things that ARB_gs5 introduces are new operations:
>>>
>>>       genType frexp(genType x, out genIType exp);
>>>       genType ldexp(genType x, in genIType exp);
>>>
>>>       genIType bitfieldExtract(genIType value, int offset, int bits);
>>>       genUType bitfieldExtract(genUType value, int offset, int bits);
>>>
>>>       genIType bitfieldInsert(genIType base, genIType insert, int offset,
>>>                               int bits);
>>>       genUType bitfieldInsert(genUType base, genUType insert, int offset,
>>>                               int bits);
>>>
>>>       genIType bitfieldReverse(genIType value);
>>>       genUType bitfieldReverse(genUType value);
>>>
>>>       genIType bitCount(genIType value);
>>>       genIType bitCount(genUType value);
>>>
>>>       genIType findLSB(genIType value);
>>>       genIType findLSB(genUType value);
>>>
>>>       genIType findMSB(genIType value);
>>>       genIType findMSB(genUType value);
>>>
>>>       genUType uaddCarry(genUType x, genUType y, out genUType carry);
>>>       genUType usubBorrow(genUType x, genUType y, out genUType borrow);
>>>
>>>       void umulExtended(genUType x, genUType y, out genUType msb,
>>>                         out genUType lsb);
>>>       void imulExtended(genIType x, genIType y, out genIType msb,
>>>                         out genIType lsb);
>>>
>>> (I've skipped the packing stuff since that seems to already be
>>> supported/lowered elsewhere, i2f/f2i which is already handled, and the
>>> texture gather stuff, for which support already exists. And the
>>> interpolateAt* stuff which isn't supported by core mesa yet, and when
>>> it is, will require a very different kind of handling than the above.)
>>>
>>> I guess the only drivers one really needs to worry about here are
>>> r600/radeonsi and nouveau. svga is largely a passthrough afaik, and
>>> llvmpipe/softpipe is software and can thus implement it however it
>>> wants.
>>>
>>> Looking at the nvc0+ shader ISA, there are instructions to directly
>>> handle all the bitfield stuff (bitfieldExtract, bitfieldInsert,
>>> bitfieldReverse, bitCount, findLSB, findMSB). There is also a "mul
>>> high", which is what the *mulExtended stuff gets translated into.
>>>
>>> There are no instructions to handle frexp/ldexp, or the add carry/sub
>>> borrow stuff. (Looking at the code the blob generates, they just do
>>> all that "by hand". Even though there is a "set cc" flag on those
>>> instructions which one might assume has the carry. But the blob didn't
>>> use it.)
>>>
>>> So I was thinking that we could just take the relevant SM5
>>> instructions and lower the rest. Specifically, these would be the new
>>> opcodes:
>>>
>>> IBFE
>>> UBFE
>>> BFI
>>> BREV (not BFREV since most instructions appear to be 3/4 letters)
>>> POPC (shorter than "countbits")
>>> LSB
>>> UMSB
>>> IMSB
>>> IMULHI
>>>
>>> I just took a look at the Radeon SI ISA, and it does seem like it has
>>> ldexp/frexp instructions, as well as setting the carry flag for
>>> addc/subb. Although since TGSI doesn't have flags or multiple
>>> destinations, not sure how the latter 2 could be easily encoded in the
>>> glsl->tgsi translation.
>>>
>>> Thoughts/opinions before I go and implement the above? Is someone else
>>> already working on this?
>>
>> I've written lowering code for ldexp/frexp. It relies on support for
>
> The lowering code for ldexp is optional, but the frexp one seems to be
> "required" (in that there is no ir_binop_frexp at all). If RadeonSI
> wants to make use of its built-in frexp instruction, they'll either
> need to change it, or have a _really_ clever peephole pass. (Didn't
> check if the r600 isa had the same thing...)

R700 is the first to have frexp/ldexp instructions.

Someone will need to convert the frexp code in builtin_functions.cpp
to a lowering pass if they want to use an frexp instruction.
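
For anyone unfamiliar with what such a lowering boils down to, here is a
rough C sketch of frexp done with integer bit operations only. This is not
the code from builtin_functions.cpp; the helper name frexp_lowered is made
up for illustration, and the sketch assumes 32-bit IEEE floats and only
handles zero and normal finite values (no denormals, inf or NaN).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Mesa code: frexp for 32-bit IEEE floats using
 * only integer operations. Handles +/-0 and normal finite values only. */
static float frexp_lowered(float x, int *exp)
{
    uint32_t bits;
    float mantissa;

    memcpy(&bits, &x, sizeof bits);

    /* +/-0.0: exponent is 0 and the value is returned unchanged. */
    if ((bits & 0x7fffffffu) == 0) {
        *exp = 0;
        return x;
    }

    /* The biased exponent sits in bits 30:23. The returned mantissa is in
     * [0.5, 1), so the exponent to report is (biased - 126). */
    *exp = (int)((bits >> 23) & 0xffu) - 126;

    /* Keep sign and fraction, force the exponent field to 126 (2^-1). */
    bits = (bits & 0x807fffffu) | (126u << 23);

    memcpy(&mantissa, &bits, sizeof mantissa);
    return mantissa;
}
```

A hardware frexp instruction would replace all of the above with a single
op, which is why the code would have to become an optional lowering pass.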

>> EXT_shader_integer_mix, which disappointingly no other Mesa drivers
>> have exposed.
>
> http://gallium.readthedocs.org/en/latest/tgsi.html#opcode-UCMP
>
> I assume that's the same thing? If so, that extension can probably
> just be exposed as-is on gallium for drivers that support
> NativeIntegers.

Yeah, looks like that instruction is all that's needed.
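
To spell out the mapping as I understand it, here is a per-component C
sketch. The function names ucmp and integer_mix are made up for
illustration; the UCMP semantics are taken from the gallium docs linked
above (dst = src0 ? src1 : src2), and I'm assuming booleans are
represented as 0 / ~0 as they are with native integers.

```c
#include <stdint.h>

/* One component of TGSI UCMP, per the gallium docs:
 * dst = src0 ? src1 : src2 */
static uint32_t ucmp(uint32_t src0, uint32_t src1, uint32_t src2)
{
    return src0 ? src1 : src2;
}

/* One component of mix(x, y, a) with a boolean selector: y where a is
 * true, x where a is false. Note the operand order is swapped relative
 * to UCMP. (Illustrative name, not an existing Mesa function.) */
static uint32_t integer_mix(uint32_t x, uint32_t y, uint32_t a)
{
    return ucmp(a, y, x);
}
```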

>> For the multi-destination built-ins, i965 has multi-destination
>> instructions (addc, subb) which write the carry/borrow to the
>> accumulator register. Instead of doing a ton of infrastructure to
>> support multi-destination IR I emit an add and an addc for uaddCarry and
>> only use the carry result from addc. A peephole optimization can
>> easily combine the add/addc into a single addc.
>
> Hm, neat idea. But the same peephole pass could, instead, be used to detect
>
> UADD x, a, b
> USLT y, x, a
>
> And then you don't need the special ADDC instruction. And you get the
> advantage of being able to detect (some) people who were doing this by
> hand before.

I suppose so, but we can't implement USLT on i965 more efficiently than addc.
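
For reference, the add + compare pattern under discussion looks like this
in C. The function names are just for illustration. One thing to keep in
mind: GLSL wants the carry/borrow as 0 or 1, which is what the C
comparison gives, whereas a TGSI USLT would produce 0 / ~0, so a
translation would presumably need to mask the result.

```c
#include <stdint.h>

/* uaddCarry by hand: an unsigned add followed by an unsigned less-than.
 * The sum wraps modulo 2^32, and it wrapped exactly when sum < x. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;   /* UADD x, a, b */
    *carry = sum < x;       /* USLT y, x, a (as 0 or 1 here) */
    return sum;
}

/* usubBorrow by hand: a borrow is needed exactly when x < y. */
static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = x < y;
    return x - y;           /* wraps modulo 2^32 */
}
```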