[Mesa-dev] ARB_gs5 new instruction support in gallium

Ilia Mirkin imirkin at alum.mit.edu
Mon Apr 21 12:10:36 PDT 2014


On Mon, Apr 21, 2014 at 2:52 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> On 21.04.2014 17:54, Ilia Mirkin wrote:
>> Hello,
>>
>> I've been giving some thought to catching up with core mesa on ARB_gs5
>> support. One of the things that ARB_gs5 introduces is a set of new
>> operations:
>>
>>       genType frexp(genType x, out genIType exp);
>>       genType ldexp(genType x, in genIType exp);
>>
>>       genIType bitfieldExtract(genIType value, int offset, int bits);
>>       genUType bitfieldExtract(genUType value, int offset, int bits);
>>
>>       genIType bitfieldInsert(genIType base, genIType insert, int offset,
>>                               int bits);
>>       genUType bitfieldInsert(genUType base, genUType insert, int offset,
>>                               int bits);
>>
>>       genIType bitfieldReverse(genIType value);
>>       genUType bitfieldReverse(genUType value);
>>
>>       genIType bitCount(genIType value);
>>       genIType bitCount(genUType value);
>>
>>       genIType findLSB(genIType value);
>>       genIType findLSB(genUType value);
>>
>>       genIType findMSB(genIType value);
>>       genIType findMSB(genUType value);
>>
>>       genUType uaddCarry(genUType x, genUType y, out genUType carry);
>>       genUType usubBorrow(genUType x, genUType y, out genUType borrow);
>>
>>       void umulExtended(genUType x, genUType y, out genUType msb,
>>                         out genUType lsb);
>>       void imulExtended(genIType x, genIType y, out genIType msb,
>>                         out genIType lsb);
>>
>> (I've skipped the packing stuff since that seems to already be
>> supported/lowered elsewhere, i2f/f2i which is already handled, and the
>> texture gather stuff, for which support already exists. And the
>> interpolateAt* stuff which isn't supported by core mesa yet, and when
>> it is, will require a very different kind of handling than the above.)
>>
>> I guess the only drivers one really needs to worry about here are
>> r600/radeonsi and nouveau. svga is largely a passthrough afaik, and
>> llvmpipe/softpipe is software and can thus implement it however it
>> wants.
>>
>> Looking at the nvc0+ shader ISA, there are instructions to directly
>> handle all the bitfield stuff (bitfieldExtract, bitfieldInsert,
>> bitfieldReverse, bitCount, findLSB, findMSB). There is also a "mul
>> high", which is what the *mulExtended stuff gets translated into.
>>
>> There are no instructions to handle frexp/ldexp, or the add carry/sub
>> borrow stuff. (Looking at the code the blob generates, they just do
>> all that "by hand". Even though there is a "set cc" flag on those
>> instructions which one might assume has the carry. But the blob didn't
>> use it.)
>>
>> So I was thinking that we could just take the relevant SM5
>> instructions and lower the rest. Specifically, these would be the new
>> opcodes:
>>
>> IBFE
>> UBFE
>> BFI
>> BREV (not BFREV since most instructions appear to be 3/4 letters)
>> POPC (shorter than "countbits")
>> LSB
>> UMSB
>> IMSB
>> IMULHI
> We already have imul_hi.

Yeah, I noticed that after I sent it out. Only llvmpipe (and perhaps
softpipe) supports it though, based on a quick grep. And nothing emits
it (although presumably the vmware d3d10 st makes use of it).
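
For reference, a minimal C sketch of what a "mul high" computes and how
umulExtended splits into a low multiply plus a high multiply. The function
names are made up for illustration and are not Mesa or TGSI code; the signed
variant assumes the compiler implements >> on negative values as an
arithmetic shift, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 32x32 -> 64-bit product. */
static uint32_t umul_hi(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}

/* High 32 bits of the signed 32x32 -> 64-bit product.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
static int32_t imul_hi(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return (int32_t)(product >> 32);
}

/* GLSL-style umulExtended expressed as two single-destination operations:
 * an ordinary (wrapping) multiply for lsb and a mul-high for msb. */
static void umul_extended(uint32_t x, uint32_t y,
                          uint32_t *msb, uint32_t *lsb)
{
    *lsb = x * y;          /* low 32 bits */
    *msb = umul_hi(x, y);  /* high 32 bits */
}
```
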

>
>>
>> I just took a look at the Radeon SI ISA, and it does seem like it has
>> ldexp/frexp instructions, as well as setting the carry flag for
>> addc/subb. Although since TGSI doesn't have flags or multiple
>> destinations, not sure how the latter 2 could be easily encoded in the
>> glsl->tgsi translation.
> It is not entirely true that tgsi doesn't support multiple destinations.
> The token format allows 0-3 destinations. But so far instructions with
> more than one destination do not exist. There was some discussion about
> it when we needed umul_hi/imul_hi (since these are also multiple
> destination sm4 instructions) but deemed it not worth it, partly also
> because it didn't look like (most) gpus could actually benefit from this
> being just 1 instruction instead of two (that is, it would emit the same
> 2 instructions for the low and high part of the mul anyway). Mostly
> because gpus (and cpus) usually follow the model of multiple 32bit
> sources in, one 32bit dst out. Obviously the accumulator of intel gpus
> is an exception there.
> So, you could follow that same model with subb/addc - use the existing
> sub/add and just use a new instruction for the borrow/carry part (though
> it looks like if you do it with two instructions anyway, you could just
> use an existing instruction for the carry/borrow part). But if gpus
> actually can set two regs simultaneously (or otherwise benefit from this
> being one instruction without having to "reassemble" it, for instance
> with special carry flags), then it might be better to actually use
> multi-dest instructions. Most likely because this hasn't been used at
> all until now it will break in some places, but there should not be
> anything major preventing this from working.

You're still going to have to reassemble it one way or another --
either detecting UADD/ADDC combinations, or UADD/USLT combinations.
Might as well use the more general one, no? (And a similar combo can
be used for SUBB, I think.)
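
A minimal C sketch of the add + unsigned-less-than combination, to show
that the carry and the borrow each fall out of one comparison. The function
names are hypothetical, not existing opcodes or Mesa helpers. If the
comparison opcode yields an all-ones mask rather than 1 for "true", the
result would additionally need to be masked with 1.

```c
#include <stdint.h>

/* uaddCarry: the wrapped sum is smaller than an operand exactly when
 * the addition overflowed, so carry = (sum < x). */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;   /* wraps modulo 2^32 */
    *carry = sum < x;       /* the "USLT sum, x" step */
    return sum;
}

/* usubBorrow: a borrow is needed exactly when x < y. */
static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = x < y;        /* the "USLT x, y" step */
    return x - y;           /* wraps modulo 2^32 */
}
```
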

Having real multiple outputs will be useful if anyone wants to pipe
FREXP all the way through -- that'll be a bit awkward to do as 2
opcodes. Since nvc0 doesn't support it, I won't be losing sleep over
it :)
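
To make the two results concrete, here is a rough C sketch of frexp done
"by hand" on the float's bit pattern. It is an illustration only, not what
the blob or any driver emits: the name frexp_by_hand is invented, and the
sketch assumes IEEE 754 single precision and deliberately ignores
denormals, infinities and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Returns the significand in [0.5, 1.0) with the sign of x, and writes the
 * exponent to *exp_out so that x == result * 2^(*exp_out).
 * Assumption: x is zero or a normal IEEE 754 single-precision value. */
static float frexp_by_hand(float x, int32_t *exp_out)
{
    uint32_t bits;
    float significand;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as bits */

    if ((bits & 0x7fffffffu) == 0) {  /* +0.0 or -0.0 */
        *exp_out = 0;
        return x;
    }

    /* Result 1: biased exponent minus 126 (not 127), because the returned
     * significand lies in [0.5, 1.0) rather than [1.0, 2.0). */
    *exp_out = (int32_t)((bits >> 23) & 0xffu) - 126;

    /* Result 2: keep sign and mantissa, force the exponent field to 126. */
    bits = (bits & 0x807fffffu) | 0x3f000000u;
    memcpy(&significand, &bits, sizeof significand);
    return significand;
}
```
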

>
>
>>
>> Thoughts/opinions before I go and implement the above? Is someone else
>> already working on this?
> I think this looks good overall. We're getting close to the max number
> of different instructions though (256), but if that should become a
> problem we can easily ditch some (or double the max number by taking a
> bit from the max number of sources - 0-15 sources is not useful, 0-7
> would still be more than enough).

I didn't realize there was a max instruction quantity, but these will
have to be added one way or another if gallium is to support GL4.0 :)
There's also the Double ISA which appears to be documented but not
actually in p_shader_tokens.h, which will take up a whole bunch of
opcodes as well.

In any case, I'm going to take a stab at implementing these and piping
them through to nvc0 after I finish up ARB_sample_shading (coming soon
to a patch near you).

  -ilia

