[Mesa-dev] ARB_gs5 new instruction support in gallium

Mon Apr 21 11:52:07 PDT 2014

On 21.04.2014 17:54, Ilia Mirkin wrote:
> Hello,
> 
> I've been giving some thought to catching up with core mesa on ARB_gs5
> support. One of the things that ARB_gs5 introduces are new operations:
> 
>       genType frexp(genType x, out genIType exp);
>       genType ldexp(genType x, in genIType exp);
> 
>       genIType bitfieldExtract(genIType value, int offset, int bits);
>       genUType bitfieldExtract(genUType value, int offset, int bits);
> 
>       genIType bitfieldInsert(genIType base, genIType insert, int offset,
>                               int bits);
>       genUType bitfieldInsert(genUType base, genUType insert, int offset,
>                               int bits);
> 
>       genIType bitfieldReverse(genIType value);
>       genUType bitfieldReverse(genUType value);
> 
>       genIType bitCount(genIType value);
>       genIType bitCount(genUType value);
> 
>       genIType findLSB(genIType value);
>       genIType findLSB(genUType value);
> 
>       genIType findMSB(genIType value);
>       genIType findMSB(genUType value);
> 
>       genUType uaddCarry(genUType x, genUType y, out genUType carry);
>       genUType usubBorrow(genUType x, genUType y, out genUType borrow);
> 
>       void umulExtended(genUType x, genUType y, out genUType msb,
>                         out genUType lsb);
>       void imulExtended(genIType x, genIType y, out genIType msb,
>                         out genIType lsb);
> 
> (I've skipped the packing stuff since that seems to already be
> supported/lowered elsewhere, i2f/f2i which is already handled, and the
> texture gather stuff, for which support already exists. And the
> interpolateAt* stuff which isn't supported by core mesa yet, and when
> it is, will require a very different kind of handling than the above.)
> 
> I guess the only drivers one really needs to worry about here are
> r600/radeonsi and nouveau. svga is largely a passthrough afaik, and
> llvmpipe/softpipe is software and can thus implement it however it
> wants.
> 
> Looking at the nvc0+ shader ISA, there are instructions to directly
> handle all the bitfield stuff (bitfieldExtract, bitfieldInsert,
> bitfieldReverse, bitCount, findLSB, findMSB). There is also a "mul
> high", which is what the *mulExtended stuff gets translated into.
> 
> There are no instructions to handle frexp/ldexp, or the add carry/sub
> borrow stuff. (Looking at the code the blob generates, they just do
> all that "by hand". Even though there is a "set cc" flag on those
> instructions which one might assume has the carry. But the blob didn't
> use it.)
> 
> So I was thinking that we could just take the relevant SM5
> instructions and lower the rest. Specifically, these would be the new
> opcodes:
> 
> IBFE
> UBFE
> BFI
> BREV (not BFREV since most instructions appear to be 3/4 letters)
> POPC (shorter than "countbits")
> LSB
> UMSB
> IMSB
> IMULHI
We already have imul_hi.
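For reference, here is a rough C sketch of what the other proposed
opcodes would compute, going by the GLSL builtins quoted above. The
function names simply mirror the proposed opcode names and are
hypothetical, as is the choice to pass the IBFE input as an unsigned bit
pattern. It assumes offset + bits <= 32, which GLSL requires for defined
results.

```c
#include <stdint.h>

/* Mask with the low 'bits' bits set (1 <= bits <= 32). */
static uint32_t low_mask(unsigned bits)
{
   return bits == 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
}

/* UBFE: unsigned bitfield extract, zero-extended. */
static uint32_t ubfe(uint32_t value, unsigned offset, unsigned bits)
{
   if (bits == 0)
      return 0;
   return (value >> offset) & low_mask(bits);
}

/* IBFE: signed bitfield extract, sign-extended from the field's top bit. */
static int32_t ibfe(uint32_t value, unsigned offset, unsigned bits)
{
   if (bits == 0)
      return 0;
   uint32_t field = ubfe(value, offset, bits);
   uint32_t sign = 1u << (bits - 1);
   /* Done in 64 bits so the result always fits before narrowing. */
   return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}

/* BFI: replace 'bits' bits of base at 'offset' with the low bits of insert. */
static uint32_t bfi(uint32_t base, uint32_t insert, unsigned offset,
                    unsigned bits)
{
   if (bits == 0)
      return base;
   uint32_t mask = low_mask(bits) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}

/* POPC: number of set bits. */
static int popc(uint32_t v)
{
   int n = 0;
   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* LSB: index of the lowest set bit, -1 if none. */
static int lsb(uint32_t v)
{
   int i = 0;
   if (!v)
      return -1;
   while (!(v & 1u)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* UMSB: index of the highest set bit, -1 if none. */
static int umsb(uint32_t v)
{
   int i = -1;
   while (v) {
      v >>= 1;
      i++;
   }
   return i;
}
```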

> 
> I just took a look at the Radeon SI ISA, and it does seem like it has
> ldexp/frexp instructions, as well as setting the carry flag for
> addc/subb. Although since TGSI doesn't have flags or multiple
> destinations, not sure how the latter 2 could be easily encoded in the
> glsl->tgsi translation.
It is not entirely true that tgsi doesn't support multiple destinations.
The token format allows 0-3 destinations. But so far instructions with
more than one destination do not exist. There was some discussion about
it when we needed umul_hi/imul_hi (since these are also multiple
destination sm4 instructions), but we deemed it not worth it, partly
because it didn't look like (most) gpus could actually benefit from this
being just 1 instruction instead of two (that is, the driver would emit
the same 2 instructions for the low and high part of the mul anyway).
That is mostly because gpus (and cpus) usually follow the model of
multiple 32bit sources in, one 32bit dst out. Obviously the accumulator
of intel gpus is an exception there.
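
To illustrate the "two instructions" model for the extended multiply,
here is a minimal C sketch in which the low and high halves are computed
by separate single-destination operations. The helper names are only
illustrative, and the signed variant assumes the compiler implements >>
on negative 64-bit values as an arithmetic shift (which mainstream
compilers do).

```c
#include <stdint.h>

/* Low 32 bits of the product: what a plain 32-bit mul already returns. */
static uint32_t umul_lo(uint32_t x, uint32_t y)
{
   return x * y;
}

/* High 32 bits of the unsigned 64-bit product. */
static uint32_t umul_hi(uint32_t x, uint32_t y)
{
   return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* High 32 bits of the signed 64-bit product.
 * Assumes arithmetic right shift for negative values. */
static int32_t imul_hi(int32_t x, int32_t y)
{
   return (int32_t)(((int64_t)x * y) >> 32);
}
```

umulExtended(x, y, msb, lsb) then becomes msb = umul_hi(x, y) and
lsb = umul_lo(x, y), each with a single 32bit destination.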
So, you could follow that same model with subb/addc - use the existing
sub/add and just use a new instruction for the borrow/carry part (though
it looks like if you do it with two instructions anyway, you could just
use an existing instruction for the carry/borrow part). But if gpus
actually can set two regs simultaneously (or otherwise benefit from this
being one instruction without having to "reassemble" it, for instance
with special carry flags), then it might be better to actually use
multi-dest instructions. Since this hasn't been used at all until now,
it will most likely break in some places, but there should not be
anything major preventing this from working.
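
As a sketch of the two-instruction variant, the carry/borrow can be
derived from an ordinary unsigned comparison, so no new opcode is
strictly needed. The C below is only an illustration of that idea with
made-up helper names, not a proposal for the actual translation code.

```c
#include <stdint.h>

/* uaddCarry: the sum wraps modulo 2^32; a carry occurred exactly when
 * the wrapped sum is smaller than one of the operands. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
   uint32_t sum = x + y;   /* existing add */
   *carry = sum < x;       /* existing unsigned "set less than" */
   return sum;
}

/* usubBorrow: the difference wraps modulo 2^32; a borrow occurred
 * exactly when x is smaller than y. */
static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
   *borrow = x < y;        /* existing unsigned "set less than" */
   return x - y;           /* existing sub */
}
```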


> 
> Thoughts/opinions before I go and implement the above? Is someone else
> already working on this?
I think this looks good overall. We're getting close to the max number
of different instructions (256), though. If that should become a
problem, we can easily ditch some (or double the max number by taking a
bit from the max number of sources - 0-15 sources is not useful, 0-7
would still be more than enough).
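
To make the arithmetic concrete, here is a small C sketch. Only the
field widths (8 opcode bits and 4 source-count bits today, 9 and 3 after
moving one bit) come from the numbers above; the packing layout and all
names are hypothetical and do not reflect the real token struct.

```c
#include <stdint.h>

/* Hypothetical field widths; both layouts use 12 bits in total. */
enum {
   OLD_OPCODE_BITS = 8, OLD_NUM_SRC_BITS = 4,
   NEW_OPCODE_BITS = 9, NEW_NUM_SRC_BITS = 3
};

/* Largest value that fits in a field of the given width. */
static uint32_t field_max(unsigned bits)
{
   return (1u << bits) - 1u;
}

/* Pack opcode and source count into adjacent fields.
 * Returns 1 on success, 0 if either value does not fit. */
static int pack_fields(unsigned opcode_bits, unsigned num_src_bits,
                       uint32_t opcode, uint32_t num_src, uint32_t *out)
{
   if (opcode > field_max(opcode_bits) || num_src > field_max(num_src_bits))
      return 0;
   *out = opcode | (num_src << opcode_bits);
   return 1;
}
```

With the old widths, opcode 300 does not fit; with the new widths it
does, while the source count is limited to 0-7.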

Roland