[Mesa-dev] TGSI 16-bit support

Nicolai Hähnle nhaehnle at gmail.com
Wed Aug 23 14:30:24 UTC 2017

On 23.08.2017 16:00, Roland Scheidegger wrote:
> On 23.08.2017 at 15:08, Nicolai Hähnle wrote:
>> On 22.08.2017 22:39, Roland Scheidegger wrote:
>>> On 22.08.2017 at 19:10, Marek Olšák wrote:
>>>> Hi,
>>>> I'd like to discuss 16-bit float and integer support in TGSI. I'm
>>>> proposing this:
>>>>    struct tgsi_instruction
>>>>    {
>>>>       unsigned Type       : 4;  /* TGSI_TOKEN_TYPE_INSTRUCTION */
>>>>       unsigned NrTokens   : 8;  /* UINT */
>>>>       unsigned Opcode     : 8;  /* TGSI_OPCODE_ */
>>>>       unsigned Saturate   : 1;  /* BOOL */
>>>>       unsigned NumDstRegs : 2;  /* UINT */
>>>>       unsigned NumSrcRegs : 4;  /* UINT */
>>>>       unsigned Label      : 1;
>>>>       unsigned Texture    : 1;
>>>>       unsigned Memory     : 1;
>>>>       unsigned Precise    : 1;
>>>> -   unsigned Padding    : 1;
>>>> +   unsigned HalfPrecision : 1;
>>>>    };
>>>> There won't be any 16-bit TEMPs in TGSI, but each instruction will
>>>> have the HalfPrecision flag, which is a hint for drivers that they can
>>>> use a 16-bit opcode. Even texture, load, and store instructions can
>>>> set HalfPrecision, which means they can accept and return 16-bit
>>>> values.
>>>> The catch is that drivers will have to insert 16-bit <-> 32-bit
>>>> conversions manually, because they won't be present in TGSI. The
>>>> advantage is that we don't have to add 200 new opcodes for the 3 new
>>>> 16-bit types.
>>>> What do you think?
>>> Flagging instructions as 16bit doesn't look too bad to me, but I'm
>>> wondering if this isn't a bit problematic wrt register files. Clearly,
>>> this is a restriction of tgsi "everything is a 32x4 value". Doubles, of
>>> course, have a similar problem, but in the end they still have
>>> well-defined interactions with the register files, because it's defined
>>> what bits ultimately represent a 64bit value (at least in theory from
>>> tgsi's point of view, it is perfectly valid to use some 32bit
>>> calculations to set some reg, then just use double instructions directly
>>> without conversion on these values - it may not be meaningful but it is
>>> well defined).
>>> But it looks like you want to avoid having a well-defined mapping of
>>> the registers to 16bit types (and with 16bit instructions just being
>>> hints, I can't see how it could exist).
>>> Note that being able to flag instructions as HalfPrecision does not
>>> necessarily mean you can't have any explicit 16bit conversion
>>> instructions too.
>> Those already exist: PK2H and UP2H. Or did you have something else in mind?
>> More generally, there are really two use cases for this, and we need to
>> be careful not to mix them up:
>> - transparent downgrading to 16-bit of lowp and mediump
>> - support for extensions that explicitly introduce 16-bit types
>> For lowp and mediump, the approach of just having a HalfPrecision bit on
>> the instructions is probably fine.
>> The second case is different. I don't think there are ARB extensions for
>> that yet, but there are AMD_gpu_shader_{int16,half_float} with
>> explicitly 16-bit types. (There's also NV_half_float, but that's from
>> earlier days without GLSL.) For those, we'd really need to provide
>> exactly the required operation. No special handling of TGSI temporaries
>> is needed: an f16vec4 is represented as a normal 4-component vector in
>> TGSI, just that the upper 16 bits of each component are ignored.
> That looks ok to me, albeit you could choose that differently, hence why
> I mentioned it (you could pack your 4 16bit members into the x/y
> components of the 4x32bit vector).

I thought about this as well, but packing 4 components into x/y would 
make swizzling a nightmare.
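
To illustrate the swizzling concern, here is a minimal C sketch comparing the
two layouts. It is not Mesa code: reg32x4 and all function names are
hypothetical, and a TGSI register is simply modelled as four 32-bit channels.
With one 16-bit value per channel, a swizzle is a plain channel shuffle. With
four values packed into x/y, every swizzle has to unpack half-words and
repack them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one TGSI register: four 32-bit channels x, y, z, w. */
typedef struct { uint32_t chan[4]; } reg32x4;

/* Layout A: one 16-bit value per channel, upper 16 bits ignored. */
static uint16_t read_per_channel(const reg32x4 *r, unsigned comp)
{
    assert(comp < 4);
    return (uint16_t)(r->chan[comp] & 0xffffu);
}

/* Layout B: four 16-bit values packed into x (comps 0,1) and y (comps 2,3). */
static uint16_t read_packed(const reg32x4 *r, unsigned comp)
{
    assert(comp < 4);
    uint32_t word = r->chan[comp / 2];
    return (uint16_t)((comp & 1u) ? (word >> 16) : (word & 0xffffu));
}

/* Layout A swizzle: whole channels move, exactly as for 32-bit values. */
static void swizzle_per_channel(reg32x4 *dst, const reg32x4 *src,
                                const unsigned swz[4])
{
    reg32x4 tmp = *src; /* copy first so dst may alias src */
    for (unsigned i = 0; i < 4; i++)
        dst->chan[i] = tmp.chan[swz[i]];
}

/* Layout B swizzle: extract each half-word, then repack into x and y. */
static void swizzle_packed(reg32x4 *dst, const reg32x4 *src,
                           const unsigned swz[4])
{
    uint16_t v[4];
    for (unsigned i = 0; i < 4; i++)
        v[i] = read_packed(src, swz[i]);
    dst->chan[0] = (uint32_t)v[0] | ((uint32_t)v[1] << 16);
    dst->chan[1] = (uint32_t)v[2] | ((uint32_t)v[3] << 16);
    dst->chan[2] = 0;
    dst->chan[3] = 0;
}
```

In the per-channel layout the existing swizzle fields keep their meaning; in
the packed layout a source swizzle no longer maps to whole channels, which is
the extra work the sketch makes visible.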

>> Here's another question: What does "low precision" mean on a texture
>> instruction? Are the offsets low precision or is it the output? Maybe we
>> can punt on this for now -- at least GCN doesn't have low precision
>> there anyway.
>> To sum it up:
>> - I think there have to be separate flags for "this is a true 16-bit
>> instruction" and for "optional low precision" -- in the latter, the
>> driver is responsible for on-the-fly conversion between half and full types
>> - Apart from potential future issues with texture instructions, I think
>> the flags on instructions are fine. So the plan is fine for GLES
>> lowp/mediump.
>> Also, we're running out of bits here, but some of those bits can be
>> moved into a separate instruction flags word when the time comes.
> There's still some bits left in the instruction token if you really
> really need them. Type doesn't need to be 4 bits (at least one bit can
> go, even 2 is sufficient at least now, albeit you'd need to change all
> tokens), the same is true for NumSrcRegs, where 4 bits is at least one
> too many.
> I am however still wondering if it really makes sense to have both
> hinted and explicit 16bit instructions (because it looks like eventually
> it's going to be more work for drivers, having to handle both some day).

I know, it's not a completely clear-cut decision.

The main thing is that truly going to 16 bits may not always be
beneficial, because we would need to introduce conversion instruction(s),
so it'd be neat to communicate the optionality to the driver.
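
As a rough sketch of why the optionality matters, a driver could weigh the
conversions it would have to insert against what it gains from the 16-bit
opcodes. Everything below is a made-up toy cost model: the struct, the
function name, and the weights are assumptions for illustration, not anything
that exists in Mesa or in any driver.

```c
#include <stdbool.h>

/* Hypothetical summary of a run of instructions flagged as optionally
 * low precision. */
struct half_candidate {
    unsigned num_flagged_ops;  /* instructions that may run as 16-bit */
    unsigned num_conversions;  /* 16 <-> 32-bit conversions to insert */
};

/* Returns true if lowering to 16 bits is expected to pay off under the
 * caller-supplied (illustrative) weights. */
static bool worth_lowering(const struct half_candidate *c,
                           unsigned saving_per_op,
                           unsigned cost_per_conversion)
{
    return c->num_flagged_ops * saving_per_op >
           c->num_conversions * cost_per_conversion;
}
```

With equal weights, a single flagged instruction that needs three conversions
would stay at 32 bits, while a chain of eight flagged instructions with two
conversions at its boundaries would be lowered. A "true 16-bit" flag leaves
the driver no such choice.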

Learn how the world really is,
but never forget how it ought to be.
