[Mesa-dev] TGSI 16-bit support

Wed Aug 23 14:30:24 UTC 2017

On 23.08.2017 16:00, Roland Scheidegger wrote:
> On 23.08.2017 at 15:08, Nicolai Hähnle wrote:
>> On 22.08.2017 22:39, Roland Scheidegger wrote:
>>> On 22.08.2017 at 19:10, Marek Olšák wrote:
>>>> Hi,
>>>>
>>>> I'd like to discuss 16-bit float and integer support in TGSI. I'm
>>>> proposing this:
>>>>
>>>>    struct tgsi_instruction
>>>>    {
>>>>       unsigned Type       : 4;  /* TGSI_TOKEN_TYPE_INSTRUCTION */
>>>>       unsigned NrTokens   : 8;  /* UINT */
>>>>       unsigned Opcode     : 8;  /* TGSI_OPCODE_ */
>>>>       unsigned Saturate   : 1;  /* BOOL */
>>>>       unsigned NumDstRegs : 2;  /* UINT */
>>>>       unsigned NumSrcRegs : 4;  /* UINT */
>>>>       unsigned Label      : 1;
>>>>       unsigned Texture    : 1;
>>>>       unsigned Memory     : 1;
>>>>       unsigned Precise    : 1;
>>>> -   unsigned Padding    : 1;
>>>> +   unsigned HalfPrecision : 1;
>>>>    };
>>>>
>>>> There won't be any 16-bit TEMPs in TGSI, but each instruction will
>>>> have the HalfPrecision flag, which is a hint for drivers that they can
>>>> use a 16-bit opcode. Even texture, load, and store instructions can
>>>> set HalfPrecision, which means they can accept and return 16-bit
>>>> values.
>>>>
>>>> The catch is that drivers will have to insert 16-bit <-> 32-bit
>>>> conversions manually, because they won't be present in TGSI. The
>>>> advantage is that we don't have to add 200 new opcodes for the 3 new
>>>> 16-bit types.
>>>>
>>>> What do you think?
>>>>
>>>
>>> Flagging instructions as 16bit doesn't look too bad to me, but I'm
>>> wondering if this isn't a bit problematic wrt register files. Clearly,
>>> this is a restriction of tgsi "everything is a 32x4 value". Doubles, of
>>> course, have a similar problem, but in the end they still have
>>> well-defined interactions with the register files, because it's defined
>>> what bits ultimately represent a 64bit value (at least in theory from
>>> tgsi's point of view, it is perfectly valid to use some 32bit
>>> calculations to set some reg, then just use double instructions directly
>>> without conversion on these values - it may not be meaningful but it is
>>> well defined).
>>> But it looks like you want to avoid having a well-defined mapping of
>>> the registers to 16bit types (and with 16bit instructions just being
>>> hints, I can't see how it could exist).
>>> Note that being able to flag instructions as HalfPrecision does not
>>> necessarily mean you can't have any explicit 16bit conversion
>>> instructions too.
>>
>> Those already exist: PK2H and UP2H. Or did you have something else in mind?
>>
>> More generally, there are really two use cases for this, and we need to
>> be careful not to mix them up:
>>
>> - transparent downgrading to 16-bit of lowp and mediump
>> - support for extensions that explicitly introduce 16-bit types
>>
>> For lowp and mediump, the approach of just having a HalfPrecision bit on
>> the instructions is probably fine.
>>
>> The second case is different. I don't think there are ARB extensions for
>> that yet, but there are AMD_gpu_shader_{int16,half_float} with
>> explicitly 16-bit types. (There's also NV_half_float, but that's from
>> earlier days without GLSL.) For those, we'd really need to provide
>> exactly the required operation. No special handling of TGSI temporaries
>> is needed: an f16vec4 is represented as a normal 4-component vector in
>> TGSI, just that the upper 16 bits of each component are ignored.
> That looks ok to me, albeit you could choose that differently, hence why
> I mentioned it (you could pack your 4 16bit members into the x/y
> components of the 4x32bit vector).

I thought about this as well, but packing 4 components into x/y would 
make swizzling a nightmare.
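
To make the swizzling point concrete, here is a minimal sketch comparing the two layouts. All function names are made up for illustration and are not Mesa code; it only assumes that a TGSI register is four 32-bit components.

```c
#include <stdint.h>

/* Layout A: one 16-bit value per 32-bit component, upper 16 bits ignored. */
static uint16_t read_unpacked(const uint32_t reg[4], unsigned chan)
{
   return (uint16_t)(reg[chan] & 0xffff);
}

/* Layout B: four 16-bit values packed into x and y
 * (x = chan 0 low / chan 1 high, y = chan 2 low / chan 3 high). */
static uint16_t read_packed(const uint32_t reg[4], unsigned chan)
{
   return (uint16_t)(reg[chan >> 1] >> ((chan & 1) * 16));
}

/* Swizzling layout A is a plain per-component move. */
static void swizzle_unpacked(uint32_t dst[4], const uint32_t src[4],
                             const unsigned swz[4])
{
   uint32_t tmp[4];
   for (unsigned i = 0; i < 4; i++)
      tmp[i] = src[swz[i]];
   for (unsigned i = 0; i < 4; i++)
      dst[i] = tmp[i];
}

/* Swizzling layout B needs a shift and mask per source channel and a
 * shift and OR per destination channel. */
static void swizzle_packed(uint32_t dst[4], const uint32_t src[4],
                           const unsigned swz[4])
{
   uint32_t out[2] = {0, 0};
   for (unsigned i = 0; i < 4; i++)
      out[i >> 1] |= (uint32_t)read_packed(src, swz[i]) << ((i & 1) * 16);
   dst[0] = out[0];
   dst[1] = out[1];
   dst[2] = src[2];   /* z and w carry no 16-bit data in this layout */
   dst[3] = src[3];
}
```

With layout A, a source swizzle stays a component selection, exactly as for 32-bit operands. With layout B, every swizzled operand turns into extract and insert work.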

>> Here's another question: What does "low precision" mean on a texture
>> instruction? Are the offsets low precision or is it the output? Maybe we
>> can punt on this for now -- at least GCN doesn't have low precision
>> there anyway.
>>
>> To sum it up:
>> - I think there have to be separate flags for "this is a true 16-bit
>> instruction" and for "optional low precision" -- in the latter, the
>> driver is responsible for on-the-fly conversion between half and full types
>> - Apart from potential future issues with texture instructions, I think
>> the flags on instructions are fine. So the plan is fine for GLES
>> lowp/mediump.
>>
>> Also, we're running out of bits here, but some of those bits can be
>> moved into a separate instruction flags word when the time comes.
>>
> 
> There are still some bits left in the instruction token if you really
> really need them. Type doesn't need to be 4 bits (at least one bit can
> go, even 2 is sufficient at least now, albeit you'd need to change all
> tokens), the same is true for NumSrcRegs, where 4 bits is at least one
> too many.
> 
> I am however still wondering if it really makes sense to have both
> hinted and explicit 16bit instructions (because it looks like eventually
> it's going to be more work for drivers, having to handle both some day).

I know, it's not a completely clear-cut decision.

The main thing is that truly going to 16 bits may not always be
beneficial because we need to introduce the conversion instruction(s),
so it'd be neat to communicate the optionality to the driver.
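
As a rough sketch of what I mean by two separate flags: the layout below is hypothetical and not the proposed patch. It assumes NumSrcRegs can shrink to 3 bits (as Roland suggests) to free a bit; the names Explicit16 and example_use_16bit, and the trivial cost check, are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical token layout; the field widths still sum to 32 bits. */
struct example_instruction
{
   unsigned Type          : 4;
   unsigned NrTokens      : 8;
   unsigned Opcode        : 8;
   unsigned Saturate      : 1;
   unsigned NumDstRegs    : 2;
   unsigned NumSrcRegs    : 3;  /* assumption: shrunk from 4 bits */
   unsigned Label         : 1;
   unsigned Texture       : 1;
   unsigned Memory        : 1;
   unsigned Precise       : 1;
   unsigned HalfPrecision : 1;  /* hint: driver may use a 16-bit opcode */
   unsigned Explicit16    : 1;  /* hypothetical: true 16-bit instruction */
};

/* Decide whether a driver would emit a 16-bit opcode.
 * num_conversions is the number of 16 <-> 32 bit conversions the driver
 * would have to insert around this instruction. */
static bool example_use_16bit(const struct example_instruction *inst,
                              unsigned num_conversions)
{
   if (inst->Explicit16)
      return true;               /* required, regardless of cost */
   if (!inst->HalfPrecision)
      return false;              /* plain 32-bit instruction */
   return num_conversions == 0;  /* placeholder cost model (assumption) */
}
```

A real driver would want a better cost model than "no conversions needed", but the point is that only the hinted case leaves the decision to the driver.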

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it should be.