[Mesa-dev] TGSI 16-bit support

Wed Aug 23 16:19:27 UTC 2017

On Wed, Aug 23, 2017 at 3:08 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 22.08.2017 22:39, Roland Scheidegger wrote:
>>
>> Am 22.08.2017 um 19:10 schrieb Marek Olšák:
>>>
>>> Hi,
>>>
>>> I'd like to discuss 16-bit float and integer support in TGSI. I'm
>>> proposing this:
>>>
>>>   struct tgsi_instruction
>>>   {
>>>      unsigned Type       : 4;  /* TGSI_TOKEN_TYPE_INSTRUCTION */
>>>      unsigned NrTokens   : 8;  /* UINT */
>>>      unsigned Opcode     : 8;  /* TGSI_OPCODE_ */
>>>      unsigned Saturate   : 1;  /* BOOL */
>>>      unsigned NumDstRegs : 2;  /* UINT */
>>>      unsigned NumSrcRegs : 4;  /* UINT */
>>>      unsigned Label      : 1;
>>>      unsigned Texture    : 1;
>>>      unsigned Memory     : 1;
>>>      unsigned Precise    : 1;
>>> -   unsigned Padding    : 1;
>>> +   unsigned HalfPrecision : 1;
>>>   };
>>>
>>> There won't be any 16-bit TEMPs in TGSI, but each instruction will
>>> have the HalfPrecision flag, which is a hint for drivers that they can
>>> use a 16-bit opcode. Even texture, load, and store instructions can
>>> set HalfPrecision, which means they can accept and return 16-bit
>>> values.
>>>
>>> The catch is that drivers will have to insert 16-bit <-> 32-bit
>>> conversions manually, because they won't be present in TGSI. The
>>> advantage is that we don't have to add 200 new opcodes for the 3 new
>>> 16-bit types.
>>>
>>> What do you think?
>>>
>>
>> Flagging instructions as 16bit doesn't look too bad to me, but I'm
>> wondering if this isn't a bit problematic wrt register files. Clearly,
>> this is a restriction of tgsi "everything is a 32x4 value". Doubles, of
>> course, have a similar problem, but in the end they still have
>> well-defined interactions with the register files, because it's defined
>> what bits ultimately represent a 64bit value (at least in theory from
>> tgsi's point of view, it is perfectly valid to use some 32bit
>> calculations to set some reg, then just use double instructions directly
>> without conversion on these values - it may not be meaningful but it is
>> well defined).
>> But it looks like you want to avoid having a well-defined mapping of
>> the registers to 16-bit types (and with 16-bit instructions just being
>> hints, I can't see how it could exist).
>> Note that being able to flag instructions as HalfPrecision does not
>> necessarily mean you can't have any explicit 16bit conversion
>> instructions too.
>
>
> Those already exist: PK2H and UP2H. Or did you have something else in mind?
>
> More generally, there are really two use cases for this, and we need to be
> careful not to mix them up:
>
> - transparent downgrading to 16-bit of lowp and mediump
> - support for extensions that explicitly introduce 16-bit types
>
> For lowp and mediump, the approach of just having a HalfPrecision bit on the
> instructions is probably fine.
>
> The second case is different. I don't think there are ARB extensions for
> that yet, but there are AMD_gpu_shader_{int16,half_float} with explicitly
> 16-bit types. (There's also NV_half_float, but that's from earlier days
> without GLSL.) For those, we'd really need to provide exactly the required
> operation. No special handling of TGSI temporaries is needed: an f16vec4 is
> represented as a normal 4-component vector in TGSI, just that the upper 16
> bits of each component are ignored.

I wanted to avoid adding 16-bit opcodes to TGSI because it's too much work.
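
For reference, the flag itself is free in terms of token layout: it takes
the bit that is currently Padding, so the instruction token stays one
32-bit word. A minimal sketch of that, assuming a compiler that packs
unsigned bitfields into a single 32-bit unit (packing is
implementation-defined). The names sketch_instruction, sketch_make and
sketch_token_is_32bit are made up for illustration and are not part of
TGSI:

```c
#include <assert.h>
#include <limits.h>

/* Illustration only: same fields as the proposed bitfield, under a
 * hypothetical name. Widths: 4+8+8+1+2+4+1+1+1+1+1 = 32 bits. */
struct sketch_instruction {
   unsigned Type          : 4;
   unsigned NrTokens      : 8;
   unsigned Opcode        : 8;
   unsigned Saturate      : 1;
   unsigned NumDstRegs    : 2;
   unsigned NumSrcRegs    : 4;
   unsigned Label         : 1;
   unsigned Texture       : 1;
   unsigned Memory        : 1;
   unsigned Precise       : 1;
   unsigned HalfPrecision : 1;  /* takes the place of Padding */
};

/* Returns nonzero if the token occupies exactly 32 bits with this
 * compiler. This checks an assumption; the C standard does not
 * guarantee it. */
int sketch_token_is_32bit(void)
{
   return sizeof(struct sketch_instruction) * CHAR_BIT == 32;
}

/* Hypothetical helper: build a zeroed token with an opcode and the hint. */
struct sketch_instruction sketch_make(unsigned opcode, int half)
{
   struct sketch_instruction inst = {0};

   assert(opcode < 256);  /* Opcode is an 8-bit field */
   inst.Opcode = opcode;
   inst.HalfPrecision = half ? 1 : 0;
   return inst;
}
```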

>
> Here's another question: What does "low precision" mean on a texture
> instruction? Are the offsets low precision or is it the output? Maybe we can
> punt on this for now -- at least GCN doesn't have low precision there
> anyway.

HalfPrecision means that all dst and src operands can be 16-bit.

If the consumer of a TEX instruction is 16-bit, TEX should return
16-bit automatically. If a source of a TEX instruction is 16-bit, TEX
should accept 16-bit automatically.
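
Put differently, a conversion is only needed where a 16-bit and a 32-bit
instruction meet. Here is a rough sketch of how a driver could find those
places. It assumes the driver honors the hint on every flagged
instruction and that temps not yet written hold 32-bit values. The
struct sketch_inst and sketch_count_conversions are hypothetical and much
simpler than the real TGSI instruction representation (one dst, two
srcs, temps only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TEMPS 16

/* Hypothetical, simplified instruction; not a real TGSI struct. */
struct sketch_inst {
   bool half_precision;  /* stands in for the proposed HalfPrecision bit */
   int dst;              /* temp index written, or -1 if none */
   int src[2];           /* temp indices read, -1 if unused */
};

/* Count the 16-bit <-> 32-bit conversions needed on source operands.
 * A temp takes the width of the last instruction that wrote it; a
 * conversion is counted whenever an instruction reads a temp whose
 * width differs from its own. Counting is per operand, so a driver
 * that reuses a converted value would emit fewer. */
int sketch_count_conversions(const struct sketch_inst *insts, size_t count)
{
   bool temp_is_half[SKETCH_MAX_TEMPS] = { false };
   int conversions = 0;

   for (size_t i = 0; i < count; i++) {
      const struct sketch_inst *inst = &insts[i];

      for (int s = 0; s < 2; s++) {
         int t = inst->src[s];

         if (t < 0)
            continue;
         assert(t < SKETCH_MAX_TEMPS);
         if (temp_is_half[t] != inst->half_precision)
            conversions++;
      }

      if (inst->dst >= 0) {
         assert(inst->dst < SKETCH_MAX_TEMPS);
         temp_is_half[inst->dst] = inst->half_precision;
      }
   }
   return conversions;
}
```

A TEX with the flag set is treated the same way as any other instruction
here: its sources are expected to be 16-bit and its result is recorded
as 16-bit, so nothing is inserted as long as its producers and consumers
are flagged too.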

GFX9 can have 16-bit inputs and outputs in buffer and image
instructions. We also have 16-bit interpolation. We could, in theory,
run a whole pixel shader with 16-bit precision.

Marek