[Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

Eero Tamminen eero.t.tamminen at intel.com
Mon May 21 10:49:30 UTC 2018


Hi,

On 21.05.2018 10:42, Iago Toral wrote:
> On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:
>> On 17.05.2018 14:25, Eero Tamminen wrote:
>>> On 17.05.2018 11:46, Iago Toral Quiroga wrote:
>>>> The PRM for MAD states that F, DF and HF are supported, however,
>>>> then
>>>> it requires that the instruction includes a 2-bit mask specifying
>>>> the types of each operand like this:
>>>
>>>   >
>>>> 00: 32-bit float
>>>> 01: 32-bit signed integer
>>>> 10: 32-bit unsigned integer
>>>> 11: 64-bit float
>>>
>>> Where this was?
> 
> This is in the decription of the MAD instruction here (for SKL):
> 
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl
> -vol02a-commandreference-instructions.pdf
> 
> It guess the contents for this were copy & pasted from previous PRMs
> that didn't support HF...

Ouch.  That looks pretty different from what's in here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf

And in BSW one:
>>> In
>>> https://01.org/sites/default/files/documentation/intel-gfx-bspec-os
>>> rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf

I'll ask around who's currently maintaining the 01.org docs, and could 
she/he update the opcode docs. All of them from BSW to KBL seem to have 
the old information, while the Media GPGPU docs have newer info.


Btw.  Did you have test-cases utilizing mad() instructions, and did
they work OK with your patchset?

(If yes, better test-cases may be required.)


>>> Section "EU Changes by Processor Generation" states:
>>> -------------------------
>>> These features or behaviors are added for CHV, BSW, continuing to
>>> later generations:
>>> ...
>>> In the 3-source instruction format, widen the SrcType and DstType
>>> fields
>>> and add an encoding for the HF (Half Float) type.
>>> -------------------------
>>>
>>> (I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)
> 
> Actually, I have just verified that the BDW PRMs have the exact same
> thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
> supported too.

Yes, right.


>>> In section "GEN Instruction Format – 3-src" table, both "Dst Type"
>>> and
>>> "Src Type" fields are 3 bits, and there's additional 1 bit "Src1
>>> Type"
>>> and "Src2 Type" fields to differentiate formats for src1 & src2.
>>
>>   >
>>> Then, when looking at "Source or Destination Operand Fields
>>> (Alphabetically by Short Name)" section:
>>> -----------------------------------------------
>>> DstType:
>>>
>>> Encoding for three source instructions:
>>> 000b = :f. Single precision Float (32-bit).
>>> 001b = :d. Signed Doubleword integer.
>>> 010b = :ud. Unsigned Doubleword integer.
>>> 011b = :df. Double precision Float (64-bit).
>>> 100b = :hf. Half precision Float (16-bit).
>>> 101b - 111b. Reserved.
>>>
>>> ...
>>>
>>> SrcType:
>>>
>>> Three source instructions use one SrcType field for all source
>>> operands,
>>> with a 3-bit encoding that allows fewer data types:
>>>
>>> Encoding for three source instructions:
>>> 000b = :f. Single precision Float (32-bit).
>>> 001b = :d. Signed Doubleword integer.
>>> 010b = :ud. Unsigned Doubleword integer.
>>> 011b = :df. Double precision Float (64-bit).
>>> 100b = :hf. Half precision Float (16-bit).
>>> 101b - 111b. Reserved.
>>>
>>> Three source instructions can use operands with mixed-mode
>>> precision.
>>> When SrcType field is set to :f or :hf it defines precision for
>>> source 0
>>> only, and fields Src1Type and Src2Type define precision for other
>>> source
>>> operands:
>>> 0b = :f. Single precision Float (32-bit).
>>> 1b = :hf. Half precision Float (16-bit).
>>> -----------------------------------------------
>>
>> Note also the section "Special Requirements for Handling Mixed Mode
>> Float Operations".

Both BDW & BSW specs list also this restriction, which is lifted GEN9:
* Calculations using the HF (Half Float) type do not support denormals 
or gradual underflow.


>> Btw. Mesa supporting HF with mad() is important because pln()
>> doesn't support it in HW, but you can implement pln() HF version
>> with two mad()s.


	- Eero

>>> [1]: SkyLake & KabyLake PRMs lists same amount of bits, but don't
>>> tell
>>> what they represent (often one needs to look into PRM for the HW
>>> where
>>> these changes were first introduced, which in this case was
>>> Braswell /
>>> Cherryview).
>>>
>>>
>>>> So 16-bit float would not be supported.  The driver also asserts
>>>> that
>>>> the types involved in ALING16 3-src operations are one of these
>>>> (MAD is always emitted as an align16 instruction prior to gen10).
>>>> ---
>>>>    src/intel/compiler/brw_fs_nir.cpp | 9 ++++++++-
>>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
>>>> b/src/intel/compiler/brw_fs_nir.cpp
>>>> index 91283ab4911..58ddc456bae 100644
>>>> --- a/src/intel/compiler/brw_fs_nir.cpp
>>>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>>>> @@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder
>>>> &bld,
>>>> nir_alu_instr *instr)
>>>>          break;
>>>>       case nir_op_ffma:
>>>> -      inst = bld.MAD(result, op[2], op[1], op[0]);
>>>> +      /* 3-src MAD doesn't support 16-bit operands */
>>>> +      if (nir_dest_bit_size(instr->dest.dest) >= 32) {
>>>> +         inst = bld.MAD(result, op[2], op[1], op[0]);
>>>> +      } else {
>>>> +         fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
>>>> +         bld.MUL(tmp, op[1], op[0]);
>>>> +         inst = bld.ADD(result, tmp, op[2]);
>>>> +      }
>>>>          inst->saturate = instr->dest.saturate;
>>>>          break;
>>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>



More information about the mesa-dev mailing list