[Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

Thu May 17 11:25:41 UTC 2018

Hi,

On 17.05.2018 11:46, Iago Toral Quiroga wrote:
> The PRM for MAD states that F, DF and HF are supported, however, then
> it requires that the instruction includes a 2-bit mask specifying
> the types of each operand like this:
>
> 00: 32-bit float
> 01: 32-bit signed integer
> 10: 32-bit unsigned integer
> 11: 64-bit float

Where was this?

In 
https://01.org/sites/default/files/documentation/intel-gfx-bspec-osrc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf

Section "EU Changes by Processor Generation" states:
-------------------------
These features or behaviors are added for CHV, BSW, continuing to later 
generations:
...
In the 3-source instruction format, widen the SrcType and DstType fields 
and add an encoding for the HF (Half Float) type.
-------------------------

(I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)

In the "GEN Instruction Format – 3-src" table, both the "Dst Type" and 
"Src Type" fields are 3 bits wide, and there are additional 1-bit 
"Src1 Type" and "Src2 Type" fields to differentiate the formats of 
src1 & src2.

Then, when looking at "Source or Destination Operand Fields 
(Alphabetically by Short Name)" section:
-----------------------------------------------
DstType:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

...

SrcType:

Three source instructions use one SrcType field for all source operands, 
with a 3-bit encoding that allows fewer data types:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

Three source instructions can use operands with mixed-mode precision. 
When SrcType field is set to :f or :hf it defines precision for source 0 
only, and fields Src1Type and Src2Type define precision for other source 
operands:
0b = :f. Single precision Float (32-bit).
1b = :hf. Half precision Float (16-bit).
-----------------------------------------------
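
To make the tables above concrete, here is a minimal C++ sketch of how 
these fields could be decoded. The function names are hypothetical (they 
are not Mesa's helpers); only the encodings are taken from the PRM text 
quoted above, and the bit positions of the fields within the instruction 
word are deliberately left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map the 3-bit DstType/SrcType encoding of a
// 3-source instruction to its type mnemonic (CHV/BSW and later).
inline std::string decode_3src_type(unsigned enc)
{
   switch (enc & 0x7) {
   case 0x0: return ":f";   // 000b, 32-bit float
   case 0x1: return ":d";   // 001b, signed doubleword
   case 0x2: return ":ud";  // 010b, unsigned doubleword
   case 0x3: return ":df";  // 011b, 64-bit float
   case 0x4: return ":hf";  // 100b, 16-bit float
   default:  return "reserved"; // 101b - 111b
   }
}

// Hypothetical helper: type of src1 or src2. The 1-bit Src1Type/Src2Type
// field only applies when SrcType is :f or :hf (mixed-mode precision);
// otherwise all sources share the single SrcType field.
inline std::string decode_3src_src12_type(unsigned src_type_enc,
                                          unsigned src12_bit)
{
   const std::string src0 = decode_3src_type(src_type_enc);
   if (src0 == ":f" || src0 == ":hf")
      return (src12_bit & 1) ? ":hf" : ":f";
   return src0;
}
```

With this, an all-half-float MAD would have SrcType = 100b and 
Src1Type = Src2Type = 1b, while SrcType = 000b with Src1Type = 1b 
describes a mixed-mode instruction with an :f src0 and an :hf src1.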

	- Eero

[1]: The SkyLake & KabyLake PRMs list the same number of bits, but don't 
say what they represent (one often needs to look at the PRM for the HW 
where these changes were first introduced, which in this case was 
Braswell / Cherryview).

> So 16-bit float would not be supported.  The driver also asserts that
> the types involved in ALIGN16 3-src operations are one of these
> (MAD is always emitted as an align16 instruction prior to gen10).
> ---
>   src/intel/compiler/brw_fs_nir.cpp | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index 91283ab4911..58ddc456bae 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>         break;
>   
>      case nir_op_ffma:
> -      inst = bld.MAD(result, op[2], op[1], op[0]);
> +      /* 3-src MAD doesn't support 16-bit operands */
> +      if (nir_dest_bit_size(instr->dest.dest) >= 32) {
> +         inst = bld.MAD(result, op[2], op[1], op[0]);
> +      } else {
> +         fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
> +         bld.MUL(tmp, op[1], op[0]);
> +         inst = bld.ADD(result, tmp, op[2]);
> +      }
>         inst->saturate = instr->dest.saturate;
>         break;
>   
>