[Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

Wed Mar 13 09:04:30 UTC 2019

On Tue, 2019-03-12 at 15:46 -0700, Francisco Jerez wrote:
> Iago Toral <itoral at igalia.com> writes:
> 
> > On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote:
> > > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
> > > > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> > > > > Iago Toral <itoral at igalia.com> writes:
> > > > > 
> > > > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > > > > > Iago Toral <itoral at igalia.com> writes:
> > > > > > > 
> > > > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez
> > > > > > > > wrote:
> > > > > > > > > Iago Toral <itoral at igalia.com> writes:
> > > > > > > > > 
> > > > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
> > > > > > > > > > wrote:
> > > > > > > > > > > Iago Toral <itoral at igalia.com> writes:
> > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco
> > > > > > > > > > > > Jerez
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Iago Toral Quiroga <itoral at igalia.com>
> > > > > > > > > > > > > writes:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    | 
> > > > > > > > > > > > > >  64
> > > > > > > > > > > > > > ++++++++++++-
> > > > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp |
> > > > > > > > > > > > > > 122
> > > > > > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
> > > > > > > > > > > > > > deletion(-
> > > > > > > > > > > > > > )
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > > > > > general_restrictions_based_on_operand_types
> > > > > > > > > > > > > > (con
> > > > > > > > > > > > > > st
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > gen_device_info *devinf
> > > > > > > > > > > > > >         exec_type_size == 8 &&
> > > > > > > > > > > > > > dst_type_size ==
> > > > > > > > > > > > > > 4)
> > > > > > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > > > > > +    *
> > > > > > > > > > > > > > +    *    "There is no direct conversion
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > HF
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > DF
> > > > > > > > > > > > > > or
> > > > > > > > > > > > > > DF to
> > > > > > > > > > > > > > HF.
> > > > > > > > > > > > > > +    *     There is no direct conversion
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > HF
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > Q/UQ or
> > > > > > > > > > > > > > Q/UQ to
> > > > > > > > > > > > > > HF."
> > > > > > > > > > > > > > +    */
> > > > > > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > > > > > inst);
> > > > > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst)
> > > > > > > > > > > > > > ==
> > > > > > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > > > > > &&
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Why is only the MOV instruction handled here
> > > > > > > > > > > > > and
> > > > > > > > > > > > > below?  Aren't
> > > > > > > > > > > > > other
> > > > > > > > > > > > > instructions able to do implicit
> > > > > > > > > > > > > conversions?  Probably
> > > > > > > > > > > > > means
> > > > > > > > > > > > > you
> > > > > > > > > > > > > need
> > > > > > > > > > > > > to deal with two sources rather than one.
> > > > > > > > > > > > 
> > > > > > > > > > > > This comes from the programming notes of the
> > > > > > > > > > > > MOV
> > > > > > > > > > > > instruction
> > > > > > > > > > > > (Volume
> > > > > > > > > > > > 2a, Command Reference - Instructions - MOV), so
> > > > > > > > > > > > it
> > > > > > > > > > > > is
> > > > > > > > > > > > described
> > > > > > > > > > > > specifically for the MOV instruction. I should
> > > > > > > > > > > > probably
> > > > > > > > > > > > have
> > > > > > > > > > > > made
> > > > > > > > > > > > this
> > > > > > > > > > > > clear in the comment.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Maybe the one above is specified in the MOV page
> > > > > > > > > > > only,
> > > > > > > > > > > probably
> > > > > > > > > > > due
> > > > > > > > > > > to
> > > > > > > > > > > an oversight (If these restrictions were really
> > > > > > > > > > > specific
> > > > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > MOV
> > > > > > > > > > > instruction, what would prevent you from
> > > > > > > > > > > implementing
> > > > > > > > > > > such
> > > > > > > > > > > conversions
> > > > > > > > > > > through a different instruction?  E.g. "ADD
> > > > > > > > > > > dst:df,
> > > > > > > > > > > src:hf,
> > > > > > > > > > > 0"
> > > > > > > > > > > which
> > > > > > > > > > > would be substantially more efficient than what
> > > > > > > > > > > you're
> > > > > > > > > > > doing
> > > > > > > > > > > in
> > > > > > > > > > > PATCH
> > > > > > > > > > > 02)
> > > > > > > > > > 
> > > > > > > > > > Instructions that take HF can only be strictly HF
> > > > > > > > > > or
> > > > > > > > > > mix
> > > > > > > > > > F
> > > > > > > > > > and
> > > > > > > > > > HF
> > > > > > > > > > (mixed mode float), with MOV being the only
> > > > > > > > > > exception.
> > > > > > > > > > That
> > > > > > > > > > means
> > > > > > > > > > that
> > > > > > > > > > any instruction like the one you use above are
> > > > > > > > > > invalid.
> > > > > > > > > > Maybe
> > > > > > > > > > we
> > > > > > > > > > should
> > > > > > > > > > validate explicitly that instructions that use HF
> > > > > > > > > > are
> > > > > > > > > > strictly
> > > > > > > > > > HF
> > > > > > > > > > or
> > > > > > > > > > mixed-float mode only?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > So you're acknowledging that the conversions checked
> > > > > > > > > above
> > > > > > > > > are
> > > > > > > > > illegal
> > > > > > > > > whether the instruction is a MOV or something
> > > > > > > > > else.  Where
> > > > > > > > > are we
> > > > > > > > > preventing instructions other than MOV with such
> > > > > > > > > conversions
> > > > > > > > > from
> > > > > > > > > being
> > > > > > > > > accepted by the validator?
> > > > > > > > 
> > > > > > > > We aren't, because the validator is not checking, in
> > > > > > > > general,
> > > > > > > > for
> > > > > > > > accepted type combinations for specific instructions
> > > > > > > > anywhere
> > > > > > > > as
> > > > > > > > far as
> > > > > > > > I can see.
> > > > > > > 
> > > > > > > Luckily these type conversion restrictions aren't really
> > > > > > > specific
> > > > > > > to
> > > > > > > any
> > > > > > > instruction AFAICT, even though they only seem to be
> > > > > > > mentioned
> > > > > > > explicitly for the MOV instruction...
> > > > > > > 
> > > > > > > > If we want to start doing this with HF conversions
> > > > > > > > specifically, I
> > > > > > > > guess it is fine, but in case it is not clear, I think
> > > > > > > > it
> > > > > > > > won't
> > > > > > > > bring
> > > > > > > > any immediate benefits with the
> > > > > > > > VK_KHR_shader_float16_int8
> > > > > > > > implementation since this series only ever emits
> > > > > > > > conversions
> > > > > > > > involving
> > > > > > > > HF operands via MOV instructions,
> > > > > > > 
> > > > > > > Yes, I can see that's the intention of this series, but
> > > > > > > how
> > > > > > > can
> > > > > > > you
> > > > > > > make
> > > > > > > sure that nothing in the optimizer is breaking your
> > > > > > > assumption
> > > > > > > if
> > > > > > > you
> > > > > > > don't add some validator code to verify the claim of your
> > > > > > > last
> > > > > > > paragraph?
> > > > > > > 
> > > > > > > > which is why I thought that validating that no direct
> > > > > > > > MOV
> > > > > > > > conversions
> > > > > > > > from DF/Q types ever happen was useful, since we have
> > > > > > > > code
> > > > > > > > in
> > > > > > > > the
> > > > > > > > driver to handle this scenario.
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > > 
> > > > > > > > > I still don't understand why you want to implement
> > > > > > > > > the
> > > > > > > > > same
> > > > > > > > > restriction
> > > > > > > > > twice, once for MOV and once for all other mixed-mode
> > > > > > > > > instructions.  How
> > > > > > > > > is that more convenient?
> > > > > > > > 
> > > > > > > > The restrictions based on operand types that are
> > > > > > > > checked in
> > > > > > > > the
> > > > > > > > validator are specific to Byte or cases where the
> > > > > > > > execution
> > > > > > > > type is
> > > > > > > > larger than the destination stride, for which mixed
> > > > > > > > float
> > > > > > > > has
> > > > > > > > a
> > > > > > > > different set of restrictions.
> > > > > > > > 
> > > > > > > > For example, for mixed float we have this rule:
> > > > > > > > 
> > > > > > > > "In Align1, destination stride can be smaller than
> > > > > > > > execution
> > > > > > > > type"
> > > > > > > > 
> > > > > > > > Which overrides this other from "General restrictions
> > > > > > > > based
> > > > > > > > on
> > > > > > > > operand
> > > > > > > > types":
> > > > > > > > 
> > > > > > > > "Destination stride must be equal to the ratio of the
> > > > > > > > sizes
> > > > > > > > of
> > > > > > > > the
> > > > > > > > execution data type to the destination type"
> > > > > > > > 
> > > > > > > > So I thought that it would make things easier to keep
> > > > > > > > all
> > > > > > > > restrictions
> > > > > > > > for mixed float separate and make sure that we skipped
> > > > > > > > mixed
> > > > > > > > float
> > > > > > > > instructions in
> > > > > > > > general_restrictions_based_on_operand_types()
> > > > > > > > so we
> > > > > > > > avoid having to add code to skip individual general
> > > > > > > > restrictions
> > > > > > > > that that are overriden for mixed float mode anyway.
> > > > > > > > 
> > > > > > > 
> > > > > > > I'm fine with that, but it doesn't seem like what this
> > > > > > > patch
> > > > > > > is
> > > > > > > doing...
> > > > > > > Isn't it adding code to validate mixed-mode float MOV
> > > > > > > instructions in
> > > > > > > general_restrictions_based_on_operand_types()?
> > > > > > 
> > > > > > I guess this could be arguable, but I was not considering
> > > > > > conversion
> > > > > > MOVs to be mixed-float instructions. There are two reasons
> > > > > > for
> > > > > > this:
> > > > > > 
> > > > > > A conversion MOV involving F/HF doesn't seem to be
> > > > > > fundamentally
> > > > > > different from any other conversion instruction involving
> > > > > > other
> > > > > > types,
> > > > > > other than the requirement of aligning the destination to a
> > > > > > Dword,
> > > > > > which is not a resriction explictly made for mixed-float
> > > > > > mode.
> > > > > > 
> > > > > > Then, for mixed-float mode, there is this other rule:
> > > > > > 
> > > > > > "In Align1, destination stride can be smaller than
> > > > > > execution
> > > > > > type.
> > > > > > When
> > > > > > destination is stride of 1, 16 bit packed data
> > > > > > is updated on the destination. However, output packed f16
> > > > > > data
> > > > > > must
> > > > > > be
> > > > > > oword aligned, no oword crossing in
> > > > > > packed f16."
> > > > > > 
> > > > > > Which contradicts the other rule that conversions from F to
> > > > > > HF
> > > > > > need
> > > > > > to
> > > > > > be DWord aligned on the destination.
> > > > > > 
> > > > > > So it seems to me that conversion MOVs are not following
> > > > > > the
> > > > > > same
> > > > > > principles of mixed-float instructions and we should skip
> > > > > > validating
> > > > > > mixed-float restrictions for them. What do you think?
> > > > > > 
> > > > > 
> > > > > That all seems fairly ambiguous...  And the restriction on
> > > > > DWORD
> > > > > alignment for conversions includes a mixed-mode ADD
> > > > > instruction
> > > > > among
> > > > > the examples, so I'm skeptical that MOV could be truly
> > > > > fundamentally
> > > > > different.
> > > > 
> > > > Ok... in that case what do we do about the mixed-float
> > > > restriction
> > > > I
> > > > quoted above? Since it is incompatible with the mixed-float MOV
> > > > conversion I guess we only have two options: ether we don't
> > > > validate
> > > > it
> > > > at all or we only validate it for mixed-float instructions that
> > > > are
> > > > not
> > > > MOV. I guess we can do the latter?
> > > 
> > > Also, related to this, if we consider implicit conversions in 2-
> > > src
> > > instructions to also be a target of the generic rules for
> > > conversions
> > > to HF described for CHV and SKL+, then doesn't that imply that
> > > all
> > > mixed-float instructions are conversions and subject to those
> > > rules?
> > > For example:
> > > 
> > > ADD dst:hf  src0:f  src1:f
> > > ADD dst:hf  src0:hf src1:f
> > > ADD dst:hf  src0:f  src1:hf
> > > 
> > > In all 3 instructions above the execution type is F. Should we
> > > consider
> > > all of them implicit conversions? That would mean that any mixed-
> > > float
> > > instruction with HF destination is an implicit conversion from F
> > > to
> > > HF
> > > and therefore would be subject to the generic rules for those
> > > conversions, which at least in the case of CHV and SKL+ involves
> > > DWord
> > > alignment on the destination, or in other words: no mixed-float
> > > instruction can have packed fp16 destinations. But that seems to
> > > contradict what various mixed-float restrictions present in the
> > > docs
> > > that suggest that packed fp16 destinations are possible with
> > > mixed-
> > > float mode. Here are some examples:
> > > 
> > > "No SIMD16 in mixed mode when destination is packed f16 for both
> > > Align1
> > > and Align16"
> > > 
> > > "In Align1, destination stride can be smaller than execution
> > > type.
> > > When
> > > destination is stride of 1, 16 bit packed data is updated on the
> > > destination. However, output packed f16 data must be oword
> > > aligned,
> > > no
> > > oword crossing in packed f16."
> > > 
> > > "When source is float or half float from accumulator register and
> > > destination is half float with a stride of 1, the source must
> > > register
> > > aligned. i.e., source must have offset zero."
> > > 
> > > Since I was only considering that conversion rules only applied
> > > to
> > > MOV
> > > instructions this was not an issue in my series, but if we
> > > consider
> > > that those rules also apply to implicit conversions from other
> > > instructions then it looks like all these restrictions refer to
> > > impossible situations, so I am not sure what to do about them...
> > > should
> > > we just ignore them in the validator?
> > 
> > And I have just noticed another issue with the idea of considering
> > direct conversions involving F/HF mixed-float instructions. There
> > is
> > this other restriction for mixed-float mode:
> > 
> > "No SIMD16 in mixed mode when destination is f32. Instruction
> > Execution
> > size must be no more than 8."
> > 
> > So if we consider direct HF->F conversions mixed-float instructions
> > then we would habe to split all of them to be 8-wide. Obviously, I
> > have
> > not been doing this and I have never seen any issues on any
> > platform
> > because of this.
> > 
> 
> Hm...  That's kind of scary...  Your code seems to be violating this
> and
> the other no-SIMD16 restriction of mixed-mode instructions to my best
> interpretation of the B-Spec.  Unfortunately the simulator doesn't
> seem
> to enforce either of them so I'm not certain whether it's going to
> blow
> up in practice or not.  Maybe implement the restriction in
> get_fpu_lowered_simd_width() to be on the safe side?

Yeah... this is quite unfortunate. I have not seen any problem with
this on any platform I have tested, and I guess that the fact that the
simulator doesn't complain about them might be another hint that this
is actually not a problem but it is annoying to have the docs and the
testing give us different info :-(.

It is particularly bad in this case, because when you start mixing
half-float and float you inevitably get a lot of conversions between
the two types along the way (for example, there are a bunch of
instructions that do not support half-float so we need to convert to
F), so if we decide to split conversions with F destination I imagine
it is going to have a visible  negative effect on performance, which is
particularly bad if it ends up being that there is not such restriction
for MOV in practice after all. Anyway, I suppose taking the safe
approach could make sense...

> > Iago
> > 
> > 
> > 
> > > 
> > > > > > > > Anyway, I'll try to rework the patches to do more
> > > > > > > > generic
> > > > > > > > validation of
> > > > > > > > HF conversions like you ask and see what kind of code
> > > > > > > > we
> > > > > > > > end
> > > > > > > > up
> > > > > > > > with.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > > [...]