[Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion
Francisco Jerez
currojerez at riseup.net
Mon Feb 4 19:11:09 UTC 2019
Iago Toral <itoral at igalia.com> writes:
> On Mon, 2019-02-04 at 08:50 +0100, Iago Toral wrote:
>> On Fri, 2019-02-01 at 11:23 -0800, Francisco Jerez wrote:
>> > Iago Toral <itoral at igalia.com> writes:
>> >
>> > > On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> > > > Iago Toral <itoral at igalia.com> writes:
>> > > >
>> > > > > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral <itoral at igalia.com> writes:
>> > > > > >
>> > > > > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > > > > > > Iago Toral Quiroga <itoral at igalia.com> writes:
>> > > > > > > >
>> > > > > > > > > Commit c84ec70b3a72 implemented execution type
>> > > > > > > > > promotion to
>> > > > > > > > > 32-
>> > > > > > > > > bit
>> > > > > > > > > for
>> > > > > > > > > conversions involving half-float registers, which
>> > > > > > > > > empirical
>> > > > > > > > > testing
>> > > > > > > > > suggested
>> > > > > > > > > was required, but it did not incorporate this change
>> > > > > > > > > into
>> > > > > > > > > the
>> > > > > > > > > assembly validator
>> > > > > > > > > logic. This commits adds that, preventing validation
>> > > > > > > > > errors
>> > > > > > > > > like
>> > > > > > > > > this:
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > I don't think we should be validating empirical
>> > > > > > > > assumptions
>> > > > > > > > in
>> > > > > > > > the EU
>> > > > > > > > validator.
>> > > > > > >
>> > > > > > > I am not sure I get your point, isn't c84ec70b3a72 also
>> > > > > > > based
>> > > > > > > on
>> > > > > > > empirical testing after all?
>> > > > > > >
>> > > > > >
>> > > > > > To some extent, but it doesn't attempt to enforce ISA
>> > > > > > restrictions
>> > > > > > based
>> > > > > > on information obtained empirically.
>> > > > > >
>> > > > > > >
>> > > > > > > > > mov(16) g9<4>B g3<16,8,2>HF { align1 1H };
>> > > > > > > > > ERROR: Destination stride must be equal to the ratio
>> > > > > > > > > of
>> > > > > > > > > the
>> > > > > > > > > sizes
>> > > > > > > > > of the
>> > > > > > > > > execution data type to the destination type
>> > > > > > > > >
>> > > > > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type
>> > > > > > > > > to
>> > > > > > > > > 32-bit
>> > > > > > > > > when any half-float conversion is needed."
>> > > > > > > >
>> > > > > > > > I don't think this "fixes" anything that ever worked.
>> > > > > > >
>> > > > > > > It is true that the code in that trace above is not
>> > > > > > > something
>> > > > > > > we
>> > > > > > > can
>> > > > > > > produce right now, because it is a conversion from HF to
>> > > > > > > B
>> > > > > > > and
>> > > > > > > that
>> > > > > > > should only happen within the context of
>> > > > > > > VK_KHR_shader_float16_int8,
>> > > > > > > however, this is a consequence of the fact that since
>> > > > > > > c84ec70b3a72
>> > > > > > > there is an inconsistency between what we do at the IR
>> > > > > > > level
>> > > > > > > regarding
>> > > > > > > execution size of HF conversions and what the EU
>> > > > > > > validator
>> > > > > > > is
>> > > > > > > doing,
>> > > > > > > and from that perspective this is really fixing an
>> > > > > > > inconsistency
>> > > > > > > that
>> > > > > > > didn't exist before, and I thought we would want to
>> > > > > > > address
>> > > > > > > that
>> > > > > > > sooner
>> > > > > > > rather than later and track it down to the original
>> > > > > > > change
>> > > > > > > that
>> > > > > > > introduced that inconsistency so we know where this is
>> > > > > > > coming
>> > > > > > > from.
>> > > > > > >
>> > > > > >
>> > > > > > The "inconsistency" between the IR's get_exec_type() and
>> > > > > > the
>> > > > > > EU
>> > > > > > validator's execution_type() has existed ever since
>> > > > > > a05b6f25bf4bfad7
>> > > > > > removed the HF assert from get_exec_type() without actually
>> > > > > > implementing
>> > > > > > the code required to handle HF operands (which is what my
>> > > > > > commit
>> > > > > > c84ec70b3a72 did).
>> > > > >
>> > > > > I agree with the fact that since a05b6f25bf4bfad7 the
>> > > > > validator
>> > > > > could
>> > > > > reject valid code and that had nothing to do with your patch,
>> > > >
>> > > > The validator rejected the same valid HF code since it was
>> > > > written,
>> > > > that
>> > > > had nothing to do with neither a05b6f25bf4bfad7 nor with my
>> > > > patch,
>> > > > and
>> > > > it is the real problem this patch was working around.
>> > > >
>> > > > > but the inconsistency I am talking about here, that this
>> > > > > patch
>> > > > > fixes,
>> > > > > is the one about get_exec_type() in the IR and
>> > > > > execution_type()
>> > > > > in
>> > > > > the
>> > > > > validator doing different things for HF instructions, which
>> > > > > only
>> > > > > exists since your patch and which you discuss below.
>> > > > >
>> > > >
>> > > > The "inconsistency" exists ever since get_exec_type() was
>> > > > introduced
>> > > > without correct handling of HF types (even though
>> > > > execution_type()
>> > > > already attempted to handle it). And I disagree that it's a
>> > > > real
>> > > > inconsistency except due to the fact that the validator is
>> > > > incorrectly
>> > > > attempting to validate the alignment of the destination region
>> > > > according
>> > > > to a rule that doesn't apply to HF types.
>> > > >
>> > > > > > > Anyway, that was my rationale for the Fixes tag, but if
>> > > > > > > you
>> > > > > > > think
>> > > > > > > this
>> > > > > > > is not useful I am happy to drop this patch and just
>> > > > > > > include it
>> > > > > > > as
>> > > > > > > part
>> > > > > > > of my series without the tag.
>> > > > > > >
>> > > > > >
>> > > > > > I'd like to see the actual regioning restrictions for HF
>> > > > > > types
>> > > > > > implemented in the EU validator as part of your series.
>> > > > >
>> > > > > Ok, let's see if we can agree on what restrictions should we
>> > > > > implement
>> > > > > then. I can implement this restriction as documented:
>> > > > >
>> > > > > "Conversion between Integer and HF (Half Float) must be
>> > > > > DWord-
>> > > > > aligned
>> > > > > and strided by a DWord on the destination"
>> > > > >
>> > > > > Instead of trying to apply the general one that is currently
>> > > > > breaking.
>> > > > > That will fix the assertion issue. I guess my issue with it
>> > > > > was
>> > > > > that
>> > > > > going by your analysis this restriction is not telling the
>> > > > > full
>> > > > > picture, this is what you had to say about this restriction:
>> > > > >
>> > > > > "I have a feeling that the reason for this may be that the
>> > > > > 16-
>> > > > > bit
>> > > > > pipeline lacks the ability to handle conversions from or to
>> > > > > half-
>> > > > > float,
>> > > > > so the execution type is implicitly promoted to the matching
>> > > > > (integer
>> > > > > or floating-point) 32-bit type where any HF conversion would
>> > > > > be
>> > > > > needed. And on those the usual alignment restriction of the
>> > > > > destination to a larger execution type applies."
>> > > > >
>> > > > > But I guess your point is that we can ignore these details at
>> > > > > the
>> > > > > validator level and just focus on validating strictly what
>> > > > > the
>> > > > > PRM
>> > > > > says. Fair enough.
>> > > > >
>> > > > > Unfortunatley, you also mentioned that this restriction
>> > > > > *seems*
>> > > > > to
>> > > > > be
>> > > > > overriden by this one on all platforms but BDW (even though
>> > > > > the
>> > > > > both
>> > > > > are listed, at least I see both in my SKL docs):
>> > > > >
>> > > > > "There is a relaxed alignment rule for word destinations.
>> > > > > When
>> > > > > the
>> > > > > destination type is word (UW, W, HF), destination data types
>> > > > > can be
>> > > > > aligned to either the lowest word or the second lowest word
>> > > > > of
>> > > > > the
>> > > > > execution channel. This means the destination data words can
>> > > > > be
>> > > > > either
>> > > > > all in the even word locations or all in the odd word
>> > > > > locations."
>> > > > >
>> > > >
>> > > > I'd implement this one since it seems like the most specific,
>> > > > except
>> > > > on
>> > > > BDW where only the previous (almost equivalent) restriction
>> > > > seems
>> > > > to
>> > > > apply.
>> > >
>> > > I have just realized that this restriction could be at odds with
>> > > this
>> > > other aspect of the spec:
>> > >
>> > > "There is no direct conversion from B/UB to DF or DF to B/UB. Use
>> > > two
>> > > instructions and a word or DWord intermediate type."
>> > >
>> > > So it is okay to convert from B to W and then from W to DF/Q (it
>> > > has
>> > > the same text from conversions between HF and 64-bit types). We
>> > > actually emit these conversions and they work fine going by the
>> > > existing CTS tests, but these conversions are not aligned to 32-
>> > > bit
>> > > like the restriction states, instead they are aligned to 64-bit
>> > > (the
>> > > regioning lowering pass does this):
>> > >
>> > > mov(8) g14<4>W g5<4,4,1>Q { align1 2Q };
>> > >
>> > > So I think this is a bit inconsistent again and I guess we need
>> > > to
>> > > choose what we want to do. I can think of 4 options:
>> > >
>> > > 1. We could assume that the restriction is correctly formulated
>> > > and
>> > > split the instructions to respect the restriction as-is even if
>> > > we
>> > > have
>> > > not found any evidence that this doesn't work
>> > >
>> > > 2. We could assume that the restriction is correctly formulated
>> > > and
>> > > the
>> > > regioning lowering pass need to be fixed to generate a conversion
>> > > with
>> > > a stride of 2 even if the execution size if 64-bit (haven't tried
>> > > this
>> > > since it sounds sketchy to me).
>> > >
>> > > 3. We could understand that the restriction is strictly
>> > > formulated
>> > > for
>> > > conversions from a 32-bit execution types, and decide that it
>> > > does
>> > > not
>> > > apply to conversions from 64-bit sources.
>> > >
>> > > 4. We can assume that the restriction is formulated incorrectly
>> > > and
>> > > instead of saying that destination words should all be in even or
>> > > odd
>> > > slots, assuming a conversion from a 32-bit type, it should just
>> > > say
>> > > that they should be aligned to the execution type to also
>> > > accomodate
>> > > conversions from 64-bit types.
>> > >
>> >
>> > 4. is basically what the back-end is assuming right now, but it is
>> > indeed not literally equivalent to the regioning restrictions as
>> > they're
>> > formulated in the B-Spec. I'd suggest not validating the DWORD
>> > alignment rule for HF instructions whenever the execution type is
>> > greater than 32 bits, since the B-Spec language about those is
>> > rather
>> > inconsistent.
>
> What about conversions like :B->:W or :W->:UW? These have a word
> destination and would be subject to the same restriction, but the
> execution type is not DWord, so we are not currently forcing DWord
> alignment for them. I have not seen anythig fail for these conversions
> but if we implement the rule for them too we will fail validation. I
> suppose we could decide to validate the rule only HF instructions like
> you were (probably purposedly?) suggesting above.
>
Yes, I really only meant we should validate this HF restriction for
instructions with an HF conversion like the hardware spec says.
>> Ok, I will do that. Just one thing: I guess that when you say HF
>> instructions you mean 16-bit destinations, since that is the target
>> of
>> the restriction.
>> > > And indepedently of how we decide to interpret the restriction in
>> > > the
>> > > end, given that we have discussed here that we should only
>> > > validate
>> > > things as they are described in the PRM, I think we also need to
>> > > decide
>> > > what we implement in the validator (if anything) if we decide to
>> > > go
>> > > with either 3) or 4) (were we would not be exactly following the
>> > > PRM to
>> > > the letter).
>> > >
>> > > I kind of think that 4) might be what is going on here...
>> > > what do you think?
>> > >
>> > > > There are shitloads of other restrictions we're missing for HF
>> > > > types BTW, check out the section "Special Requirements for
>> > > > Handling
>> > > > Mixed Mode Float Operations" of the hardware spec.
>> > > >
>> > > > > Which you discussed didn't make sense to you and I agreed, if
>> > > > > only
>> > > > > because my own experiments suggested that the implications
>> > > > > that
>> > > > > one
>> > > > > would derive from a strict interpretation of this (that
>> > > > > packed
>> > > > > 16-
>> > > > > bit
>> > > > > data is not supported) are not quite true going by the fact
>> > > > > that we
>> > > > > are
>> > > > > emitting packed instructions on all platforms without any
>> > > > > indication
>> > > > > that this doesn't work. So what should we do about this one?
>> > > > > If
>> > > > > we
>> > > > > decide that we want to implement this then we have to re-
>> > > > > think
>> > > > > the
>> > > > > implementation we have.
>> > > > >
>> > > >
>> > > > The implication on packed HF instructions is likely accidental,
>> > > > I
>> > > > don't
>> > > > think we should enforce the rule except for conversion
>> > > > operations.
>> > > >
>> > > > > Should we just validate the one about the integer/float
>> > > > > conversions?
>> > > > > Should we do that only on BDW or on all platforms?
>> > > > >
>> > > > > > I don't see the
>> > > > > > need to introduce any empirical assumptions into the EU
>> > > > > > validator's
>> > > > > > execution_type() in order to achieve that, even if that
>> > > > > > means
>> > > > > > that
>> > > > > > get_exec_type() and execution_type() don't do the exact
>> > > > > > same
>> > > > > > calculation
>> > > > > > -- What you call an inconsistency is the consequence of
>> > > > > > execution_type()
>> > > > > > being the hardware spec's opinion on what the execution
>> > > > > > type
>> > > > > > is,
>> > > > > > which
>> > > > > > we assume is what we need to use while enforcing a
>> > > > > > regioning
>> > > > > > restriction
>> > > > > > that refers to the execution type of the instruction.
>> > > > >
>> > > > > Mmm... I guess my question is: is the hardware really
>> > > > > assuming
>> > > > > that
>> > > > > execution type in all cases that involve HF instructions?
>> > > > > What
>> > > > > I
>> > > > > got
>> > > > > from your analysis of the whole situation is that the
>> > > > > hardware
>> > > > > is
>> > > > > actually promoting the execution size to 32-bit in some
>> > > > > cases,
>> > > > > and
>> > > > > if
>> > > > > that is the case, then does it make sense to implement code
>> > > > > in
>> > > > > the
>> > > > > validator that ignores that fact?
>> > > >
>> > > > If the hardware spec chose to ignore that fact and instead gave
>> > > > us a
>> > > > different set of regioning restrictions for us to use on HF
>> > > > restrictions
>> > > > that don't rely on the definition of execution type, we should
>> > > > probably
>> > > > have the validator match the hardware spec literally.
>> > > >
>> > > > > I understand that you think this doesn't have any
>> > > > > significance
>> > > > > once
>> > > > > we
>> > > > > make sure that we only apply the restrictions specifically
>> > > > > documented
>> > > > > for HF from the PRM, and you are probably right, however I
>> > > > > have
>> > > > > to
>> > > > > admit that it feels a bit odd to me that we do that when we
>> > > > > know
>> > > > > that
>> > > > > under the hood there is much more going on and we are
>> > > > > producing
>> > > > > FS
>> > > > > IR
>> > > > > code following different rules after all. But maybe it is
>> > > > > just
>> > > > > me
>> > > > > being too paranoid about this...
>> > > > >
>> > > >
>> > > > Wouldn't it be even odder to have the hardware restriction
>> > > > validator
>> > > > diverge from the documentation it's supposedly validating?
>> > > >
>> > > > >
>> > > > > > > > The validator is
>> > > > > > > > still missing an implementation of the quirky HF
>> > > > > > > > restrictions,
>> > > > > > > > and it
>> > > > > > > > wasn't the purpose of c84ec70b3a72 to do such a thing.
>> > > > > > >
>> > > > > > > While this is true in general, the EU validator does
>> > > > > > > consider
>> > > > > > > the
>> > > > > > > execution type of the instruction to validate general
>> > > > > > > rules
>> > > > > > > such as
>> > > > > > > the
>> > > > > > > one I mentioned in the commit message in this patch. And
>> > > > > > > that
>> > > > > > > part
>> > > > > > > of
>> > > > > > > the validator is inconsistent with c84ec70b3a72.
>> > > > > >
>> > > > > > That part of the validator was also inconsistent with the
>> > > > > > code
>> > > > > > generated
>> > > > > > by your original VK_KHR_shader_float16_int8 series even
>> > > > > > before I
>> > > > > > committed c84ec70b3a72. The reason is that it is trying to
>> > > > > > validate
>> > > > > > a
>> > > > > > restriction that rejects working code, because the
>> > > > > > "general"
>> > > > > > rule
>> > > > > > it's
>> > > > > > trying to enforce isn't supposed to apply to instructions
>> > > > > > with HF
>> > > > > > operands, which is the real bug.
>> > > > >
>> > > > > I agree with that, and I'll fix that in my series by
>> > > > > preventing
>> > > > > that
>> > > > > rule to apply to these instructions (pending to agree on the
>> > > > > rules
>> > > > > that
>> > > > > we should validate instead), but as I have already explained
>> > > > > above,
>> > > > > that was not the inconsistency I was referring to.
>> > > > >
>> > > >
>> > > > It's the only actual inconsistency this was fixing.
>> > > >
>> > > > > > > In fact, the EU validator is accounting for execution
>> > > > > > > size
>> > > > > > > promotion
>> > > > > > > of HF instructions to 32-bit in SKL+ and CHV only, for
>> > > > > > > conversions
>> > > > > > > from HF->F and mixed float mode instructions... which is
>> > > > > > > part
>> > > > > > > of
>> > > > > > > what
>> > > > > > > c84ec70b3a72 addresses at the IR level, which it actually
>> > > > > > > does
>> > > > > > > for
>> > > > > > > all
>> > > > > > > hardware platforms and in more cases.
>> > > > > > >
>> > > > > >
>> > > > > > I'm fine with fixing execution_type() to do the right thing
>> > > > > > in
>> > > > > > more
>> > > > > > cases and platforms, but I don't think that should involve
>> > > > > > making
>> > > > > > empirical assumptions in the validator.
>> > > > >
>> > > > > Just to be clear, in the particular case we are discussing
>> > > > > here, I
>> > > > > understand this means that we should leave the validator's
>> > > > > execution_type() as-is.
>> > > > >
>> > > >
>> > > > That's not what I was saying. Last time I looked at
>> > > > execution_type()
>> > > > it
>> > > > did seem somewhat sketchy regarding HF types. I wouldn't be
>> > > > surprised
>> > > > if it needs some fixes to match the hardware spec (*not* our
>> > > > empirical
>> > > > knowledge of the hardware) in addition to the above.
>> > > >
>> > > > > > > > You *should*
>> > > > > > > > definitely implement those restrictions (as they're
>> > > > > > > > stated in
>> > > > > > > > the
>> > > > > > > > hardware spec, without empirical assumptions) in the
>> > > > > > > > validator as
>> > > > > > > > part
>> > > > > > > > of your VK_KHR_shader_float16_int8 series,
>> > > > > > >
>> > > > > > > Again, I am not sure what you mean by "without empirical
>> > > > > > > assumptions".
>> > > > > >
>> > > > > > I was just paraphrasing your comment. If you feel the need
>> > > > > > to
>> > > > > > write
>> > > > > > a
>> > > > > > comment justifying the existence of some code in the
>> > > > > > validator
>> > > > > > based
>> > > > > > on
>> > > > > > empirical testing, it probably doesn't belong in the
>> > > > > > validator.
>> > > > > >
>> > > > > > > According to the commit message in c84ec70b3a72 "the docs
>> > > > > > > are
>> > > > > > > fairly
>> > > > > > > imcomplete and inconsistent" and you explained here that
>> > > > > > > you
>> > > > > > > had to
>> > > > > > > do
>> > > > > > > some experimentation of your own using the simulator
>> > > > > > > (where
>> > > > > > > you
>> > > > > > > found
>> > > > > > > its results to also be inconsistent with the hardware
>> > > > > > > docs)
>> > > > > > > to
>> > > > > > > try
>> > > > > > > and
>> > > > > > > guess what is happening. That's why I was using the term
>> > > > > > > "empirical"
>> > > > > > > here, since the hardware docs alone aren't clear or
>> > > > > > > correct
>> > > > > > > enough
>> > > > > > > to
>> > > > > > > understand what is really happening, when and in what
>> > > > > > > platforms.
>> > > > > > >
>> > > > > >
>> > > > > > If some hardware restriction is described in a
>> > > > > > contradictory
>> > > > > > or
>> > > > > > ambiguous way by the hardware spec we should probably avoid
>> > > > > > enforcing
>> > > > > > it
>> > > > > > altogether in the EU validator.
>> > > > >
>> > > > > Okay, in that case I guess you would rule the restriction
>> > > > > about
>> > > > > the
>> > > > > relaxed alignment rule for Word destinations as such, but not
>> > > > > the
>> > > > > one
>> > > > > about half-float/integer conversions. Is that correct?
>> > > > >
>> > > >
>> > > > We probably need to enforce both since they apply to different
>> > > > sets
>> > > > of
>> > > > hardware.
>> > > >
>> > > > > > > Anyway, if you don't like the term "empirical" to refer
>> > > > > > > to
>> > > > > > > all
>> > > > > > > this,
>> > > > > > > that's fine by me, but what I need to know is if we agree
>> > > > > > > that
>> > > > > > > we
>> > > > > > > need
>> > > > > > > to implement the same type promotion rules in the
>> > > > > > > validator
>> > > > > > > that we
>> > > > > > > are
>> > > > > > > implementing in the IR, indepedently of whether refer to
>> > > > > > > them
>> > > > > > > as
>> > > > > > > "based
>> > > > > > > on empirical testing" or not.
>> > > > > > >
>> > > > > >
>> > > > > > I don't think we agree on that. IMHO you want to enforce
>> > > > > > the
>> > > > > > regioning
>> > > > > > restrictions that the hardware spec describes for HF types
>> > > > > > rather
>> > > > > > than
>> > > > > > the current (incorrect and incomplete) generalization of
>> > > > > > the
>> > > > > > standard
>> > > > > > restrictions. With those in place the apparent
>> > > > > > inconsistency
>> > > > > > between
>> > > > > > get_exec_type() and the hardware spec's execution type
>> > > > > > would
>> > > > > > be
>> > > > > > harmless.
>> > > > >
>> > > > > I agree with that. I still feel uneasy about having two
>> > > > > different
>> > > > > implementations of the execution type in FS IR and the
>> > > > > validator,
>> > > > > even
>> > > > > if that doesn't lead to any issues when we do what you say
>> > > > > here...
>> > > > >
>> > > >
>> > > > I feel uneasy about the validator not matching the hardware
>> > > > spec.
>> > > > Luckily its only purpose is validating the rest of the
>> > > > compiler,
>> > > > so
>> > > > if
>> > > > there is an actual inconsistency as you seem to think, it will
>> > > > come
>> > > > up.
>> > > >
>> > > > > At the very least I think that deserves a comment in the
>> > > > > validator
>> > > > > explaining why we think that is okay.
>> > > > >
>> > > >
>> > > > A quote of the hardware spec should be enough of a
>> > > > comment. Right? And
>> > > > there is already a comment in get_exec_type() explaining why
>> > > > its
>> > > > HF
>> > > > type
>> > > > handling logic is consistent with the hardware spec
>> > > > restrictions
>> > > > for
>> > > > HF
>> > > > types.
>> > > >
>> > > > > > > > if anything because currently
>> > > > > > > > it will reject working code that uses HF types.
>> > > > > > >
>> > > > > > > Just for the sake of clarity, since that sentence above
>> > > > > > > could
>> > > > > > > be
>> > > > > > > interpreted as if all HF code would be rejected: we have
>> > > > > > > been
>> > > > > > > using
>> > > > > > > HF
>> > > > > > > types since we landed VK_KHR_16bit_storage, which
>> > > > > > > includes
>> > > > > > > conversions
>> > > > > > > between HF and F and the EU validator is not complaining
>> > > > > > > about
>> > > > > > > any
>> > > > > > > of
>> > > > > > > that. It is currently a problem only with conversions to
>> > > > > > > smaller
>> > > > > > > types
>> > > > > > > (so only conversions to Byte) because that's where we
>> > > > > > > check
>> > > > > > > for
>> > > > > > > that
>> > > > > > > restriction on the stride, which is based on the
>> > > > > > > execution
>> > > > > > > type
>> > > > > > > of
>> > > > > > > the
>> > > > > > > instruction.
>> > > > > > >
>> > > > > > > >
>> > > > > > > > > ---
>> > > > > > > > > src/intel/compiler/brw_eu_validate.c | 27
>> > > > > > > > > ++++++++++++++
>> > > > > > > > > ----
>> > > > > > > > > ----
>> > > > > > > > > -----
>> > > > > > > > > 1 file changed, 14 insertions(+), 13 deletions(-)
>> > > > > > > > >
>> > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > index a25010b225c..3bb37677672 100644
>> > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > > @@ -325,17 +325,20 @@ execution_type(const struct
>> > > > > > > > > gen_device_info
>> > > > > > > > > *devinfo, const brw_inst *inst)
>> > > > > > > > > unsigned num_sources =
>> > > > > > > > > num_sources_from_inst(devinfo,
>> > > > > > > > > inst);
>> > > > > > > > > enum brw_reg_type src0_exec_type, src1_exec_type;
>> > > > > > > > >
>> > > > > > > > > - /* Execution data type is independent of
>> > > > > > > > > destination
>> > > > > > > > > data
>> > > > > > > > > type,
>> > > > > > > > > except in
>> > > > > > > > > - * mixed F/HF instructions on CHV and SKL+.
>> > > > > > > > > + /* Empirical testing suggests that type
>> > > > > > > > > conversions
>> > > > > > > > > involving
>> > > > > > > > > half-float
>> > > > > > > > > + * promote execution type to 32-bit. See
>> > > > > > > > > get_exec_type() in
>> > > > > > > > > brw_ir_fs.h.
>> > > > > > > > > */
>> > > > > > > > > enum brw_reg_type dst_exec_type =
>> > > > > > > > > brw_inst_dst_type(devinfo,
>> > > > > > > > > inst);
>> > > > > > > > >
>> > > > > > > > > src0_exec_type =
>> > > > > > > > > execution_type_for_type(brw_inst_src0_type(devinfo,
>> > > > > > > > > inst));
>> > > > > > > > > if (num_sources == 1) {
>> > > > > > > > > - if ((devinfo->gen >= 9 || devinfo-
>> > > > > > > > > > is_cherryview) &&
>> > > > > > > > >
>> > > > > > > > > - src0_exec_type == BRW_REGISTER_TYPE_HF) {
>> > > > > > > > > - return dst_exec_type;
>> > > > > > > > > + if (type_sz(src0_exec_type) == 2 &&
>> > > > > > > > > dst_exec_type !=
>> > > > > > > > > src0_exec_type) {
>> > > > > > > > > + if (src0_exec_type == BRW_REGISTER_TYPE_HF)
>> > > > > > > > > + return BRW_REGISTER_TYPE_F;
>> > > > > > > > > + else if (dst_exec_type ==
>> > > > > > > > > BRW_REGISTER_TYPE_HF)
>> > > > > > > > > + return BRW_REGISTER_TYPE_D;
>> > > > > > > > > }
>> > > > > > > > > +
>> > > > > > > > > return src0_exec_type;
>> > > > > > > > > }
>> > > > > > > > >
>> > > > > > > > > @@ -367,14 +370,12 @@ execution_type(const struct
>> > > > > > > > > gen_device_info
>> > > > > > > > > *devinfo, const brw_inst *inst)
>> > > > > > > > > src1_exec_type == BRW_REGISTER_TYPE_DF)
>> > > > > > > > > return BRW_REGISTER_TYPE_DF;
>> > > > > > > > >
>> > > > > > > > > - if (devinfo->gen >= 9 || devinfo->is_cherryview)
>> > > > > > > > > {
>> > > > > > > > > - if (dst_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > > > > - src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > > > > - src1_exec_type == BRW_REGISTER_TYPE_F) {
>> > > > > > > > > - return BRW_REGISTER_TYPE_F;
>> > > > > > > > > - } else {
>> > > > > > > > > - return BRW_REGISTER_TYPE_HF;
>> > > > > > > > > - }
>> > > > > > > > > + if (dst_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > > > > + src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > > > > + src1_exec_type == BRW_REGISTER_TYPE_F) {
>> > > > > > > > > + return BRW_REGISTER_TYPE_F;
>> > > > > > > > > + } else {
>> > > > > > > > > + return BRW_REGISTER_TYPE_HF;
>> > > > > > > > > }
>> > > > > > > > >
>> > > > > > > > > assert(src0_exec_type == BRW_REGISTER_TYPE_F);
>> > > > > > > > > --
>> > > > > > > > > 2.17.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190204/1496a2b8/attachment-0001.sig>
More information about the mesa-dev
mailing list