[Beignet] [PATCH] fix a long relative regreesion issue on BSW caused by local copy propagation
Guo, Yejun
yejun.guo at intel.com
Wed Oct 21 19:39:35 PDT 2015
Ok, I'll add comments for this concern.
I checked nomask and predicate in function SelBasicBlockOptimizer::CanBeReplaced, do you think the logic is enough for the flag register?
-----Original Message-----
From: Yang, Rong R
Sent: Wednesday, October 21, 2015 5:00 PM
To: Guo, Yejun; beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH] fix a long relative regreesion issue on BSW caused by local copy propagation
Yes, if one instruction span several registers, the other registers' visit pattern is same as first register if the vstride is normal(width * hstride).
And I notify you don't take care of flag register, does it safe to ignore the flag register?
Anyway, I will push it first.
> -----Original Message-----
> From: Guo, Yejun
> Sent: Monday, October 19, 2015 20:19
> To: Guo, Yejun; Yang, Rong R; beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] fix a long relative regreesion issue on
> BSW caused by local copy propagation
>
> I have another idea, since we allocate the registers in a regular
> manner, I guess the 32bit element can represent the whole register, it
> just repeats regularly. Is there any case not meet my assumption? If
> no, we can keep the current code but add the comments.
>
> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf
> Of Guo, Yejun
> Sent: Monday, October 19, 2015 4:48 PM
> To: Yang, Rong R; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] fix a long relative regreesion issue on
> BSW caused by local copy propagation
>
> Yes, simd16 + hstrid 4 is not covered, and just get a worse case for
> vstrid 128, 64bit element is not enough too. Will we allocate such
> register? If yes, I need to wrap it in a structure instead of a uint32/uint64.
>
> -----Original Message-----
> From: Yang, Rong R
> Sent: Monday, October 19, 2015 4:38 PM
> To: Guo, Yejun; beignet at lists.freedesktop.org
> Cc: Guo, Yejun
> Subject: RE: [Beignet] [PATCH] fix a long relative regreesion issue on
> BSW caused by local copy propagation
>
> One comment about CalculateElements.
>
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On
> > Behalf Of Guo Yejun
> > Sent: Thursday, October 15, 2015 9:48
> > To: beignet at lists.freedesktop.org
> > Cc: Guo, Yejun
> > Subject: [Beignet] [PATCH] fix a long relative regreesion issue on
> > BSW caused by local copy propagation
> >
> > on bsw, for long data type, src and dst hstride must be aligned to
> > the same qword, so disable the copy propagation for following IRs.
> > MOV(8) %62<2>:UD : %34<8,8,1>:UD
> > MOV(8) %42<1>:L : %62<8,4,2>:UD
> >
> > and also fix a typo issue in function CalculateElements
> >
> > Signed-off-by: Guo Yejun <yejun.guo at intel.com>
> > ---
> > backend/src/backend/gen_insn_selection.cpp | 2 +-
> > backend/src/backend/gen_insn_selection.hpp | 2 ++
> > backend/src/backend/gen_insn_selection_optimize.cpp | 8 +++++++-
> > 3 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/backend/src/backend/gen_insn_selection.cpp
> > b/backend/src/backend/gen_insn_selection.cpp
> > index 9714e3e..5125eb3 100644
> > --- a/backend/src/backend/gen_insn_selection.cpp
> > +++ b/backend/src/backend/gen_insn_selection.cpp
> > @@ -2140,7 +2140,7 @@ namespace gbe
> > this->opaque->setLongRegRestrict(true);
> > this->opaque->setSlowByteGather(true);
> > this->opaque->setHasHalfType(true);
> > - opt_features = SIOF_OP_AND_LOGICAL_SRCMOD;
> > + opt_features = SIOF_OP_AND_LOGICAL_SRCMOD |
> > + SIOF_OP_MOV_LONG_REG_RESTRICT;
> > }
> >
> > Selection9::Selection9(GenContext &ctx) : Selection(ctx) { diff
> > --git a/backend/src/backend/gen_insn_selection.hpp
> > b/backend/src/backend/gen_insn_selection.hpp
> > index f313786..4efb80b 100644
> > --- a/backend/src/backend/gen_insn_selection.hpp
> > +++ b/backend/src/backend/gen_insn_selection.hpp
> > @@ -230,6 +230,8 @@ namespace gbe
> > //for OP_AND, on BDW+, SrcMod value indicates a logical source
> modifier
> > // on PRE-BDW, SrcMod value indicates a numeric source modifier
> > SIOF_OP_AND_LOGICAL_SRCMOD = 1 << 0,
> > + //for OP_MOV, on BSW, for long data type, src and dst hstride
> > + must be
> > aligned to the same qword
> > + SIOF_OP_MOV_LONG_REG_RESTRICT = 1 << 1,
> > };
> >
> > /*! Owns the selection engine */
> > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> > b/backend/src/backend/gen_insn_selection_optimize.cpp
> > index 939dd18..fffc8b0 100644
> > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > @@ -30,7 +30,7 @@ namespace gbe
> > elements |= (1 << offsetInType);
> Is 32bits for elements enough? For example simd16 and hstride is 4.
>
> > offsetInByte += hstride * elementSize;
> > }
> > - offsetInByte += vstride * elementSize;
> > + base += vstride * elementSize;
> > }
> > return elements;
> > }
> > @@ -174,6 +174,12 @@ namespace gbe
> > if (insn.opcode == SEL_OP_AND && (info->replacement.absolute
> > ||
> > info-
> > >replacement.negation))
> > return false;
> >
> > + if (features & SIOF_OP_MOV_LONG_REG_RESTRICT && insn.opcode ==
> > SEL_OP_MOV) {
> > + const GenRegister& dst = insn.dst(0);
> > + if (dst.isint64() && !info->replacement.isint64() &&
> > + info->elements !=
> > CalculateElements(info->replacement, insn.state.execWidth))
> > + return false;
> > + }
> > +
> > if (info->replacementOverwritten)
> > return false;
> >
> > --
> > 1.9.1
> >
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet
More information about the Beignet
mailing list