[Beignet] [PATCH 3/3] add optimization for local copy propagation

Zhigang Gong zhigang.gong at linux.intel.com
Mon Sep 7 18:55:50 PDT 2015


> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> Guo, Yejun
> Sent: Monday, September 7, 2015 8:27 PM
> To: Zhigang Gong; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 3/3] add optimization for local copy propagation
> 
> Yes, there will be penalty for the case in your example. I read several
> documents for local copy propagation, and none mentioned this case. :(
> 
> For the method to iterate new instructions/registers, it requires to add the
> 'new' flag during GenIR to SelectionIR period, since the current implementation
> is large, I think it might not be so good, there is high possibility to miss
> something.
> 
> I prefer for the liveness method, it provides much information which could be
> possibly used in other later optimizations. I'll check if ir::liveness could be
> reused or need to design a new liveness for selection ir.
We already have the liveness information in gen backend stage, please check
The following code in backend/context.cpp
  Context::Context(const ir::Unit &unit, const std::string &name) :
    unit(unit), fn(*unit.getFunction(name)), name(name), liveness(NULL), dag(NULL), useDWLabel(false)
  {
    GBE_ASSERT(unit.getPointerSize() == ir::POINTER_32_BITS);
    this->liveness = GBE_NEW(ir::Liveness, const_cast<ir::Function&>(fn), true);
    this->dag = GBE_NEW(ir::FunctionDAG, *this->liveness);
    // r0 (GEN_REG_SIZE) is always set by the HW and used at the end by EOT
    this->registerAllocator = NULL; //GBE_NEW(RegisterAllocator, GEN_REG_SIZE, 4*KB - GEN_REG_SIZE);
    this->scratchAllocator = NULL; //GBE_NEW(ScratchAllocator, 12*KB);
  }

And use ctx.getLiveIn(bb) and ctx.getLiveOut(bb), you can easily get livein set or liveout set for a specified basic block.
That should be good enough to implement this optimization.


Thanks,
Zhigang Gong.

> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Monday, September 07, 2015 2:47 PM
> To: Guo, Yejun; beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH 3/3] add optimization for local copy propagation
> 
> Right, only the instructions created in instruction selection stage could be
> optimized here.
> No need to iterate all the instructions. And no need to do multiple round check.
> Just one round check, and only need to check those temporary registers(only
> live in this BB). If any register is in live out set, then we do not need to touch it.
> 
> Please take a look at the following example
> (%r1 is in live in set, and %r0 is in liveout set, and the MOV %r0, %r1 is the last
> use of the %r1):
> 
> MOV %r0, %r1
> ...
> ADD %r5, %r0, %r2
> 
> If use this optimization, it will become:
> 
> MOV %r0, %r1
> ...
> ADD %r5, %r1, %r2
> 
> Because %r0 is in liveout set, the MOV instruction is not dead instruction and
> will not be eliminated.
> And the worse situation is the %r1's liveness interval incorrectly extent to the
> ADD instruction.
> 
> In the original code, %r1 is not alive at the ADD instruction, now both %r1
> and %r0 are alive after the MOV instruction. Considering a worst case:
> If there are many such type of MOV, then this optimization will not bring any
> optimization, but will increase register pressure dramatically.
> 
> So my two major suggestions are as below:
> 
> 1. Try to iterate newly created instructions/registers only, this could reduce
> most of the overhead.
> 2. Use liveness information, don't touch any registers in the liveout set, actually,
> if you only iterate those newly created instructions, you may avoid any liveout
> registers naturally.
> 
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf
> > Of Guo, Yejun
> > Sent: Monday, September 7, 2015 2:11 PM
> > To: Zhigang Gong; beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH 3/3] add optimization for local copy
> > propagation
> >
> > It is expected that there will be improvement with the optimization
> > since some instructions are removed.
> >
> > As mentioned in the commit log, this patch itself does not remove any
> > instruction, it modifies some instruction to make the removal possible.
> >
> > GenWriter::removeMOVs() did the work inside each basic block at Gen IR
> > level, the idea can be introduced for Selection IR level, and
> > removeMovs globally might also needed for Selection IR.
> >
> > Even GenWriter::removeMOVs() has done the optimization at Gen IR
> > level, it is also necessary to do the same idea again at Selection IR
> > level, since we introduced some extra instructions during GenIR to
> > SelectionIR and/or other functions.
> >
> > Take the selection IR of utest compiler_saturate_sub_uint8_t as an
> > example (see the instruction fragment in commit log), we can see such
> > optimization at selection IR level is also necessary.
> >
> > -----Original Message-----
> > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > Sent: Monday, September 07, 2015 1:43 PM
> > To: Guo, Yejun; beignet at lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH 3/3] add optimization for local copy
> > propagation
> >
> > Is there any evidence that this optimization could bring actual improvement?
> > I doubt it because it doesn't reduce any instruction.
> >
> > Actually, if the %42 is not in the liveout set of current BB, then the
> > MOV could be removed, the exactly same optimization logic has been
> > implemented in the GEN IR stage, the function is GenWriter::removeMOVs(),
> you can check it out.
> >
> > > -----Original Message-----
> > > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On
> > > Behalf Of Guo Yejun
> > > Sent: Monday, September 7, 2015 5:28 AM
> > > To: beignet at lists.freedesktop.org
> > > Cc: Guo Yejun
> > > Subject: [Beignet] [PATCH 3/3] add optimization for local copy
> > > propagation
> > >
> > > it is done at selection ir level, for instructions like:
> > >     MOV(8)              %42<2>:UB	:	%53<32,8,4>:UB
> > >     ADD(8)              %43<2>:B	:	%40<16,8,2>:B
> > > 	-%42<16,8,2>:B
> > > can be optimized as:
> > >     MOV(8)              %42<2>:UB	:	%53<32,8,4>:UB
> > >     ADD(8)              %43<2>:UB	:	%56<32,8,4>:UB
> > > 	-%53<32,8,4>:UB
> > >
> > > the optimization is done for each basic block, we here can not
> > > remove instruction "MOV %42 ..." since it is possible that %42 could
> > > be used at other place in the same block or even in other blocks. We
> > > need to add another path such as dead code elimination at global level to
> remove it.
> > >
> > > Signed-off-by: Guo Yejun <yejun.guo at intel.com>
> > > ---
> > >  .../src/backend/gen_insn_selection_optimize.cpp    | 145
> > > +++++++++++++++++++++
> > >  1 file changed, 145 insertions(+)
> > >
> > > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> > > b/backend/src/backend/gen_insn_selection_optimize.cpp
> > > index c82fbe5..b196a4d 100644
> > > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > > @@ -20,6 +20,40 @@ namespace gbe
> > >      virtual void run() = 0;
> > >      virtual ~SelOptimizer() {}
> > >    protected:
> > > +    //we need info derived from execWdith, but it is not contained
> > > + inside
> > > GenRegister
> > > +    class ExpandedRegister
> > > +    {
> > > +    public:
> > > +      ExpandedRegister(const GenRegister& reg, uint32_t execWidth) :
> > > genreg(reg)
> > > +      {
> > > +        elements = CalculateElements(reg, execWidth);
> > > +      }
> > > +      ~ExpandedRegister() {}
> > > +      static uint32_t CalculateElements(const GenRegister& reg,
> > > + uint32_t
> > > execWidth)
> > > +      {
> > > +        uint32_t elements = 0;
> > > +        uint32_t elementSize = typeSize(reg.type);
> > > +        uint32_t width = GenRegister::width_size(reg);
> > > +        assert(execWidth >= width);
> > > +        uint32_t height = execWidth / width;
> > > +        uint32_t vstride = GenRegister::vstride_size(reg);
> > > +        uint32_t hstride = GenRegister::hstride_size(reg);
> > > +        uint32_t base = reg.subnr;
> > > +        for (uint32_t i = 0; i < height; ++i) {
> > > +          uint32_t offsetInByte = base;
> > > +          for (uint32_t j = 0; j < width; ++j) {
> > > +            uint32_t offsetInType = offsetInByte / elementSize;
> > > +            elements |= (1 << offsetInType);            //right for
> > dest
> > > register?
> > > +            offsetInByte += hstride * elementSize;
> > > +          }
> > > +          offsetInByte += vstride * elementSize;
> > > +        }
> > > +        return elements;
> > > +      }
> > > +      const GenRegister& genreg;
> > > +      uint32_t elements;
> > > +    };
> > > +
> > >      uint32_t features;
> > >    };
> > >
> > > @@ -31,13 +65,124 @@ namespace gbe
> > >      virtual void run();
> > >
> > >    private:
> > > +    // local copy propagation
> > > +    typedef std::map<const ExpandedRegister*, const GenRegister*>
> > > RegisterMap;
> > > +    static bool replaceWithCopytable(const RegisterMap& copytable,
> > > uint32_t execWidth, GenRegister& var);
> > > +    static void addToCopytable(RegisterMap& copytable, uint32_t
> > > execWidth, const GenRegister& src, const GenRegister& dst);
> > > +    static void removeFromCopytable(RegisterMap& copytable, const
> > > GenRegister& var);
> > > +    static void cleanCopytable(RegisterMap& copytable);
> > > +    static void propagateRegister(GenRegister& dst, const
> > > + GenRegister&
> > > src);
> > > +    bool doLocalCopyPropagation();
> > > +
> > >      SelectionBlock &bb;
> > >      static const size_t MaxTries = 1;   //the times for optimization
> > >    };
> > >
> > > +  void SelBasicBlockOptimizer::propagateRegister(GenRegister& dst,
> > > + const GenRegister& src)  {
> > > +    dst.type = src.type;
> > > +    dst.file = src.file;
> > > +    dst.physical = src.physical;
> > > +    dst.subphysical = src.subphysical;
> > > +    dst.value.reg = src.value.reg;
> > > +    dst.vstride = src.vstride;
> > > +    dst.width = src.width;
> > > +    dst.hstride = src.hstride;
> > > +    dst.quarter = src.quarter;
> > > +    dst.nr = src.nr;
> > > +    dst.subnr = src.subnr;
> > > +    dst.address_mode = src.address_mode;
> > > +    dst.a0_subnr = src.a0_subnr;
> > > +    dst.addr_imm = src.addr_imm;
> > > +
> > > +    dst.negation = dst.negation ^ src.negation;
> > > +    dst.absolute = dst.absolute | src.absolute;  }
> > > +
> > > +  void SelBasicBlockOptimizer::cleanCopytable(RegisterMap&
> > > + copytable) {
> > > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos
> > > + !=
> > > copytable.end(); ++pos) {
> > > +      const ExpandedRegister* key = pos->first;
> > > +      delete key;
> > > +    }
> > > +    copytable.clear();
> > > +  }
> > > +
> > > +  void SelBasicBlockOptimizer::removeFromCopytable(RegisterMap&
> > > +copytable, const GenRegister& var)
> > > +  {
> > > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos
> > > +!=
> > > copytable.end(); ) {
> > > +      if (pos->first->genreg.reg() == var.reg() ||
> > > + pos->second->reg() ==
> > > var.reg()) {
> > > +        const ExpandedRegister* key = pos->first;
> > > +        copytable.erase(pos++);
> > > +	delete key;
> > > +      }
> > > +      else
> > > +        ++pos;
> > > +    }
> > > +  }
> > > +
> > > +  void SelBasicBlockOptimizer::addToCopytable(RegisterMap&
> > > + copytable, uint32_t execWidth, const GenRegister& src, const
> GenRegister& dst)  {
> > > +    if (src.type != dst.type || src.file != dst.file)
> > > +      return;
> > > +
> > > +    ExpandedRegister* expanded = new ExpandedRegister(dst,
> > execWidth);
> > > +    copytable[expanded] = &src;
> > > +  }
> > > +
> > > +  bool SelBasicBlockOptimizer::replaceWithCopytable(const
> > > + RegisterMap& copytable, uint32_t execWidth, GenRegister& var)  {
> > > +    bool replaced = false;
> > > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos
> > > + !=
> > > copytable.end(); ++pos) {
> > > +      const ExpandedRegister* key = pos->first;
> > > +      if (key->genreg.reg() == var.reg()) {
> > > +        if (key->genreg.type == var.type) {
> > > +          uint32_t elements =
> > > + ExpandedRegister::CalculateElements(var,
> > > execWidth);
> > > +          if (key->elements == elements) {
> > > +            propagateRegister(var, *(pos->second));
> > > +            replaced = true;
> > > +          }
> > > +        }
> > > +        break;
> > > +      }
> > > +    }
> > > +    return replaced;
> > > +  }
> > > +
> > > +  bool SelBasicBlockOptimizer::doLocalCopyPropagation()
> > > +  {
> > > +    bool optimized = false;
> > > +    RegisterMap copytable;
> > > +    for (SelectionInstruction &insn : bb.insnList) {
> > > +      uint32_t execWidth = insn.state.execWidth;
> > > +      if (insn.opcode != SEL_OP_BSWAP && !insn.isWrite()) {
> > > +        for (uint8_t i = 0; i < insn.srcNum; ++i) {
> > > +          bool replaced = replaceWithCopytable(copytable,
> > > + execWidth,
> > > insn.src(i));
> > > +          optimized = optimized || replaced;
> > > +        }
> > > +      }
> > > +
> > > +      for (uint8_t i = 0; i < insn.dstNum; ++i)
> > > +        removeFromCopytable(copytable, insn.dst(i));
> > > +
> > > +      if (insn.opcode == SEL_OP_MOV)
> > > +        addToCopytable(copytable, execWidth, insn.src(0), insn.dst(0));
> > > +    }
> > > +    cleanCopytable(copytable);
> > > +    return optimized;
> > > +  }
> > > +
> > >    void SelBasicBlockOptimizer::run()
> > >    {
> > > +    for (size_t i = 0; i < MaxTries; ++i) {
> > > +      bool again = false;
> > > +      again = again || doLocalCopyPropagation();
> > > +
> > > +      //again = again || doOtherLocalOptimization();
> > >
> > > +      if (!again)
> > > +        break;      //break since no optimization found at this round
> > > +    }
> > >    }
> > >
> > >    class SelGlobalOptimizer : public SelOptimizer
> > > --
> > > 1.9.1
> > >
> > > _______________________________________________
> > > Beignet mailing list
> > > Beignet at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/beignet
> >
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet
> 
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list