[Beignet] [PATCH 3/3] add optimization for local copy propagation

Guo, Yejun yejun.guo at intel.com
Mon Sep 7 05:27:10 PDT 2015


Yes, there will be penalty for the case in your example. I read several documents for local copy propagation, and none mentioned this case. :(

For the method to iterate new instructions/registers, it requires to add the 'new' flag during GenIR to SelectionIR period, since the current implementation is large, I think it might not be so good, there is high possibility to miss something.

I prefer for the liveness method, it provides much information which could be possibly used in other later optimizations. I'll check if ir::liveness could be reused or need to design a new liveness for selection ir.

-----Original Message-----
From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com] 
Sent: Monday, September 07, 2015 2:47 PM
To: Guo, Yejun; beignet at lists.freedesktop.org
Subject: RE: [Beignet] [PATCH 3/3] add optimization for local copy propagation

Right, only the instructions created in instruction selection stage could be optimized here.
No need to iterate all the instructions. And no need to do multiple round check. Just one round check, and only need to check those temporary registers(only live in this BB). If any register is in live out set, then we do not need to touch it.

Please take a look at the following example
(%r1 is in live in set, and %r0 is in liveout set, and the MOV %r0, %r1 is the last use of the %r1):

MOV %r0, %r1
...
ADD %r5, %r0, %r2

If use this optimization, it will become:

MOV %r0, %r1
...
ADD %r5, %r1, %r2

Because %r0 is in liveout set, the MOV instruction is not dead instruction and will not be eliminated.
And the worse situation is the %r1's liveness interval incorrectly extent to the ADD instruction.

In the original code, %r1 is not alive at the ADD instruction, now both %r1 and %r0 are alive after the MOV instruction. Considering a worst case:
If there are many such type of MOV, then this optimization will not bring any optimization, but will increase register pressure dramatically.

So my two major suggestions are as below:

1. Try to iterate newly created instructions/registers only, this could reduce most of the overhead.
2. Use liveness information, don't touch any registers in the liveout set, actually, if you only iterate those newly created instructions, you may avoid any liveout registers naturally.

> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf 
> Of Guo, Yejun
> Sent: Monday, September 7, 2015 2:11 PM
> To: Zhigang Gong; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 3/3] add optimization for local copy 
> propagation
> 
> It is expected that there will be improvement with the optimization 
> since some instructions are removed.
> 
> As mentioned in the commit log, this patch itself does not remove any 
> instruction, it modifies some instruction to make the removal possible.
> 
> GenWriter::removeMOVs() did the work inside each basic block at Gen IR 
> level, the idea can be introduced for Selection IR level, and 
> removeMovs globally might also needed for Selection IR.
> 
> Even GenWriter::removeMOVs() has done the optimization at Gen IR 
> level, it is also necessary to do the same idea again at Selection IR 
> level, since we introduced some extra instructions during GenIR to 
> SelectionIR and/or other functions.
> 
> Take the selection IR of utest compiler_saturate_sub_uint8_t as an 
> example (see the instruction fragment in commit log), we can see such 
> optimization at selection IR level is also necessary.
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Monday, September 07, 2015 1:43 PM
> To: Guo, Yejun; beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH 3/3] add optimization for local copy 
> propagation
> 
> Is there any evidence that this optimization could bring actual improvement?
> I doubt it because it doesn't reduce any instruction.
> 
> Actually, if the %42 is not in the liveout set of current BB, then the 
> MOV could be removed, the exactly same optimization logic has been 
> implemented in the GEN IR stage, the function is GenWriter::removeMOVs(), you can check it out.
> 
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On 
> > Behalf Of Guo Yejun
> > Sent: Monday, September 7, 2015 5:28 AM
> > To: beignet at lists.freedesktop.org
> > Cc: Guo Yejun
> > Subject: [Beignet] [PATCH 3/3] add optimization for local copy 
> > propagation
> >
> > it is done at selection ir level, for instructions like:
> >     MOV(8)              %42<2>:UB	:	%53<32,8,4>:UB
> >     ADD(8)              %43<2>:B	:	%40<16,8,2>:B
> > 	-%42<16,8,2>:B
> > can be optimized as:
> >     MOV(8)              %42<2>:UB	:	%53<32,8,4>:UB
> >     ADD(8)              %43<2>:UB	:	%56<32,8,4>:UB
> > 	-%53<32,8,4>:UB
> >
> > the optimization is done for each basic block, we here can not 
> > remove instruction "MOV %42 ..." since it is possible that %42 could 
> > be used at other place in the same block or even in other blocks. We 
> > need to add another path such as dead code elimination at global level to remove it.
> >
> > Signed-off-by: Guo Yejun <yejun.guo at intel.com>
> > ---
> >  .../src/backend/gen_insn_selection_optimize.cpp    | 145
> > +++++++++++++++++++++
> >  1 file changed, 145 insertions(+)
> >
> > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> > b/backend/src/backend/gen_insn_selection_optimize.cpp
> > index c82fbe5..b196a4d 100644
> > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > @@ -20,6 +20,40 @@ namespace gbe
> >      virtual void run() = 0;
> >      virtual ~SelOptimizer() {}
> >    protected:
> > +    //we need info derived from execWdith, but it is not contained 
> > + inside
> > GenRegister
> > +    class ExpandedRegister
> > +    {
> > +    public:
> > +      ExpandedRegister(const GenRegister& reg, uint32_t execWidth) :
> > genreg(reg)
> > +      {
> > +        elements = CalculateElements(reg, execWidth);
> > +      }
> > +      ~ExpandedRegister() {}
> > +      static uint32_t CalculateElements(const GenRegister& reg, 
> > + uint32_t
> > execWidth)
> > +      {
> > +        uint32_t elements = 0;
> > +        uint32_t elementSize = typeSize(reg.type);
> > +        uint32_t width = GenRegister::width_size(reg);
> > +        assert(execWidth >= width);
> > +        uint32_t height = execWidth / width;
> > +        uint32_t vstride = GenRegister::vstride_size(reg);
> > +        uint32_t hstride = GenRegister::hstride_size(reg);
> > +        uint32_t base = reg.subnr;
> > +        for (uint32_t i = 0; i < height; ++i) {
> > +          uint32_t offsetInByte = base;
> > +          for (uint32_t j = 0; j < width; ++j) {
> > +            uint32_t offsetInType = offsetInByte / elementSize;
> > +            elements |= (1 << offsetInType);            //right for
> dest
> > register?
> > +            offsetInByte += hstride * elementSize;
> > +          }
> > +          offsetInByte += vstride * elementSize;
> > +        }
> > +        return elements;
> > +      }
> > +      const GenRegister& genreg;
> > +      uint32_t elements;
> > +    };
> > +
> >      uint32_t features;
> >    };
> >
> > @@ -31,13 +65,124 @@ namespace gbe
> >      virtual void run();
> >
> >    private:
> > +    // local copy propagation
> > +    typedef std::map<const ExpandedRegister*, const GenRegister*>
> > RegisterMap;
> > +    static bool replaceWithCopytable(const RegisterMap& copytable,
> > uint32_t execWidth, GenRegister& var);
> > +    static void addToCopytable(RegisterMap& copytable, uint32_t
> > execWidth, const GenRegister& src, const GenRegister& dst);
> > +    static void removeFromCopytable(RegisterMap& copytable, const
> > GenRegister& var);
> > +    static void cleanCopytable(RegisterMap& copytable);
> > +    static void propagateRegister(GenRegister& dst, const 
> > + GenRegister&
> > src);
> > +    bool doLocalCopyPropagation();
> > +
> >      SelectionBlock &bb;
> >      static const size_t MaxTries = 1;   //the times for optimization
> >    };
> >
> > +  void SelBasicBlockOptimizer::propagateRegister(GenRegister& dst, 
> > + const GenRegister& src)  {
> > +    dst.type = src.type;
> > +    dst.file = src.file;
> > +    dst.physical = src.physical;
> > +    dst.subphysical = src.subphysical;
> > +    dst.value.reg = src.value.reg;
> > +    dst.vstride = src.vstride;
> > +    dst.width = src.width;
> > +    dst.hstride = src.hstride;
> > +    dst.quarter = src.quarter;
> > +    dst.nr = src.nr;
> > +    dst.subnr = src.subnr;
> > +    dst.address_mode = src.address_mode;
> > +    dst.a0_subnr = src.a0_subnr;
> > +    dst.addr_imm = src.addr_imm;
> > +
> > +    dst.negation = dst.negation ^ src.negation;
> > +    dst.absolute = dst.absolute | src.absolute;  }
> > +
> > +  void SelBasicBlockOptimizer::cleanCopytable(RegisterMap& 
> > + copytable) {
> > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos 
> > + !=
> > copytable.end(); ++pos) {
> > +      const ExpandedRegister* key = pos->first;
> > +      delete key;
> > +    }
> > +    copytable.clear();
> > +  }
> > +
> > +  void SelBasicBlockOptimizer::removeFromCopytable(RegisterMap&
> > +copytable, const GenRegister& var)
> > +  {
> > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos 
> > +!=
> > copytable.end(); ) {
> > +      if (pos->first->genreg.reg() == var.reg() || 
> > + pos->second->reg() ==
> > var.reg()) {
> > +        const ExpandedRegister* key = pos->first;
> > +        copytable.erase(pos++);
> > +	delete key;
> > +      }
> > +      else
> > +        ++pos;
> > +    }
> > +  }
> > +
> > +  void SelBasicBlockOptimizer::addToCopytable(RegisterMap& 
> > + copytable, uint32_t execWidth, const GenRegister& src, const GenRegister& dst)  {
> > +    if (src.type != dst.type || src.file != dst.file)
> > +      return;
> > +
> > +    ExpandedRegister* expanded = new ExpandedRegister(dst,
> execWidth);
> > +    copytable[expanded] = &src;
> > +  }
> > +
> > +  bool SelBasicBlockOptimizer::replaceWithCopytable(const
> > + RegisterMap& copytable, uint32_t execWidth, GenRegister& var)  {
> > +    bool replaced = false;
> > +    for (RegisterMap::const_iterator pos = copytable.begin(); pos 
> > + !=
> > copytable.end(); ++pos) {
> > +      const ExpandedRegister* key = pos->first;
> > +      if (key->genreg.reg() == var.reg()) {
> > +        if (key->genreg.type == var.type) {
> > +          uint32_t elements =
> > + ExpandedRegister::CalculateElements(var,
> > execWidth);
> > +          if (key->elements == elements) {
> > +            propagateRegister(var, *(pos->second));
> > +            replaced = true;
> > +          }
> > +        }
> > +        break;
> > +      }
> > +    }
> > +    return replaced;
> > +  }
> > +
> > +  bool SelBasicBlockOptimizer::doLocalCopyPropagation()
> > +  {
> > +    bool optimized = false;
> > +    RegisterMap copytable;
> > +    for (SelectionInstruction &insn : bb.insnList) {
> > +      uint32_t execWidth = insn.state.execWidth;
> > +      if (insn.opcode != SEL_OP_BSWAP && !insn.isWrite()) {
> > +        for (uint8_t i = 0; i < insn.srcNum; ++i) {
> > +          bool replaced = replaceWithCopytable(copytable, 
> > + execWidth,
> > insn.src(i));
> > +          optimized = optimized || replaced;
> > +        }
> > +      }
> > +
> > +      for (uint8_t i = 0; i < insn.dstNum; ++i)
> > +        removeFromCopytable(copytable, insn.dst(i));
> > +
> > +      if (insn.opcode == SEL_OP_MOV)
> > +        addToCopytable(copytable, execWidth, insn.src(0), insn.dst(0));
> > +    }
> > +    cleanCopytable(copytable);
> > +    return optimized;
> > +  }
> > +
> >    void SelBasicBlockOptimizer::run()
> >    {
> > +    for (size_t i = 0; i < MaxTries; ++i) {
> > +      bool again = false;
> > +      again = again || doLocalCopyPropagation();
> > +
> > +      //again = again || doOtherLocalOptimization();
> >
> > +      if (!again)
> > +        break;      //break since no optimization found at this round
> > +    }
> >    }
> >
> >    class SelGlobalOptimizer : public SelOptimizer
> > --
> > 1.9.1
> >
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet
> 
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet



More information about the Beignet mailing list