[Beignet] [PATCH V3 3/3] Backend: Add hole reuse in reg alloction

Song, Ruiling ruiling.song at intel.com
Tue Nov 8 13:52:15 UTC 2016



> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Tuesday, November 1, 2016 3:17 PM
> To: beignet at lists.freedesktop.org
> Cc: Pan, Xiuli <xiuli.pan at intel.com>
> Subject: [Beignet] [PATCH V3 3/3] Backend: Add hole reuse in reg alloction
> 
> From: Pan Xiuli <xiuli.pan at intel.com>
> 
> We first find regs that have pool in simple linear scale, and save them
> in HoleRegPool, when allocte regs we first try to search fit candidate
> in the pool and choose the most fit one to reuse.
> 
> V2: Refine hole reuse only in one block.
> V3: Refine data structure with less variable, add OCL_REUSE_HOLE_REG to
> control the optimization.
> DEBUG version: You can export DUMPHOLE for easy review.
> 
> Signed-off-by: Pan Xiuli <xiuli.pan at intel.com>


>      /*! Current vector to expire */
>      uint32_t expiringID;
> +    /*! PHI regs that can be reused */

The phi related logic does not exist anymore. So I don't think we still use PHI here.
Please remove phi in this patch.

> +    map<uint32_t, vector<HoleRegTag>> HoleRegPool;
>      INLINE void insertNewReg(const Selection &selection, ir::Register reg,
> uint32_t grfOffset, bool isVector = false);
>      INLINE bool expireReg(ir::Register reg);
>      INLINE bool spillAtInterval(GenRegInterval interval, int size, uint32_t
> alignment);
> @@ -278,14 +282,83 @@ namespace gbe
>      }
>    }
> 
> -  bool GenRegAllocator::Opaque::createGenReg(const Selection &selection,
> const GenRegInterval &interval) {

The IDFitness design is weird. If (a-b) is equal to 1, then there will be a divide by zero case. Divide by zero is undefined. This should be avoided.
Another problem is if (a-b) is equal to 2, the return value is 1.0f. but if a and b are equal, the return value is also 1.0f. this is a little strange.
The fitness is the less distance between a & b, the higher fitness value. Right?

Why not simply
If (a>=b) return 1.0f/(a-b+1.0f);
Else return 2.0f;

> +  INLINE float IDFitess(int a, int b) {
> +    if (a == b) return 1.0f;
> +    else if (a > b) return 1.0f/(a - b -1);
> +    else return 2.0f;
> +  }
> +
> +  INLINE float getPhiRegFitness(const GenRegInterval &interval, HoleRegTag
> &holeRegTag) {
> +    int regstID = interval.minID;
> +    int regendID = interval.maxID;
> +    int holeregstID = holeRegTag.startID;
> +    int holeregendID = holeRegTag.endID;

I noticed when you add the holeregtag, you use below code.
"             holeregtag.startID = useID + 1;
             holeregtag.endID = insnID - 1;"
so if regstID == holeregstID && holeregendID == regendID. The hole can be reused by the interval. Right?

> +    return IDFitess(regstID, holeregstID) + IDFitess(holeregendID, regendID);
> +  }
> +

Seems that all registers will invoke this getRegBlockID(). The complexity will be O(RegNum*BBNum).
I thinks this is a little too higher than expectation.

> +  INLINE int getRegBlockID(const GenRegInterval &interval,const BlockRanges
> &blockRanges) {
> +    int minID = interval.minID;
> +    int maxID = interval.maxID;
> +    int res = -1;
> +    for(size_t i = 0; i < blockRanges.size(); ++i)
> +    {
> +      if (minID >= blockRanges[i].minID && maxID <= blockRanges[i].maxID)
> +      {
> +        res = i;
> +        break;
> +      }
> +    }
> +    return res;
> +  }
> +
> +
> +  BVAR(OCL_REUSE_HOLE_REG, 0);
You can default it to 1 instead of zero.

> +  bool GenRegAllocator::Opaque::createGenReg(const Selection &selection,
> GenRegInterval &interval) {
>      using namespace ir;
>      const ir::Register reg = interval.reg;
> -    if (RA.contains(reg) == true)
> -      return true; // already allocated
>      uint32_t regSize;
>      ir::RegisterFamily family;
>      getRegAttrib(reg, regSize, &family);
> +    // Check if the reg is live only in one block, thus can use the hole
> +    int blockID = getRegBlockID(interval, this->blockRanges);
> +    if ( blockID >= 0 && OCL_REUSE_HOLE_REG) {
> +      uint32_t useID = interval.maxID;
> +      auto holepool = this->HoleRegPool.find(blockID);
> +      // Use block ID as index to get candidate hole reg to reuse
> +      if (holepool != this->HoleRegPool.end()) {
> +        auto &holepoolvec = holepool->second;
> +        HoleRegTag* holeregbest = NULL;
> +        float lastfitness = 0;
> +        for (auto itr = holepoolvec.begin() ; itr != holepoolvec.end(); ++itr) {
> +          if (regSize != itr->regSize)
> +            continue;
> +          float fitness = getPhiRegFitness(interval, *itr);
> +          // reg out of range of holepool reg
> +          if (fitness > 2.0f) continue;
> +          if (fitness > lastfitness) {
> +            lastfitness = fitness;
> +            holeregbest = &*itr;
> +          }
> +          // reg has one prefect fit use this
> +          if (fitness > 1 ) break;
> +        }
> +        // Reuse the hole and update the holeregpool
> +        if (holeregbest) {
> +          auto holereg = holeregbest->reg;
> +          holeregbest->startID = useID + 1;
> +          if (getenv("DUMPHOLE"))
> +            printf("%d %d(%d:%d->%d)has %zu candidate best fitness is %d
> with %f\n", (int)reg, family,blockID, interval.minID, useID, holepoolvec.size(),
> (int)holereg, lastfitness);
> +          if (RA.contains(holereg)) {
> +            uint32_t grfOffset = RA.find(holereg)->second;
> +            RA.insert(std::make_pair(reg, grfOffset));

If you don't add the 'reg' to spillCandidate. You don't need to insert into offsetReg.
offsetReg is only used when finding spill candidates.
> +            offsetReg.insert(std::make_pair(grfOffset, reg));
> +            interval.isreused = true;
> +          }
> +        }
> +      }
> +    }
> +    if (RA.contains(reg) == true)
> +      return true; // already allocated
>      uint32_t grfOffset = allocateReg(interval, regSize, regSize);
>      if (grfOffset == 0) {
>        return false;
> @@ -712,7 +785,7 @@ namespace gbe
>      ctx.errCode = REGISTER_ALLOCATION_FAIL;
>      const uint32_t regNum = ctx.sel->getRegNum();
>      for (uint32_t startID = 0; startID < regNum; ++startID) {
> -      const GenRegInterval &interval = *this->starting[startID];
> +      GenRegInterval &interval = *this->starting[startID];
>        const ir::Register reg = interval.reg;
> 
>        if (interval.maxID == -INT_MAX)
> @@ -830,6 +903,8 @@ namespace gbe
> 
>    INLINE bool GenRegAllocator::Opaque::expireReg(ir::Register reg)
>    {
> +    if (this->intervals[reg].isreused)
> +      return true;
>      auto it = RA.find(reg);
>      if (flagBooleans.contains(reg))
>        return false;
> @@ -1208,10 +1283,12 @@ namespace gbe
>      }
> 
>      // Compute the intervals
> -    int32_t insnID = 0;
> +    int insnID = 0;
> +    int blockID = 0;
>      for (auto &block : *selection.blockList) {
> +      map<int32_t, BlockRegInterval> blockIntervals;
>        int32_t lastID = insnID;
> -      int32_t firstID = insnID;

Why use insnID + 2 here?
> +      int32_t firstID = insnID ? insnID + 2 : insnID;
>        // Update the intervals of each used register. Note that we do not
>        // register allocate R0, so we skip all sub-registers in r0
>        RegIntervalMap *boolsMap = new RegIntervalMap;
> @@ -1219,7 +1296,7 @@ namespace gbe
>          flag0ReservedBlocks.insert(&block);
>        for (auto &insn : block.insnList) {
>          const uint32_t srcNum = insn.srcNum, dstNum = insn.dstNum;
> -        insn.ID  = insnID;
> +        insnID = (int)insn.ID;
>          bool is3SrcOp = insn.opcode == SEL_OP_MAD;
>          for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
>            const GenRegister &selReg = insn.src(srcID);
> @@ -1244,8 +1321,14 @@ namespace gbe
>            if (this->intervals[reg].conflictReg == 0 ||
>                this->intervals[reg].conflictReg > conflictReg)
>            this->intervals[reg].conflictReg = conflictReg;
> -          this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID);
> -          this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID);
> +          int insnsrcID = insnID;
> +          // If instruction is simple, src and dst can resed and they will have
> different IDs
> +          if (insn.isNative())
> +            insnsrcID -= 1;
> +          this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnsrcID);
> +          this->intervals[reg].maxID = std::max(this->intervals[reg].maxID,
> insnsrcID);
> +          BlockRegInterval &blockRegInterval = blockIntervals[reg];
> +          blockRegInterval.useID = insnsrcID;
>          }
>          for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
>            const GenRegister &selReg = insn.dst(dstID);
> @@ -1261,6 +1344,25 @@ namespace gbe
>            }
>            this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID);
>            this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID);
> +          // Only find hole reg that not in ocl registers
> +          if ( blockIntervals.find(reg) != blockIntervals.end() && reg >=
> ir::ocl::regNum) {
> +            BlockRegInterval &blockRegInterval = blockIntervals[reg];
> +            int useID = blockRegInterval.useID;
> +            // Check if the reg will leave a hole and add it to HoleRegPool
> +            if (useID >= 0 && useID < insnID - 1 && reg >= ir::ocl::regNum) {
> +              if (getenv("DUMPHOLE"))
> +                printf("reg %d is hole in %d: %d => %d)\n",(int)reg, blockID, useID,
> insnID);
The dumped message is unclear. You can change to "reg %d has a liveness hole in block %d, [%d, %d].

Thanks!
Ruiling


More information about the Beignet mailing list