[Beignet] [PATCH V3 3/3] Backend: Add hole reuse in reg alloction
Song, Ruiling
ruiling.song at intel.com
Tue Nov 8 13:52:15 UTC 2016
> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Tuesday, November 1, 2016 3:17 PM
> To: beignet at lists.freedesktop.org
> Cc: Pan, Xiuli <xiuli.pan at intel.com>
> Subject: [Beignet] [PATCH V3 3/3] Backend: Add hole reuse in reg alloction
>
> From: Pan Xiuli <xiuli.pan at intel.com>
>
> We first find regs that have pool in simple linear scale, and save them
> in HoleRegPool, when allocte regs we first try to search fit candidate
> in the pool and choose the most fit one to reuse.
>
> V2: Refine hole reuse only in one block.
> V3: Refine data structure with less variable, add OCL_REUSE_HOLE_REG to
> control the optimization.
> DEBUG version: You can export DUMPHOLE for easy review.
>
> Signed-off-by: Pan Xiuli <xiuli.pan at intel.com>
> /*! Current vector to expire */
> uint32_t expiringID;
> + /*! PHI regs that can be reused */
The phi related logic does not exist anymore. So I don't think we still use PHI here.
Please remove phi in this patch.
> + map<uint32_t, vector<HoleRegTag>> HoleRegPool;
> INLINE void insertNewReg(const Selection &selection, ir::Register reg,
> uint32_t grfOffset, bool isVector = false);
> INLINE bool expireReg(ir::Register reg);
> INLINE bool spillAtInterval(GenRegInterval interval, int size, uint32_t
> alignment);
> @@ -278,14 +282,83 @@ namespace gbe
> }
> }
>
> - bool GenRegAllocator::Opaque::createGenReg(const Selection &selection,
> const GenRegInterval &interval) {
The IDFitness design is weird. If (a-b) is equal to 1, then there will be a divide by zero case. Divide by zero is undefined. This should be avoided.
Another problem is if (a-b) is equal to 2, the return value is 1.0f. but if a and b are equal, the return value is also 1.0f. this is a little strange.
The fitness is the less distance between a & b, the higher fitness value. Right?
Why not simply
If (a>=b) return 1.0f/(a-b+1.0f);
Else return 2.0f;
> + INLINE float IDFitess(int a, int b) {
> + if (a == b) return 1.0f;
> + else if (a > b) return 1.0f/(a - b -1);
> + else return 2.0f;
> + }
> +
> + INLINE float getPhiRegFitness(const GenRegInterval &interval, HoleRegTag
> &holeRegTag) {
> + int regstID = interval.minID;
> + int regendID = interval.maxID;
> + int holeregstID = holeRegTag.startID;
> + int holeregendID = holeRegTag.endID;
I noticed when you add the holeregtag, you use below code.
" holeregtag.startID = useID + 1;
holeregtag.endID = insnID - 1;"
so if regstID == holeregstID && holeregendID == regendID. The hole can be reused by the interval. Right?
> + return IDFitess(regstID, holeregstID) + IDFitess(holeregendID, regendID);
> + }
> +
Seems that all registers will invoke this getRegBlockID(). The complexity will be O(RegNum*BBNum).
I thinks this is a little too higher than expectation.
> + INLINE int getRegBlockID(const GenRegInterval &interval,const BlockRanges
> &blockRanges) {
> + int minID = interval.minID;
> + int maxID = interval.maxID;
> + int res = -1;
> + for(size_t i = 0; i < blockRanges.size(); ++i)
> + {
> + if (minID >= blockRanges[i].minID && maxID <= blockRanges[i].maxID)
> + {
> + res = i;
> + break;
> + }
> + }
> + return res;
> + }
> +
> +
> + BVAR(OCL_REUSE_HOLE_REG, 0);
You can default it to 1 instead of zero.
> + bool GenRegAllocator::Opaque::createGenReg(const Selection &selection,
> GenRegInterval &interval) {
> using namespace ir;
> const ir::Register reg = interval.reg;
> - if (RA.contains(reg) == true)
> - return true; // already allocated
> uint32_t regSize;
> ir::RegisterFamily family;
> getRegAttrib(reg, regSize, &family);
> + // Check if the reg is live only in one block, thus can use the hole
> + int blockID = getRegBlockID(interval, this->blockRanges);
> + if ( blockID >= 0 && OCL_REUSE_HOLE_REG) {
> + uint32_t useID = interval.maxID;
> + auto holepool = this->HoleRegPool.find(blockID);
> + // Use block ID as index to get candidate hole reg to reuse
> + if (holepool != this->HoleRegPool.end()) {
> + auto &holepoolvec = holepool->second;
> + HoleRegTag* holeregbest = NULL;
> + float lastfitness = 0;
> + for (auto itr = holepoolvec.begin() ; itr != holepoolvec.end(); ++itr) {
> + if (regSize != itr->regSize)
> + continue;
> + float fitness = getPhiRegFitness(interval, *itr);
> + // reg out of range of holepool reg
> + if (fitness > 2.0f) continue;
> + if (fitness > lastfitness) {
> + lastfitness = fitness;
> + holeregbest = &*itr;
> + }
> + // reg has one prefect fit use this
> + if (fitness > 1 ) break;
> + }
> + // Reuse the hole and update the holeregpool
> + if (holeregbest) {
> + auto holereg = holeregbest->reg;
> + holeregbest->startID = useID + 1;
> + if (getenv("DUMPHOLE"))
> + printf("%d %d(%d:%d->%d)has %zu candidate best fitness is %d
> with %f\n", (int)reg, family,blockID, interval.minID, useID, holepoolvec.size(),
> (int)holereg, lastfitness);
> + if (RA.contains(holereg)) {
> + uint32_t grfOffset = RA.find(holereg)->second;
> + RA.insert(std::make_pair(reg, grfOffset));
If you don't add the 'reg' to spillCandidate. You don't need to insert into offsetReg.
offsetReg is only used when finding spill candidates.
> + offsetReg.insert(std::make_pair(grfOffset, reg));
> + interval.isreused = true;
> + }
> + }
> + }
> + }
> + if (RA.contains(reg) == true)
> + return true; // already allocated
> uint32_t grfOffset = allocateReg(interval, regSize, regSize);
> if (grfOffset == 0) {
> return false;
> @@ -712,7 +785,7 @@ namespace gbe
> ctx.errCode = REGISTER_ALLOCATION_FAIL;
> const uint32_t regNum = ctx.sel->getRegNum();
> for (uint32_t startID = 0; startID < regNum; ++startID) {
> - const GenRegInterval &interval = *this->starting[startID];
> + GenRegInterval &interval = *this->starting[startID];
> const ir::Register reg = interval.reg;
>
> if (interval.maxID == -INT_MAX)
> @@ -830,6 +903,8 @@ namespace gbe
>
> INLINE bool GenRegAllocator::Opaque::expireReg(ir::Register reg)
> {
> + if (this->intervals[reg].isreused)
> + return true;
> auto it = RA.find(reg);
> if (flagBooleans.contains(reg))
> return false;
> @@ -1208,10 +1283,12 @@ namespace gbe
> }
>
> // Compute the intervals
> - int32_t insnID = 0;
> + int insnID = 0;
> + int blockID = 0;
> for (auto &block : *selection.blockList) {
> + map<int32_t, BlockRegInterval> blockIntervals;
> int32_t lastID = insnID;
> - int32_t firstID = insnID;
Why use insnID + 2 here?
> + int32_t firstID = insnID ? insnID + 2 : insnID;
> // Update the intervals of each used register. Note that we do not
> // register allocate R0, so we skip all sub-registers in r0
> RegIntervalMap *boolsMap = new RegIntervalMap;
> @@ -1219,7 +1296,7 @@ namespace gbe
> flag0ReservedBlocks.insert(&block);
> for (auto &insn : block.insnList) {
> const uint32_t srcNum = insn.srcNum, dstNum = insn.dstNum;
> - insn.ID = insnID;
> + insnID = (int)insn.ID;
> bool is3SrcOp = insn.opcode == SEL_OP_MAD;
> for (uint32_t srcID = 0; srcID < srcNum; ++srcID) {
> const GenRegister &selReg = insn.src(srcID);
> @@ -1244,8 +1321,14 @@ namespace gbe
> if (this->intervals[reg].conflictReg == 0 ||
> this->intervals[reg].conflictReg > conflictReg)
> this->intervals[reg].conflictReg = conflictReg;
> - this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID);
> - this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID);
> + int insnsrcID = insnID;
> + // If instruction is simple, src and dst can resed and they will have
> different IDs
> + if (insn.isNative())
> + insnsrcID -= 1;
> + this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnsrcID);
> + this->intervals[reg].maxID = std::max(this->intervals[reg].maxID,
> insnsrcID);
> + BlockRegInterval &blockRegInterval = blockIntervals[reg];
> + blockRegInterval.useID = insnsrcID;
> }
> for (uint32_t dstID = 0; dstID < dstNum; ++dstID) {
> const GenRegister &selReg = insn.dst(dstID);
> @@ -1261,6 +1344,25 @@ namespace gbe
> }
> this->intervals[reg].minID = std::min(this->intervals[reg].minID, insnID);
> this->intervals[reg].maxID = std::max(this->intervals[reg].maxID, insnID);
> + // Only find hole reg that not in ocl registers
> + if ( blockIntervals.find(reg) != blockIntervals.end() && reg >=
> ir::ocl::regNum) {
> + BlockRegInterval &blockRegInterval = blockIntervals[reg];
> + int useID = blockRegInterval.useID;
> + // Check if the reg will leave a hole and add it to HoleRegPool
> + if (useID >= 0 && useID < insnID - 1 && reg >= ir::ocl::regNum) {
> + if (getenv("DUMPHOLE"))
> + printf("reg %d is hole in %d: %d => %d)\n",(int)reg, blockID, useID,
> insnID);
The dumped message is unclear. You can change to "reg %d has a liveness hole in block %d, [%d, %d].
Thanks!
Ruiling
More information about the Beignet
mailing list