[Beignet] [PATCH] GBE: Place loop exits after loop blocks when sorting basic blocks.
Song, Ruiling
ruiling.song at intel.com
Tue Nov 25 21:48:58 PST 2014
I agree,adding an assert childNo <= 2 is good. Normally childNo > 2 only appears in switch-case, which should have been lowered to branch.
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Wednesday, November 26, 2014 12:45 PM
> To: Yang, Rong R
> Cc: Song, Ruiling; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] GBE: Place loop exits after loop blocks when
> sorting basic blocks.
>
> Right, A normal instruction should have no more than 2 successors.
> A ">=2" is a little bit confusing. I will simply remove it and add an assertion
> after get the child number to make sure the childNo is less or equal to 2 if no
> objection.
>
> On Wed, Nov 26, 2014 at 02:57:01AM +0000, Yang, Rong R wrote:
> > Implement the post-order liked traversal, LGTM.
> > Because you suppose always less equal than 2 successors, I think it's better
> to add a childNo assert.
> >
> > > -----Original Message-----
> > > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On
> > > Behalf Of Ruiling Song
> > > Sent: Tuesday, November 25, 2014 13:58
> > > To: beignet at lists.freedesktop.org
> > > Cc: Song, Ruiling
> > > Subject: [Beignet] [PATCH] GBE: Place loop exits after loop blocks
> > > when sorting basic blocks.
> > >
> > > This again is to solve register liveness issue. Details see comment inline.
> > >
> > > This could fix opencv failure under strict conformance mode:
> > > ./opencv_test_core --
> > > gtest_filter=OCL_Arithm/PolarToCart.angleInRadians/0
> > >
> > > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > > ---
> > > backend/src/llvm/llvm_gen_backend.cpp | 92
> > > +++++++++++++++++++++++++++++----
> > > 1 file changed, 82 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> > > b/backend/src/llvm/llvm_gen_backend.cpp
> > > index 558491f..939dd63 100644
> > > --- a/backend/src/llvm/llvm_gen_backend.cpp
> > > +++ b/backend/src/llvm/llvm_gen_backend.cpp
> > > @@ -1941,6 +1941,25 @@ namespace gbe
> > > fn.addLoop(loopBBs, loopExits);
> > > }
> > > }
> > > +
> > > +
> > > + static unsigned getChildNo(BasicBlock *bb) {
> > > + TerminatorInst *term = bb->getTerminator();
> > > + return term->getNumSuccessors(); }
> > > +
> > > + // return NULL if index out-range of children number static
> > > + BasicBlock *getChildPossible(BasicBlock *bb, unsigned index) {
> > > +
> > > + TerminatorInst *term = bb->getTerminator();
> > > + unsigned childNo = term->getNumSuccessors();
> > > + BasicBlock *child = NULL;
> > > + if(index < childNo) {
> > > + child = term->getSuccessor(index);
> > > + }
> > > + return child;
> > > + }
> > > +
> > > /*!
> > >
> > > Sorting Basic blocks is mainly used to solve register liveness
> > > issue, take a @@ -1958,7 +1977,7 @@ namespace gbe
> > > |
> > > -->7
> > >
> > > - A register %10 defined in bb4, and used in bb5 & bb6. In normal
> > > liveness
> > > + 1.) A register %10 defined in bb4, and used in bb5 & bb6. In
> > > + normal liveness
> > > analysis, %10 is not alive in bb3. But under simd execution model,
> after
> > > executing bb4, some channel jump through bb5 to bb3, other
> > > channel may jump
> > > to bb6, we must execute bb3 first, then bb6, to avoid missing
> instructions.
> > > @@ -1969,25 +1988,78 @@ namespace gbe
> > > we can see the bb3 will be placed after bb5 & bb6. The liveness
> calculation
> > > is just as normal and will be correct.
> > >
> > > - Another advantage of sorting basic blocks is reducing register
> pressure.
> > > + 2.) Another advantage of sorting basic blocks is reducing register
> pressure.
> > > In the above CFG, a register defined in bb3 and used in bb7 will be
> > > alive through 3,4,5,6,7. But in fact it should be only alive in bb3 and
> bb7.
> > > After topological sorting, this kind of register would be only alive in
> bb3
> > > and bb7. Register pressure in 4,5,6 is reduced.
> > > -*/
> > >
> > > + 3.) Classical post-order traversal will automatically choose a
> > > + order for the successors of a basic block, But this order may be
> > > + hard to handle, take a look at below CFG:
> > > +
> > > + 1 <-----
> > > + / |
> > > + 2 --> 4 -
> > > + |
> > > + 3
> > > + |
> > > + 5
> > > + In the post oder traversal, it may be: 5->4->3->2->1, as 4, 3
> > > +does not have
> > > + strict order. This is a serious issue, a value defined in bb3,
> > > +used in bb5
> > > + may be overwritten in bb1. Remember the simd execution model?
> > > +some lanes
> > > + may execute bb4 after other lanes finish bb3, and then jump to
> > > +bb1, but live
> > > + range of the register does not cover bb1. what we done here is
> > > +for a loop
> > > + exit (here bb3), we alwasy make sure it is visited first in the
> > > +post-order
> > > + traversal, for the graph, that means 5->3->4->2->1. Then a
> > > +definition in bb3,
> > > + and used in 5 will not interfere with any other values defined in the
> loop.
> > > +*/
> > > void GenWriter::sortBasicBlock(Function &F) {
> > > - typedef ReversePostOrderTraversal<Function*> RPOTType;
> > > - RPOTType rpot(&F);
> > > - Function::BasicBlockListType &bbList = F.getBasicBlockList();
> > > + BasicBlock &entry = F.getEntryBlock();
> > > + std::vector<BasicBlock *> visitStack;
> > > + std::vector<BasicBlock *> sorted;
> > > + std::set<BasicBlock *> visited;
> > > +
> > > + visitStack.push_back(&entry);
> > > + visited.insert(&entry);
> > > +
> > > + while (!visitStack.empty()) {
> > > + BasicBlock *top = visitStack.back();
> > > + unsigned childNo = getChildNo(top);
> > > +
> > > + BasicBlock *child0 = getChildPossible(top, 0);
> > > + BasicBlock *child1 = getChildPossible(top, 1);
> > > + if(childNo >= 2) {
> > > + Loop *loop = LI->getLoopFor(top);
> > > + // visit loop exit node first, so loop exit block will be placed
> > > + // after blocks in loop in 'reverse post-order' list.
> > > + if (loop && loop->contains(child0) && !loop->contains(child1))
> {
> > > + BasicBlock *tmp = child0; child0 = child1; child1 = tmp;
> > > + }
> > > + }
> > > +
> > > + if (child0 != NULL && visited.find(child0) == visited.end()) {
> > > + visitStack.push_back(child0);
> > > + visited.insert(child0);
> > > + } else if (child1 != NULL && visited.find(child1) == visited.end()) {
> > > + visitStack.push_back(child1);
> > > + visited.insert(child1);
> > > + } else {
> > > + sorted.push_back(visitStack.back());
> > > + visitStack.pop_back();
> > > + }
> > > + }
> > >
> > > - for (RPOTType::rpo_iterator bbI = rpot.begin(), bbE = rpot.end();
> bbI !=
> > > bbE; ++bbI) {
> > > - (*bbI)->removeFromParent();
> > > + Function::BasicBlockListType &bbList = F.getBasicBlockList();
> > > + for (std::vector<BasicBlock *>::iterator iter = sorted.begin();
> > > + iter !=
> > > sorted.end(); ++iter) {
> > > + (*iter)->removeFromParent();
> > > }
> > > - for (RPOTType::rpo_iterator bbI = rpot.begin(), bbE = rpot.end();
> bbI !=
> > > bbE; ++bbI) {
> > > - bbList.push_back(*bbI);
> > > +
> > > + for (std::vector<BasicBlock *>::reverse_iterator iter =
> > > + sorted.rbegin();
> > > iter != sorted.rend(); ++iter) {
> > > + bbList.push_back(*iter);
> > > }
> > > }
> > > +
> > > void GenWriter::emitFunction(Function &F)
> > > {
> > > switch (F.getCallingConv()) {
> > > --
> > > 1.7.10.4
> > >
> > > _______________________________________________
> > > Beignet mailing list
> > > Beignet at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/beignet
> > _______________________________________________
> > Beignet mailing list
> > Beignet at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/beignet
More information about the Beignet
mailing list