[Beignet] [PATCH 2/3] GBE: add legalize pass to handle wide integers

Fri Oct 10 23:07:22 PDT 2014

I have fixed according to your comment. And will send out new version very soon.
But for the ConstantInt and UndefValue issue, I would like leave it as it is now.
I will improve it later. Just as we talked, it is better if we can process Instruction and ConstantExpr in a unified way.

> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Saturday, October 11, 2014 9:18 AM
> To: Song, Ruiling
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 2/3] GBE: add legalize pass to handle wide
> integers
> 
> Thanks for the patch. It solves a long standing issue. A few comments as
> below:
> 
> On Fri, Oct 10, 2014 at 03:01:26PM +0800, Ruiling Song wrote:
> > This legalize pass will break wider integers like i128/i256/... into shorter
> ones.
> > The problem is how to choose the shorter type? From my observation,
> > wide integer type always comes from shorter ones through 'zext' on
> > small type or 'bitcast' on vectors, so we simply choose the type where it
> comes from.
> > Then we can split wide integer operations into operations on shorter
> interger.
> >
> > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > ---
> > +
> > +  void splitAPInt(APInt &data, SmallVectorImpl<APInt> &result, int
> totalBits, int subBits) {
> > +    APInt lo = data.getLoBits(totalBits/2).trunc(totalBits/2);
> > +    APInt hi = data.getHiBits(totalBits/2).trunc(totalBits/2);
> > +
> > +    if (totalBits/2 <= subBits) {
> > +      result.push_back(lo);
> > +      result.push_back(hi);
> > +      return;
> > +    }
> > +    splitAPInt(lo, result, totalBits/2, subBits);
> > +    splitAPInt(hi, result, totalBits/2, subBits);  }
> > +
> > +  void Legalize::splitLargeInteger(APInt data, Type *splitTy,
> SmallVector<APInt, 16> &split) {
> > +    unsigned opSz = data.getBitWidth();
> > +    unsigned subSz = splitTy->getPrimitiveSizeInBits();
> > +    splitAPInt(data, split, opSz, subSz);  }
> The above algorithm assumes the opSz is an power of 2 bits, right? maybe
> put an assert here to make sure this assumption is better.
> > +
> > +  void Legalize::legalizeICmp(IRBuilder<> &Builder, Instruction *p) {
> > +    ICmpInst *IC = dyn_cast<ICmpInst>(p);
> > +    ICmpInst::Predicate pred = IC->getPredicate();
> > +    // I could not figure out why llvm could generate some
> > +    // compare instruction on large integers. so here only support
> > + equality check
> Agree with your thought here. An assert here is good enough.
> 
> > +    GBE_ASSERT(IC->isEquality());
> > +    Value *op0 = p->getOperand(0);
> > +    Value *op1 = p->getOperand(1);
> > +
> > +    if (isa<ConstantInt>(op0)) {
> > +      op0 = p->getOperand(1);
> > +      op1 = p->getOperand(0);
> > +    }
> Is it possible that both operands are constant? If it is the case, we just need
> to ignore it here.
> 
> > +  void Legalize::legalizeAnd(IRBuilder<> &Builder, Instruction *p) {
> > +    Value *op0 = p->getOperand(0);
> > +    Value *op1 = p->getOperand(1);
> > +
> > +    if ((isa<UndefValue>(op0) || isa<UndefValue>(op1))) {
> > +      // I meet some special case as below:
> > +      //   %82 = zext i32 %81 to i512
> > +      //   %mask148 = and i512 undef, -4294967296
> > +      //   %ins149 = or i512 %mask148, %82
> > +      // I don't how to split this kind of i512 instruction in a good
> > + way,
>                    miss a know here?
> And yes, I also found some similar cases, and some of the root causes are a
> buggy kernel which use uninitialized value. But we still need to compile those
> buggy kernel. My question here is we may need to add this logic to most of
> the other operands handlers? This special case may not be And only case.
> 
> > +
> > +  bool Legalize::legalizeFunction(Function &F) {
> > +    bool changed = false;
> > +    for (Function::iterator bb = F.begin(), bbE = F.end(); bb != bbE; ++bb) {
> > +      IRBuilder<> Builder(bb);
> > +      for (BasicBlock::iterator it = bb->begin(), itE = bb->end(); it != itE;
> ++it) {
> > +        Instruction *insn = it;
> > +        Type *ty = insn->getType();
> > +        if(ty->isIntegerTy() && ty->getIntegerBitWidth() > 64) {
> > +          // result is large integer, push back itself and its users
> > +          changed = true;
> > +
> > +          processed.insert(insn);
> > +
> > +          for(Value::use_iterator iter = insn->use_begin(); iter !=
> insn->use_end(); ++iter) {
> > +            // After LLVM 3.5, use_iterator points to 'Use' instead of
> 'User', which is more straightforward.
> > +          #if (LLVM_VERSION_MAJOR == 3) &&
> (LLVM_VERSION_MINOR < 5)
> > +            User *theUser = *iter;
> > +          #else
> > +            User *theUser = iter->getUser();
> > +          #endif
> > +            processed.insert(theUser);
> > +          }
> > +        }
> > +
> > +        if(processed.empty() || processed.find(insn) == processed.end())
> > +          continue;
> > +
> The above code has an implict assumption that all the wide values should be
> generated before all the uses. Right?
> This assumption may not be correct. Please check the following example:
> 
>     ... (there is no definition of %7)
>       br L2
> 
>     L1:
>       %10 = add i128 %7, %6
>       ...
>       ret
> 
>     L2:
>       %8 = load <4 x i32> addrspace(1)* %3, align 4, !tbaa !1
>       %7 = bitcast <4 x i32> %8 to i128
>       ...
>       br L1
> 
> Thanks,
> Zhigang Gong.