[Beignet] [PATCH 2/3] GBE: add legalize pass to handle wide integers

Zhigang Gong zhigang.gong at linux.intel.com
Fri Oct 10 18:18:28 PDT 2014


Thanks for the patch. It solves a long standing issue. A few comments
as below:

On Fri, Oct 10, 2014 at 03:01:26PM +0800, Ruiling Song wrote:
> This legalize pass will break wider integers like i128/i256/... into shorter ones.
> The problem is how to choose the shorter type? From my observation,
> wide integer type always comes from shorter ones through 'zext' on small type
> or 'bitcast' on vectors, so we simply choose the type where it comes from.
> Then we can split wide integer operations into operations on shorter interger.
> 
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
> +
> +  void splitAPInt(APInt &data, SmallVectorImpl<APInt> &result, int totalBits, int subBits) {
> +    APInt lo = data.getLoBits(totalBits/2).trunc(totalBits/2);
> +    APInt hi = data.getHiBits(totalBits/2).trunc(totalBits/2);
> +
> +    if (totalBits/2 <= subBits) {
> +      result.push_back(lo);
> +      result.push_back(hi);
> +      return;
> +    }
> +    splitAPInt(lo, result, totalBits/2, subBits);
> +    splitAPInt(hi, result, totalBits/2, subBits);
> +  }
> +
> +  void Legalize::splitLargeInteger(APInt data, Type *splitTy, SmallVector<APInt, 16> &split) {
> +    unsigned opSz = data.getBitWidth();
> +    unsigned subSz = splitTy->getPrimitiveSizeInBits();
> +    splitAPInt(data, split, opSz, subSz);
> +  }
The above algorithm assumes the opSz is an power of 2 bits, right? maybe put an assert here to
make sure this assumption is better.
> +
> +  void Legalize::legalizeICmp(IRBuilder<> &Builder, Instruction *p) {
> +    ICmpInst *IC = dyn_cast<ICmpInst>(p);
> +    ICmpInst::Predicate pred = IC->getPredicate();
> +    // I could not figure out why llvm could generate some
> +    // compare instruction on large integers. so here only support equality check
Agree with your thought here. An assert here is good enough.

> +    GBE_ASSERT(IC->isEquality());
> +    Value *op0 = p->getOperand(0);
> +    Value *op1 = p->getOperand(1);
> +
> +    if (isa<ConstantInt>(op0)) {
> +      op0 = p->getOperand(1);
> +      op1 = p->getOperand(0);
> +    }
Is it possible that both operands are constant? If it is the case, we just need to
ignore it here.

> +  void Legalize::legalizeAnd(IRBuilder<> &Builder, Instruction *p) {
> +    Value *op0 = p->getOperand(0);
> +    Value *op1 = p->getOperand(1);
> +
> +    if ((isa<UndefValue>(op0) || isa<UndefValue>(op1))) {
> +      // I meet some special case as below:
> +      //   %82 = zext i32 %81 to i512
> +      //   %mask148 = and i512 undef, -4294967296
> +      //   %ins149 = or i512 %mask148, %82
> +      // I don't how to split this kind of i512 instruction in a good way,
                   miss a know here? 
And yes, I also found some similar cases, and some of the root causes are
a buggy kernel which use uninitialized value. But we still need to compile
those buggy kernel. My question here is we may need to add this logic to most
of the other operands handlers? This special case may not be And only case.

> +
> +  bool Legalize::legalizeFunction(Function &F) {
> +    bool changed = false;
> +    for (Function::iterator bb = F.begin(), bbE = F.end(); bb != bbE; ++bb) {
> +      IRBuilder<> Builder(bb);
> +      for (BasicBlock::iterator it = bb->begin(), itE = bb->end(); it != itE; ++it) {
> +        Instruction *insn = it;
> +        Type *ty = insn->getType();
> +        if(ty->isIntegerTy() && ty->getIntegerBitWidth() > 64) {
> +          // result is large integer, push back itself and its users
> +          changed = true;
> +
> +          processed.insert(insn);
> +
> +          for(Value::use_iterator iter = insn->use_begin(); iter != insn->use_end(); ++iter) {
> +            // After LLVM 3.5, use_iterator points to 'Use' instead of 'User', which is more straightforward.
> +          #if (LLVM_VERSION_MAJOR == 3) && (LLVM_VERSION_MINOR < 5)
> +            User *theUser = *iter;
> +          #else
> +            User *theUser = iter->getUser();
> +          #endif
> +            processed.insert(theUser);
> +          }
> +        }
> +
> +        if(processed.empty() || processed.find(insn) == processed.end())
> +          continue;
> +
The above code has an implict assumption that all the wide values should be generated before
all the uses. Right?
This assumption may not be correct. Please check the following example:

    ... (there is no definition of %7)
      br L2

    L1:
      %10 = add i128 %7, %6
      ...
      ret

    L2:
      %8 = load <4 x i32> addrspace(1)* %3, align 4, !tbaa !1
      %7 = bitcast <4 x i32> %8 to i128
      ...
      br L1

Thanks,
Zhigang Gong.


More information about the Beignet mailing list