[Beignet] [PATCH 2/3] GBE: add legalize pass to handle wide integers
Song, Ruiling
ruiling.song at intel.com
Fri Oct 10 23:07:22 PDT 2014
I have fixed according to your comment. And will send out new version very soon.
But for the ConstantInt and UndefValue issue, I would like leave it as it is now.
I will improve it later. Just as we talked, it is better if we can process Instruction and ConstantExpr in a unified way.
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Saturday, October 11, 2014 9:18 AM
> To: Song, Ruiling
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 2/3] GBE: add legalize pass to handle wide
> integers
>
> Thanks for the patch. It solves a long standing issue. A few comments as
> below:
>
> On Fri, Oct 10, 2014 at 03:01:26PM +0800, Ruiling Song wrote:
> > This legalize pass will break wider integers like i128/i256/... into shorter
> ones.
> > The problem is how to choose the shorter type? From my observation,
> > wide integer type always comes from shorter ones through 'zext' on
> > small type or 'bitcast' on vectors, so we simply choose the type where it
> comes from.
> > Then we can split wide integer operations into operations on shorter
> interger.
> >
> > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > ---
> > +
> > + void splitAPInt(APInt &data, SmallVectorImpl<APInt> &result, int
> totalBits, int subBits) {
> > + APInt lo = data.getLoBits(totalBits/2).trunc(totalBits/2);
> > + APInt hi = data.getHiBits(totalBits/2).trunc(totalBits/2);
> > +
> > + if (totalBits/2 <= subBits) {
> > + result.push_back(lo);
> > + result.push_back(hi);
> > + return;
> > + }
> > + splitAPInt(lo, result, totalBits/2, subBits);
> > + splitAPInt(hi, result, totalBits/2, subBits); }
> > +
> > + void Legalize::splitLargeInteger(APInt data, Type *splitTy,
> SmallVector<APInt, 16> &split) {
> > + unsigned opSz = data.getBitWidth();
> > + unsigned subSz = splitTy->getPrimitiveSizeInBits();
> > + splitAPInt(data, split, opSz, subSz); }
> The above algorithm assumes the opSz is an power of 2 bits, right? maybe
> put an assert here to make sure this assumption is better.
> > +
> > + void Legalize::legalizeICmp(IRBuilder<> &Builder, Instruction *p) {
> > + ICmpInst *IC = dyn_cast<ICmpInst>(p);
> > + ICmpInst::Predicate pred = IC->getPredicate();
> > + // I could not figure out why llvm could generate some
> > + // compare instruction on large integers. so here only support
> > + equality check
> Agree with your thought here. An assert here is good enough.
>
> > + GBE_ASSERT(IC->isEquality());
> > + Value *op0 = p->getOperand(0);
> > + Value *op1 = p->getOperand(1);
> > +
> > + if (isa<ConstantInt>(op0)) {
> > + op0 = p->getOperand(1);
> > + op1 = p->getOperand(0);
> > + }
> Is it possible that both operands are constant? If it is the case, we just need
> to ignore it here.
>
> > + void Legalize::legalizeAnd(IRBuilder<> &Builder, Instruction *p) {
> > + Value *op0 = p->getOperand(0);
> > + Value *op1 = p->getOperand(1);
> > +
> > + if ((isa<UndefValue>(op0) || isa<UndefValue>(op1))) {
> > + // I meet some special case as below:
> > + // %82 = zext i32 %81 to i512
> > + // %mask148 = and i512 undef, -4294967296
> > + // %ins149 = or i512 %mask148, %82
> > + // I don't how to split this kind of i512 instruction in a good
> > + way,
> miss a know here?
> And yes, I also found some similar cases, and some of the root causes are a
> buggy kernel which use uninitialized value. But we still need to compile those
> buggy kernel. My question here is we may need to add this logic to most of
> the other operands handlers? This special case may not be And only case.
>
> > +
> > + bool Legalize::legalizeFunction(Function &F) {
> > + bool changed = false;
> > + for (Function::iterator bb = F.begin(), bbE = F.end(); bb != bbE; ++bb) {
> > + IRBuilder<> Builder(bb);
> > + for (BasicBlock::iterator it = bb->begin(), itE = bb->end(); it != itE;
> ++it) {
> > + Instruction *insn = it;
> > + Type *ty = insn->getType();
> > + if(ty->isIntegerTy() && ty->getIntegerBitWidth() > 64) {
> > + // result is large integer, push back itself and its users
> > + changed = true;
> > +
> > + processed.insert(insn);
> > +
> > + for(Value::use_iterator iter = insn->use_begin(); iter !=
> insn->use_end(); ++iter) {
> > + // After LLVM 3.5, use_iterator points to 'Use' instead of
> 'User', which is more straightforward.
> > + #if (LLVM_VERSION_MAJOR == 3) &&
> (LLVM_VERSION_MINOR < 5)
> > + User *theUser = *iter;
> > + #else
> > + User *theUser = iter->getUser();
> > + #endif
> > + processed.insert(theUser);
> > + }
> > + }
> > +
> > + if(processed.empty() || processed.find(insn) == processed.end())
> > + continue;
> > +
> The above code has an implict assumption that all the wide values should be
> generated before all the uses. Right?
> This assumption may not be correct. Please check the following example:
>
> ... (there is no definition of %7)
> br L2
>
> L1:
> %10 = add i128 %7, %6
> ...
> ret
>
> L2:
> %8 = load <4 x i32> addrspace(1)* %3, align 4, !tbaa !1
> %7 = bitcast <4 x i32> %8 to i128
> ...
> br L1
>
> Thanks,
> Zhigang Gong.
More information about the Beignet
mailing list