[Beignet] [PATCH V2 3/3] add local copy propagation optimization for each basic block

Zhigang Gong zhigang.gong at linux.intel.com
Thu Sep 24 00:04:47 PDT 2015


On Thu, Sep 24, 2015 at 06:51:12AM +0000, Guo, Yejun wrote:
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com] 
> Sent: Thursday, September 24, 2015 1:12 PM
> To: Guo, Yejun
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation optimization for each basic block
> 
> On Thu, Sep 24, 2015 at 06:05:31AM +0000, Guo, Yejun wrote:
> > 
> > 
> > -----Original Message-----
> > From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> > Sent: Thursday, September 24, 2015 12:32 PM
> > To: Guo, Yejun
> > Cc: beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation 
> > optimization for each basic block
> > 
> > On Thu, Sep 24, 2015 at 02:58:22AM +0000, Guo, Yejun wrote:
> > > 
> > > > +
> > > > +  void SelBasicBlockOptimizer::changeInsideReplaceInfos(const
> > > > + SelectionInstruction& insn, GenRegister& var)  {
> > > > +    for (ReplaceInfo* info : replaceInfos) {
> > > > +      if (info->intermedia.reg() == var.reg()) {
> > A new comment here is that the above loop is a little bit too heavy.
> > For a large BB which has many MOV instructions, it will iterate too many times for instructions after those MOV instructions. A better way is to change to use a new map type of replaceInfos:
> > map <ir::Register, set<ReplaceInfo*>>
> > 
> > This will be much faster than iterate all infos for each instruction.
> > 
> > [Yejun] nice, I'll change to map.
> > 
> > > > +        bool replaceable = false;
> > > It's better to add a comment here to describe the cases which can't be replaced and why.
> > > 
> > > [yejun] actually, I think the code itself explains something, it is much complex to explain every detail in human words. I consider it as a nice-to-have since the basic logic is simple.
> > I still think it is not that simple. A case just came into my mind which we can't do replacement is:
> > 
> > MOV %r0, %r1
> > ADD %r1, %r1, 1
> > ...
> > ADD %r10, %r0, 1
> > ...
> > 
> > I'm afraid that your current code can't deal it correctly, right?
> > 
> > [yejun] current code does nothing for these instructions. It also relatives to constant, maybe add another optimization path to keep every path clear and simple. Or we can consider current code as a basic, and to extend it when find such optimization is necessary.
> If my understanding is correct, current code will replace %r0 to %r1 in the second ADD instruction which breaks the code. This is not an optimize opportunity, but a bug and must be fixed. The root cause is %r1 has been modified after copy its value to another register, then we should not propagate it to the destination register after that modification. For example, for all instruction after ADD %r1, %r1, 1, We could not do the 's/r0/r1'.
> 
> [Yejun] the current code does handle such case.
> Firstly, for the terms.  
> R0 = r1
> R3 = r0 + r2
> In the first MOV IR, R0 is named as intermedia, and r1 is named as replacement.
> In the second IR, r0 is collected into toBeReplacements.
> 
> For the following IR:
> 1) MOV %r0, %r1
> 2) MOV %r2, %r0
> 3) ADD %r1, %r1, 1
>  ...
> 4) ADD %r10, %r0, 1
> 
> When the 3) IR is scanned, in SelBasicBlockOptimizer::removeFromReplaceInfos, replacementChanged is recorded.
> 
> When the 4) IR is scanned, in SelBasicBlockOptimizer::changeInsideReplaceInfos, we'll remove the info with the following code because replacementChanged makes replaceable false.
>         if (!replaceable) {
>           replaceInfos.erase(info);
>           delete info;
>         }
> 
> If there is no 4) IR with %r0 as source, at the end of scan, in SelBasicBlockOptimizer::cleanReplaceInfos, 1) and 2) IR will be optimized.

Oops, didn't see you remove registers when they become destination registers.
Then this logic is correct here.

Look into the code again, and found the following check is interesting and not simple at all :)

+        if (insn.opcode != SEL_OP_BSWAP && !insn.isWrite()) {

1. Why BSWAP is special here? The root cause may be that the src of bswap will be modified.
   But should this be handled in the BSWAP instruction already (I mean, both src and
   dst of BSWAP should be put into  src array and dst array ).
   Unfortunately, BSWAP seems have a bug here and may lead to incorrect scheduling latter.
   That should be fixed in BSWAP, then BSWAP check should be removed from here.

2. Why !insn.isWrite() is here?
   This is more difficult than the first one. My understanding is that a register in selection
   vector should not be replaced. Because, that will bring extra complexity to manipulate the
   selection vectors. But the isWrite() checking doesn't cover all the cases. Actually,
   nearly all read instructions have one selection source vector, although many of them have only
   1 element. For example, the SAMPLE instruction has a source selection vector.

> 
> > 
> > > 
> > > > +            uint32_t elements = CalculateElements(var, insn.state.execWidth);
> > > > +            if (info->elements == elements) {
> > > Why not compare each attribute value, such as vstride_size,hstride_size,.....?
> > > 
> > > [Yejun] the execWidth is not recorded in the GenRegister, and I once saw instructions such as:
> > > mov(1)  dst (vstride_size or hstride_size is 1/2/... or something 
> > > else), src
> > > mov(16) anotherdst, dst (with stride as 0)
> > > 
> > > the dst here is the same, but the strides are different, to handle this case, I add the function CalculateElements.
> > The example you gave here has two different GenRegisters as GenRegister has vstride and hstride members. Right? My previous comment suggests to compare two GenRegistes' attributs and I can't understand your point here.
> > 
> > [Yejun] Let's use the following SEL IR as an example, %42 is the same register but with two different stride in the two IRs. 
> > MOV(1)              %42<2>:D	:	%41<0,1,0>:D
> > MOV(16)           %43<1>:D   	:     	%42<0,1,0>:D
> This doesn't address my comment. As the comment suggests to compare the GenRegisters' attribute not the virtual register. You can easily found the above two GenRegisters are different.
> 
> [Yejun] My expectation is that %42 should be considered as the same, and to optimize the IR as "mov(16) %43<1>:D, %41<0,1,0>:D". The optimization chance is lost if we compare the width/stride of %42 and fount they are not the same.

The clear logic here is that the assignment at intermedia instruction should
cover the current replacement's usage. So the clear way to check that is to
implement the following function:

bool IsSrcCoveredByIntermedia(GenRegister src, int curExecWidth, GenRegister intermedia, int intermediaExecWidth)
{
}

Before we do the above check, another thing we need to take into consideration which is the predication and mask
of the current instruction and the intermedia instruction.

intermedia                src                       replacable
noMask = 1               all                        yes
GEN_PREDICATE_NORMAL     GEN_PREDICATE_NORMAL       yes
GEN_PREDICATE_NORMAL     !GEN_PREDICATE_NORMAL      yes
!GEN_PREDICATE_NORMAL    all                        no (could check pred flag further, but not worth to do)

So we may change the above function to:
bool IsSrcCoveredByIntermedia(GenRegister src, SelectionInstruction *curInsn, GenRegister intermedia, SelectionInstruction *intermediaInsn)
{
}

Do all the check in this function and give a result whether it's replacable.

BTW, as a reviewer, I really love more clear comment which could reduce tons of time to
get to the point :).

Thanks,
Zhigang Gong.

> 
> > 
> > 
> > Thanks,
> > Zhigang Gong.
> > 
> > > 
> > > 
> > > 
> > > 
> > > The other parts LGTM,
> > > 
> > > Thanks,
> > > Zhigang Gong


More information about the Beignet mailing list