[Beignet] [PATCH V2 3/3] add local copy propagation optimization for each basic block

Guo, Yejun yejun.guo at intel.com
Wed Sep 23 23:51:12 PDT 2015



-----Original Message-----
From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com] 
Sent: Thursday, September 24, 2015 1:12 PM
To: Guo, Yejun
Cc: beignet at lists.freedesktop.org
Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation optimization for each basic block

On Thu, Sep 24, 2015 at 06:05:31AM +0000, Guo, Yejun wrote:
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.gong at linux.intel.com]
> Sent: Thursday, September 24, 2015 12:32 PM
> To: Guo, Yejun
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation 
> optimization for each basic block
> 
> On Thu, Sep 24, 2015 at 02:58:22AM +0000, Guo, Yejun wrote:
> > 
> > > +
> > > +  void SelBasicBlockOptimizer::changeInsideReplaceInfos(const
> > > + SelectionInstruction& insn, GenRegister& var)  {
> > > +    for (ReplaceInfo* info : replaceInfos) {
> > > +      if (info->intermedia.reg() == var.reg()) {
> A new comment here is that the above loop is a little bit too heavy.
> For a large BB which has many MOV instructions, it will iterate too many times for instructions after those MOV instructions. A better way is to change to use a new map type of replaceInfos:
> map <ir::Register, set<ReplaceInfo*>>
> 
> This will be much faster than iterate all infos for each instruction.
> 
> [Yejun] nice, I'll change to map.
> 
> > > +        bool replaceable = false;
> > It's better to add a comment here to describe the cases which can't be replaced and why.
> > 
> > [yejun] actually, I think the code itself explains something, it is much complex to explain every detail in human words. I consider it as a nice-to-have since the basic logic is simple.
> I still think it is not that simple. A case just came into my mind which we can't do replacement is:
> 
> MOV %r0, %r1
> ADD %r1, %r1, 1
> ...
> ADD %r10, %r0, 1
> ...
> 
> I'm afraid that your current code can't deal it correctly, right?
> 
> [yejun] current code does nothing for these instructions. It also relatives to constant, maybe add another optimization path to keep every path clear and simple. Or we can consider current code as a basic, and to extend it when find such optimization is necessary.
If my understanding is correct, current code will replace %r0 to %r1 in the second ADD instruction which breaks the code. This is not an optimize opportunity, but a bug and must be fixed. The root cause is %r1 has been modified after copy its value to another register, then we should not propagate it to the destination register after that modification. For example, for all instruction after ADD %r1, %r1, 1, We could not do the 's/r0/r1'.

[Yejun] the current code does handle such case.
Firstly, for the terms.  
R0 = r1
R3 = r0 + r2
In the first MOV IR, R0 is named as intermedia, and r1 is named as replacement.
In the second IR, r0 is collected into toBeReplacements.

For the following IR:
1) MOV %r0, %r1
2) MOV %r2, %r0
3) ADD %r1, %r1, 1
 ...
4) ADD %r10, %r0, 1

When the 3) IR is scanned, in SelBasicBlockOptimizer::removeFromReplaceInfos, replacementChanged is recorded.

When the 4) IR is scanned, in SelBasicBlockOptimizer::changeInsideReplaceInfos, we'll remove the info with the following code because replacementChanged makes replaceable false.
        if (!replaceable) {
          replaceInfos.erase(info);
          delete info;
        }

If there is no 4) IR with %r0 as source, at the end of scan, in SelBasicBlockOptimizer::cleanReplaceInfos, 1) and 2) IR will be optimized.

> 
> > 
> > > +            uint32_t elements = CalculateElements(var, insn.state.execWidth);
> > > +            if (info->elements == elements) {
> > Why not compare each attribute value, such as vstride_size,hstride_size,.....?
> > 
> > [Yejun] the execWidth is not recorded in the GenRegister, and I once saw instructions such as:
> > mov(1)  dst (vstride_size or hstride_size is 1/2/... or something 
> > else), src
> > mov(16) anotherdst, dst (with stride as 0)
> > 
> > the dst here is the same, but the strides are different, to handle this case, I add the function CalculateElements.
> The example you gave here has two different GenRegisters as GenRegister has vstride and hstride members. Right? My previous comment suggests to compare two GenRegistes' attributs and I can't understand your point here.
> 
> [Yejun] Let's use the following SEL IR as an example, %42 is the same register but with two different stride in the two IRs. 
> MOV(1)              %42<2>:D	:	%41<0,1,0>:D
> MOV(16)           %43<1>:D   	:     	%42<0,1,0>:D
This doesn't address my comment. As the comment suggests to compare the GenRegisters' attribute not the virtual register. You can easily found the above two GenRegisters are different.

[Yejun] My expectation is that %42 should be considered as the same, and to optimize the IR as "mov(16) %43<1>:D, %41<0,1,0>:D". The optimization chance is lost if we compare the width/stride of %42 and fount they are not the same.

> 
> 
> Thanks,
> Zhigang Gong.
> 
> > 
> > 
> > 
> > 
> > The other parts LGTM,
> > 
> > Thanks,
> > Zhigang Gong


More information about the Beignet mailing list