[Mesa-dev] [PATCH 05/10] intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

Tue Jan 8 07:19:42 UTC 2019

On Mon, 2019-01-07 at 12:00 -0800, Francisco Jerez wrote:
> Iago Toral <itoral at igalia.com> writes:
> 
> > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
> > > Currently the visitor attempts to enforce the regioning
> > > restrictions
> > > that apply to double-precision instructions on CHV/BXT at NIR-to-
> > > i965
> > > translation time.  It is possible though for the copy propagation
> > > pass
> > > to violate this restriction if a strided move is propagated into
> > > one
> > > of the affected instructions.  I've only reproduced this issue on
> > > a
> > > future platform but it could affect CHV/BXT too under the right
> > > conditions.
> > > 
> > > Cc: mesa-stable at lists.freedesktop.org
> > > ---
> > >  .../compiler/brw_fs_copy_propagation.cpp      | 10 +++++++
> > >  src/intel/compiler/brw_ir_fs.h                | 28
> > > +++++++++++++++++++
> > >  2 files changed, 38 insertions(+)
> > > 
> > > diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > index a8ec1c34630..c23ce1ef426 100644
> > > --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > > @@ -315,6 +315,16 @@ can_take_stride(fs_inst *inst, unsigned arg,
> > > unsigned stride,
> > >     if (stride > 4)
> > >        return false;
> > >  
> > > +   /* Bail if the channels of the source need to be aligned to
> > > the
> > > byte offset
> > > +    * of the corresponding channel of the destination, and the
> > > provided stride
> > > +    * would break this restriction.
> > > +    */
> > > +   if (has_dst_aligned_region_restriction(devinfo, inst) &&
> > > +       !(type_sz(inst->src[arg].type) * stride ==
> > > +           type_sz(inst->dst.type) * inst->dst.stride ||
> > > +         stride == 0))
> > > +      return false;
> > > +
> > >     /* 3-source instructions can only be Align16, which restricts
> > > what strides
> > >      * they can take. They can only take a stride of 1 (the usual
> > > case), or 0
> > >      * with a special "repctrl" bit. But the repctrl bit doesn't
> > > work
> > > for
> > > diff --git a/src/intel/compiler/brw_ir_fs.h
> > > b/src/intel/compiler/brw_ir_fs.h
> > > index 07e7224e0f8..95b069a2e02 100644
> > > --- a/src/intel/compiler/brw_ir_fs.h
> > > +++ b/src/intel/compiler/brw_ir_fs.h
> > > @@ -486,4 +486,32 @@ get_exec_type_size(const fs_inst *inst)
> > >     return type_sz(get_exec_type(inst));
> > >  }
> > >  
> > > +/**
> > > + * Return whether the following regioning restriction applies to
> > > the
> > > specified
> > > + * instruction.  From the Cherryview PRM Vol 7. "Register Region
> > > + * Restrictions":
> > > + *
> > > + * "When source or destination datatype is 64b or operation is
> > > integer DWord
> > > + *  multiply, regioning in Align1 must follow these rules:
> > > + *
> > > + *  1. Source and Destination horizontal stride must be aligned
> > > to
> > > the same qword.
> > > + *  2. Regioning must ensure Src.Vstride = Src.Width *
> > > Src.Hstride.
> > > + *  3. Source and Destination offset must be the same, except
> > > the
> > > case of
> > > + *     scalar source."
> > > + */
> > > +static inline bool
> > > +has_dst_aligned_region_restriction(const gen_device_info
> > > *devinfo,
> > > +                                   const fs_inst *inst)
> > > +{
> > > +   const brw_reg_type exec_type = get_exec_type(inst);
> > > +   const bool is_int_multiply =
> > > !brw_reg_type_is_floating_point(exec_type) &&
> > > +         (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
> > > BRW_OPCODE_MAD);
> > 
> > Should this be extended to include MAC and MACH too?
> > 
> 
> The documentation is unclear, but it doesn't look like that's the
> case
> according to the simulator, because those instructions don't do more
> than a 16x16 or 32x16 bit integer multiply respectively.

Ok, then it should be fine, thanks for looking into it.

> > > +
> > > +   if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> > > +       (type_sz(exec_type) == 4 && is_int_multiply))
> > > +      return devinfo->is_cherryview ||
> > > gen_device_info_is_9lp(devinfo);
> > 
> > How about:
> > 
> > if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo)) {
> >    ...
> > } else {
> >    return false;
> > }
> > 
> > since we only really need to do these checks in those platforms it
> > might make a bit more sense to do it this way.
> > 
> 
> Right now the difference is purely cosmetic, but in the future that
> won't work for the platform this was designed for, I can send you
> more
> details off-list.

That's fine, I was just looking at it from the point of view of the
platforms this is required for at present. If writing it like this
makes more sense to add future gens later on let's keep it as you wrote
it.

> > > +   else
> > > +      return false;
> > > +}
> > > +
> > >  #endif