[Mesa-dev] [PATCH v3 04/24] i965/fs: double regioning parameters and execsize for DF in IVB/BYT

Samuel Iglesias Gonsálvez siglesias at igalia.com
Thu Feb 16 06:57:39 UTC 2017


On Wed, 2017-02-15 at 11:41 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
> > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > 
> > In IVB and BYT, both regioning parameters and execution sizes are
> > measured as
> > 32-bits element size.
> > 
> > So when we have something like:
> > 
> > mov(8) g2<1>DF g3<4,4,1>DF
> > 
> > We are not actually moving 8 doubles (our intention), but 4
> > doubles.
> > 
> > We need to double the parameters to cope with this issue. However,
> > horizontal strides don't behave as they're supposed to on IVB
> > for DF regions, they will cause each 32-bit half of DF sources to
> > be
> > strided individually, and doubling the value won't make any
> > difference.
> > 
> > v2:
> > - Use devinfo directly (Matt).
> > - Use Baytrail instead of Valleview (Matt).
> > - Use IvyBridge instead of Ivy (Matt)
> > - Double the exec_size in code emission (Curro)
> > 
> > v3:
> > - Change hstride doubling by an assert and fix commit log (Curro).
> > - Substitute remaining compiler->devinfo by devinfo (Curro).
> > 
> > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51
> > ++++++++++++++++++++++----
> >  1 file changed, 44 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index 26ffbb169d2..b0d5732ac5c 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -54,13 +54,14 @@ brw_file_from_reg(fs_reg *reg)
> >  }
> >  
> >  static struct brw_reg
> > -brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen, bool
> > compressed)
> > +brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst
> > *inst,
> > +                    fs_reg *reg, bool compressed)
> >  {
> >     struct brw_reg brw_reg;
> >  
> >     switch (reg->file) {
> >     case MRF:
> > -      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(gen));
> > +      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(devinfo-
> > >gen));
> >        /* Fallthrough */
> >     case VGRF:
> >        if (reg->stride == 0) {
> > @@ -93,6 +94,35 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg,
> > unsigned gen, bool compressed)
> >           const unsigned width = MIN2(reg_width, phys_width);
> >           brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> > reg->nr, 0);
> >           brw_reg = stride(brw_reg, width * reg->stride, width,
> > reg->stride);
> > +         /* From the IvyBridge PRM (EU Changes by Processor
> > Generation, page 13):
> > +          *  "Each DF (Double Float) operand uses an element size
> > of 4 rather
> > +          *   than 8 and all regioning parameters are twice what
> > the values
> > +          *   would be based on the true element size: ExecSize,
> > Width,
> > +          *   HorzStride, and VertStride. Each DF operand uses a
> > pair of
> > +          *   channels and all masking and swizzing should be
> > adjusted
> > +          *   appropriately."
> > +          *
> > +          * From the IvyBridge PRM (Special Requirements for
> > Handling Double
> > +          * Precision Data Types, page 71):
> > +          *  "In Align1 mode, all regioning parameters like
> > stride, execution
> > +          *   size, and width must use the syntax of a pair of
> > packed
> > +          *   floats. The offsets for these data types must be 64-
> > bit
> > +          *   aligned. The execution size and regioning parameters
> > are in terms
> > +          *   of floats."
> > +          *
> > +          * Summarized: when handling DF-typed arguments,
> > ExecSize,
> > +          * VertStride, and Width must be doubled, and HorzStride
> > must be
> > +          * doubled when the region is not scalar.
> > +          *
> 
> The comment above seems misleading still, doubling the HorzStride
> value
> is almost guaranteed not to give the intended result, regardless of
> whether the region is scalar.  With that clarified patch is:
> 
> Reviewed-by: Francisco Jerez <currojerez at riseup.net>
> 

OK. Thanks!

Sam

> > +          * It applies to BayTrail too.
> > +          */
> > +         if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > +             type_sz(reg->type) == 8) {
> > +            brw_reg.width++;
> > +            if (brw_reg.vstride > 0)
> > +               brw_reg.vstride++;
> > +            assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> > +         }
> >        }
> >  
> >        brw_reg = retype(brw_reg, reg->type);
> > @@ -1586,9 +1616,8 @@ fs_generator::generate_code(const cfg_t *cfg,
> > int dispatch_width)
> >        brw_set_default_group(p, inst->group);
> >  
> >        for (unsigned int i = 0; i < inst->sources; i++) {
> > -         src[i] = brw_reg_from_fs_reg(inst, &inst->src[i],
> > devinfo->gen,
> > -                                      compressed);
> > -
> > +         src[i] = brw_reg_from_fs_reg(devinfo, inst,
> > +                                      &inst->src[i], compressed);
> >  	 /* The accumulator result appears to get used for the
> >  	  * conditional modifier generation.  When negating a UD
> >  	  * value, there is a 33rd bit generated for the sign in
> > the
> > @@ -1599,7 +1628,8 @@ fs_generator::generate_code(const cfg_t *cfg,
> > int dispatch_width)
> >  		inst->src[i].type != BRW_REGISTER_TYPE_UD ||
> >  		!inst->src[i].negate);
> >        }
> > -      dst = brw_reg_from_fs_reg(inst, &inst->dst, devinfo->gen,
> > compressed);
> > +      dst = brw_reg_from_fs_reg(devinfo, inst,
> > +                                &inst->dst, compressed);
> >  
> >        brw_set_default_access_mode(p, BRW_ALIGN_1);
> >        brw_set_default_predicate_control(p, inst->predicate);
> > @@ -1608,7 +1638,14 @@ fs_generator::generate_code(const cfg_t
> > *cfg, int dispatch_width)
> >        brw_set_default_saturate(p, inst->saturate);
> >        brw_set_default_mask_control(p, inst->force_writemask_all);
> >        brw_set_default_acc_write_control(p, inst-
> > >writes_accumulator);
> > -      brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> > +
> > +      unsigned exec_size = inst->exec_size;
> > +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > +          (get_exec_type_size(inst) == 8 || type_sz(inst-
> > >dst.type) == 8)) {
> > +         exec_size *= 2;
> > +      }
> > +
> > +      brw_set_default_exec_size(p, cvt(exec_size) - 1);
> >  
> >        assert(inst->force_writemask_all || inst->exec_size >= 4);
> >        assert(inst->force_writemask_all || inst->group % inst-
> > >exec_size == 0);
> > -- 
> > 2.11.0
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170216/7d2d6f13/attachment-0001.sig>


More information about the mesa-dev mailing list