[Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

Samuel Iglesias Gonsálvez siglesias at igalia.com
Thu Feb 16 07:14:15 UTC 2017


On Thu, 2017-02-16 at 08:11 +0100, Samuel Iglesias Gonsálvez wrote:
> On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote:
> > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > 
> > > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > > 
> > > When converting a DF to 32-bit conversions, we set dst stride to
> > > 2,
> > > to fulfill alignment restrictions because the upper Dword of
> > > every
> > > Qword will be written with undefined value.
> > > 
> > > But in IVB/BYT, this is not necessary, as each DF conversion
> > > already
> > > writes 2, the first one the real value, and the second one a 0.
> > > That is, IVB/BYT already set stride = 2 implicitly, so we must
> > > set
> > > it to
> > > 1 explicitly to avoid ending up with stride = 4.
> > > 
> > > v2:
> > > - Fix typo (Matt)
> > > 
> > > v3:
> > > - Fix stride in the destination's brw_reg, don't modity IR
> > > (Curro)
> > > 
> > > Signed-off-by: Juan A. Suarez Romero <jasuarez at igalia.com>
> > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76
> > > +++++++++++++++-----------
> > >  1 file changed, 45 insertions(+), 31 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > index 2f60ddd8706..e0a80d73b70 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
> > >  
> > >  static struct brw_reg
> > >  brw_reg_from_fs_reg(const struct gen_device_info *devinfo,
> > > fs_inst
> > > *inst,
> > > -                    fs_reg *reg, bool compressed)
> > > +                    fs_reg *reg, bool is_dst, bool compressed)
> > >  {
> > >     struct brw_reg brw_reg;
> > >  
> > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct
> > > gen_device_info *devinfo, fs_inst *inst,
> > >           const unsigned width = MIN2(reg_width, phys_width);
> > >           brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg),
> > > reg->nr, 0);
> > >           brw_reg = stride(brw_reg, width * reg->stride, width,
> > > reg->stride);
> > > -         /* From the IvyBridge PRM (EU Changes by Processor
> > > Generation, page 13):
> > > -          *  "Each DF (Double Float) operand uses an element
> > > size
> > > of 4 rather
> > > -          *   than 8 and all regioning parameters are twice what
> > > the values
> > > -          *   would be based on the true element size: ExecSize,
> > > Width,
> > > -          *   HorzStride, and VertStride. Each DF operand uses a
> > > pair of
> > > -          *   channels and all masking and swizzing should be
> > > adjusted
> > > -          *   appropriately."
> > > -          *
> > > -          * From the IvyBridge PRM (Special Requirements for
> > > Handling Double
> > > -          * Precision Data Types, page 71):
> > > -          *  "In Align1 mode, all regioning parameters like
> > > stride, execution
> > > -          *   size, and width must use the syntax of a pair of
> > > packed
> > > -          *   floats. The offsets for these data types must be
> > > 64-
> > > bit
> > > -          *   aligned. The execution size and regioning
> > > parameters
> > > are in terms
> > > -          *   of floats."
> > > -          *
> > > -          * Summarized: when handling DF-typed arguments,
> > > ExecSize,
> > > -          * VertStride, and Width must be doubled, and
> > > HorzStride
> > > must be
> > > -          * doubled when the region is not scalar.
> > > -          *
> > > -          * It applies to BayTrail too.
> > > -          */
> > > -         if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > > -             type_sz(reg->type) == 8) {
> > > -            brw_reg.width++;
> > > -            if (brw_reg.vstride > 0)
> > > -               brw_reg.vstride++;
> > > -            assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> > > +
> > > +         if (devinfo->gen == 7 && !devinfo->is_haswell) {
> > > +            /* From the IvyBridge PRM (EU Changes by Processor
> > > Generation, page 13):
> > > +             *  "Each DF (Double Float) operand uses an element
> > > size of 4 rather
> > > +             *   than 8 and all regioning parameters are twice
> > > what the values
> > > +             *   would be based on the true element size:
> > > ExecSize, Width,
> > > +             *   HorzStride, and VertStride. Each DF operand
> > > uses
> > > a pair of
> > > +             *   channels and all masking and swizzing should be
> > > adjusted
> > > +             *   appropriately."
> > > +             *
> > > +             * From the IvyBridge PRM (Special Requirements for
> > > Handling Double
> > > +             * Precision Data Types, page 71):
> > > +             *  "In Align1 mode, all regioning parameters like
> > > stride, execution
> > > +             *   size, and width must use the syntax of a pair
> > > of
> > > packed
> > > +             *   floats. The offsets for these data types must
> > > be
> > > 64-bit
> > > +             *   aligned. The execution size and regioning
> > > parameters are in terms
> > > +             *   of floats."
> > > +             *
> > > +             * Summarized: when handling DF-typed arguments,
> > > ExecSize,
> > > +             * VertStride, and Width must be doubled, and
> > > HorzStride must be
> > > +             * doubled when the region is not scalar.
> > > +             *
> > > +             * It applies to BayTrail too.
> > > +             */
> > > +            if (type_sz(reg->type) == 8) {
> > > +               brw_reg.width++;
> > > +               if (brw_reg.vstride > 0)
> > > +                  brw_reg.vstride++;
> > > +               assert(brw_reg.hstride ==
> > > BRW_HORIZONTAL_STRIDE_1);
> > > +            }
> > > +
> > > +            /* When converting from DF->F, we set destination's
> > > stride as 2 as an
> > > +             * because each d2f conversion implicitly writes 2
> > > F,
> > > +             * being the first one the converted value. IVB/BYT
> > > already set
> > > +             * stride = 2 implicitly, so we must set it to 1
> > > explicitly to avoid
> > > +             * ending up with stride = 4.
> > > +             */
> > 
> > The sentence starting from "IVB/BYT already set..." seems kind of
> > misleading, because the hardware's behavior is really not the same
> > as
> > a
> > stride=2 region -- The hardware actually writes two F components
> > per
> > SIMD channel, and every other component is filled with
> > garbage.  Stride
> > behaves as usual, it's just that the hardware outputs twice as many
> > components as you'd expect.
> > 
> 
> OK
> 
> > > +            if (is_dst && get_exec_type_size(inst) == 8 &&
> > > +                type_sz(reg->type) < 8) {
> > > +               assert(brw_reg.hstride ==
> > > BRW_HORIZONTAL_STRIDE_2);
> > > +               brw_reg.hstride = BRW_HORIZONTAL_STRIDE_1;
> > 
> > You could change this line to 'brw_reg.hstride--' and relax the
> > assertion for it to handle arbitrary destination strides.
> > 
> > > +            }
> > >           }
> > >        }
> > >  
> > > @@ -1626,7 +1640,7 @@ fs_generator::generate_code(const cfg_t
> > > *cfg,
> > > int dispatch_width)
> > >  
> > >        for (unsigned int i = 0; i < inst->sources; i++) {
> > >           src[i] = brw_reg_from_fs_reg(devinfo, inst,
> > > -                                      &inst->src[i],
> > > compressed);
> > > +                                      &inst->src[i], false,
> > > compressed);
> > 
> > Multiple boolean arguments make your code rather hard to read (what
> > is
> > this false literal referring to again?) and increases the
> > likelihood
> > of
> > mistaking one of the boolean arguments for the other, since the
> > type
> > checker won't be able to notice the difference.  Instead you could
> > drop
> > the argument altogether and replace the is_dst check in
> > brw_reg_from_fs_reg() with a 'reg == &inst->dst' comparison, or
> > take
> > the
> > destination region hstride munging out of brw_reg_from_fs_reg() and
> > do
> > it below instead, right after the 'dst = ...' assignment.
> > 
> 

By the way, Do you grant your R-b to this patch with the aforementioned
changes?

Sam

> Right. I will change it.
> 
> Thanks,
> 
> Sam
> 
> > >  	 /* The accumulator result appears to get used for the
> > >  	  * conditional modifier generation.  When negating a UD
> > >  	  * value, there is a 33rd bit generated for the sign in
> > > the
> > > @@ -1638,7 +1652,7 @@ fs_generator::generate_code(const cfg_t
> > > *cfg,
> > > int dispatch_width)
> > >  		!inst->src[i].negate);
> > >        }
> > >        dst = brw_reg_from_fs_reg(devinfo, inst,
> > > -                                &inst->dst, compressed);
> > > +                                &inst->dst, true, compressed);
> > >  
> > >        brw_set_default_access_mode(p, BRW_ALIGN_1);
> > >        brw_set_default_predicate_control(p, inst->predicate);
> > > -- 
> > > 2.11.0
> > > 
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170216/59576eed/attachment-0001.sig>


More information about the mesa-dev mailing list