[Mesa-dev] [PATCH v3 08/24] i965/fs: fix dst stride in IVB/BYT type conversions

Francisco Jerez currojerez at riseup.net
Wed Feb 15 20:08:01 UTC 2017


Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>
> When converting a DF to 32-bit conversions, we set dst stride to 2,
> to fulfill alignment restrictions because the upper Dword of every
> Qword will be written with undefined value.
>
> But in IVB/BYT, this is not necessary, as each DF conversion already
> writes 2, the first one the real value, and the second one a 0.
> That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
> 1 explicitly to avoid ending up with stride = 4.
>
> v2:
> - Fix typo (Matt)
>
> v3:
> - Fix stride in the destination's brw_reg, don't modity IR (Curro)
>
> Signed-off-by: Juan A. Suarez Romero <jasuarez at igalia.com>
> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 +++++++++++++++-----------
>  1 file changed, 45 insertions(+), 31 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 2f60ddd8706..e0a80d73b70 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg)
>  
>  static struct brw_reg
>  brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
> -                    fs_reg *reg, bool compressed)
> +                    fs_reg *reg, bool is_dst, bool compressed)
>  {
>     struct brw_reg brw_reg;
>  
> @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
>           const unsigned width = MIN2(reg_width, phys_width);
>           brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>           brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
> -         /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13):
> -          *  "Each DF (Double Float) operand uses an element size of 4 rather
> -          *   than 8 and all regioning parameters are twice what the values
> -          *   would be based on the true element size: ExecSize, Width,
> -          *   HorzStride, and VertStride. Each DF operand uses a pair of
> -          *   channels and all masking and swizzing should be adjusted
> -          *   appropriately."
> -          *
> -          * From the IvyBridge PRM (Special Requirements for Handling Double
> -          * Precision Data Types, page 71):
> -          *  "In Align1 mode, all regioning parameters like stride, execution
> -          *   size, and width must use the syntax of a pair of packed
> -          *   floats. The offsets for these data types must be 64-bit
> -          *   aligned. The execution size and regioning parameters are in terms
> -          *   of floats."
> -          *
> -          * Summarized: when handling DF-typed arguments, ExecSize,
> -          * VertStride, and Width must be doubled, and HorzStride must be
> -          * doubled when the region is not scalar.
> -          *
> -          * It applies to BayTrail too.
> -          */
> -         if (devinfo->gen == 7 && !devinfo->is_haswell &&
> -             type_sz(reg->type) == 8) {
> -            brw_reg.width++;
> -            if (brw_reg.vstride > 0)
> -               brw_reg.vstride++;
> -            assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> +
> +         if (devinfo->gen == 7 && !devinfo->is_haswell) {
> +            /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13):
> +             *  "Each DF (Double Float) operand uses an element size of 4 rather
> +             *   than 8 and all regioning parameters are twice what the values
> +             *   would be based on the true element size: ExecSize, Width,
> +             *   HorzStride, and VertStride. Each DF operand uses a pair of
> +             *   channels and all masking and swizzing should be adjusted
> +             *   appropriately."
> +             *
> +             * From the IvyBridge PRM (Special Requirements for Handling Double
> +             * Precision Data Types, page 71):
> +             *  "In Align1 mode, all regioning parameters like stride, execution
> +             *   size, and width must use the syntax of a pair of packed
> +             *   floats. The offsets for these data types must be 64-bit
> +             *   aligned. The execution size and regioning parameters are in terms
> +             *   of floats."
> +             *
> +             * Summarized: when handling DF-typed arguments, ExecSize,
> +             * VertStride, and Width must be doubled, and HorzStride must be
> +             * doubled when the region is not scalar.
> +             *
> +             * It applies to BayTrail too.
> +             */
> +            if (type_sz(reg->type) == 8) {
> +               brw_reg.width++;
> +               if (brw_reg.vstride > 0)
> +                  brw_reg.vstride++;
> +               assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> +            }
> +
> +            /* When converting from DF->F, we set destination's stride as 2 as an
> +             * because each d2f conversion implicitly writes 2 F,
> +             * being the first one the converted value. IVB/BYT already set
> +             * stride = 2 implicitly, so we must set it to 1 explicitly to avoid
> +             * ending up with stride = 4.
> +             */

The sentence starting from "IVB/BYT already set..." seems kind of
misleading, because the hardware's behavior is really not the same as a
stride=2 region -- The hardware actually writes two F components per
SIMD channel, and every other component is filled with garbage.  Stride
behaves as usual, it's just that the hardware outputs twice as many
components as you'd expect.

> +            if (is_dst && get_exec_type_size(inst) == 8 &&
> +                type_sz(reg->type) < 8) {
> +               assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_2);
> +               brw_reg.hstride = BRW_HORIZONTAL_STRIDE_1;

You could change this line to 'brw_reg.hstride--' and relax the
assertion for it to handle arbitrary destination strides.

> +            }
>           }
>        }
>  
> @@ -1626,7 +1640,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>  
>        for (unsigned int i = 0; i < inst->sources; i++) {
>           src[i] = brw_reg_from_fs_reg(devinfo, inst,
> -                                      &inst->src[i], compressed);
> +                                      &inst->src[i], false, compressed);

Multiple boolean arguments make your code rather hard to read (what is
this false literal referring to again?) and increases the likelihood of
mistaking one of the boolean arguments for the other, since the type
checker won't be able to notice the difference.  Instead you could drop
the argument altogether and replace the is_dst check in
brw_reg_from_fs_reg() with a 'reg == &inst->dst' comparison, or take the
destination region hstride munging out of brw_reg_from_fs_reg() and do
it below instead, right after the 'dst = ...' assignment.

>  	 /* The accumulator result appears to get used for the
>  	  * conditional modifier generation.  When negating a UD
>  	  * value, there is a 33rd bit generated for the sign in the
> @@ -1638,7 +1652,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>  		!inst->src[i].negate);
>        }
>        dst = brw_reg_from_fs_reg(devinfo, inst,
> -                                &inst->dst, compressed);
> +                                &inst->dst, true, compressed);
>  
>        brw_set_default_access_mode(p, BRW_ALIGN_1);
>        brw_set_default_predicate_control(p, inst->predicate);
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170215/af15879c/attachment.sig>


More information about the mesa-dev mailing list