[Mesa-dev] [PATCH v3 04/24] i965/fs: double regioning parameters and execsize for DF in IVB/BYT

Francisco Jerez currojerez at riseup.net
Wed Feb 15 19:41:48 UTC 2017


Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>
> In IVB and BYT, both regioning parameters and execution sizes are measured as
> 32-bits element size.
>
> So when we have something like:
>
> mov(8) g2<1>DF g3<4,4,1>DF
>
> We are not actually moving 8 doubles (our intention), but 4 doubles.
>
> We need to double the parameters to cope with this issue. However,
> horizontal strides don't behave as they're supposed to on IVB
> for DF regions, they will cause each 32-bit half of DF sources to be
> strided individually, and doubling the value won't make any difference.
>
> v2:
> - Use devinfo directly (Matt).
> - Use Baytrail instead of Valleview (Matt).
> - Use IvyBridge instead of Ivy (Matt)
> - Double the exec_size in code emission (Curro)
>
> v3:
> - Change hstride doubling by an assert and fix commit log (Curro).
> - Substitute remaining compiler->devinfo by devinfo (Curro).
>
> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 ++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 26ffbb169d2..b0d5732ac5c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -54,13 +54,14 @@ brw_file_from_reg(fs_reg *reg)
>  }
>  
>  static struct brw_reg
> -brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen, bool compressed)
> +brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
> +                    fs_reg *reg, bool compressed)
>  {
>     struct brw_reg brw_reg;
>  
>     switch (reg->file) {
>     case MRF:
> -      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(gen));
> +      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(devinfo->gen));
>        /* Fallthrough */
>     case VGRF:
>        if (reg->stride == 0) {
> @@ -93,6 +94,35 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen, bool compressed)
>           const unsigned width = MIN2(reg_width, phys_width);
>           brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>           brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
> +         /* From the IvyBridge PRM (EU Changes by Processor Generation, page 13):
> +          *  "Each DF (Double Float) operand uses an element size of 4 rather
> +          *   than 8 and all regioning parameters are twice what the values
> +          *   would be based on the true element size: ExecSize, Width,
> +          *   HorzStride, and VertStride. Each DF operand uses a pair of
> +          *   channels and all masking and swizzing should be adjusted
> +          *   appropriately."
> +          *
> +          * From the IvyBridge PRM (Special Requirements for Handling Double
> +          * Precision Data Types, page 71):
> +          *  "In Align1 mode, all regioning parameters like stride, execution
> +          *   size, and width must use the syntax of a pair of packed
> +          *   floats. The offsets for these data types must be 64-bit
> +          *   aligned. The execution size and regioning parameters are in terms
> +          *   of floats."
> +          *
> +          * Summarized: when handling DF-typed arguments, ExecSize,
> +          * VertStride, and Width must be doubled, and HorzStride must be
> +          * doubled when the region is not scalar.
> +          *

The comment above seems misleading still, doubling the HorzStride value
is almost guaranteed not to give the intended result, regardless of
whether the region is scalar.  With that clarified patch is:

Reviewed-by: Francisco Jerez <currojerez at riseup.net>

> +          * It applies to BayTrail too.
> +          */
> +         if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +             type_sz(reg->type) == 8) {
> +            brw_reg.width++;
> +            if (brw_reg.vstride > 0)
> +               brw_reg.vstride++;
> +            assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> +         }
>        }
>  
>        brw_reg = retype(brw_reg, reg->type);
> @@ -1586,9 +1616,8 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>        brw_set_default_group(p, inst->group);
>  
>        for (unsigned int i = 0; i < inst->sources; i++) {
> -         src[i] = brw_reg_from_fs_reg(inst, &inst->src[i], devinfo->gen,
> -                                      compressed);
> -
> +         src[i] = brw_reg_from_fs_reg(devinfo, inst,
> +                                      &inst->src[i], compressed);
>  	 /* The accumulator result appears to get used for the
>  	  * conditional modifier generation.  When negating a UD
>  	  * value, there is a 33rd bit generated for the sign in the
> @@ -1599,7 +1628,8 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>  		inst->src[i].type != BRW_REGISTER_TYPE_UD ||
>  		!inst->src[i].negate);
>        }
> -      dst = brw_reg_from_fs_reg(inst, &inst->dst, devinfo->gen, compressed);
> +      dst = brw_reg_from_fs_reg(devinfo, inst,
> +                                &inst->dst, compressed);
>  
>        brw_set_default_access_mode(p, BRW_ALIGN_1);
>        brw_set_default_predicate_control(p, inst->predicate);
> @@ -1608,7 +1638,14 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
>        brw_set_default_saturate(p, inst->saturate);
>        brw_set_default_mask_control(p, inst->force_writemask_all);
>        brw_set_default_acc_write_control(p, inst->writes_accumulator);
> -      brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> +
> +      unsigned exec_size = inst->exec_size;
> +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +          (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) == 8)) {
> +         exec_size *= 2;
> +      }
> +
> +      brw_set_default_exec_size(p, cvt(exec_size) - 1);
>  
>        assert(inst->force_writemask_all || inst->exec_size >= 4);
>        assert(inst->force_writemask_all || inst->group % inst->exec_size == 0);
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170215/17ecd4f6/attachment.sig>


More information about the mesa-dev mailing list