[Mesa-dev] [PATCH 1/2] i965/fs: Fix single-precision to double-precision conversions for CHV/BSW

Fri Jun 17 06:41:42 UTC 2016

On Wednesday, June 15, 2016 9:25:44 AM PDT Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga <itoral at igalia.com>
> 
> From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
> Register Region Restrictions:
> 
>    "When source or destination is 64b (...), regioning in Align1
>     must follow these rules:
> 
>     1. Source and destination horizontal stride must be aligned to
>        the same qword.
>     (...)"
> 
> Cc: "12.0" <mesa-stable at lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index b811953..c271e64 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -715,10 +715,37 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>     case nir_op_u2f:
>        if (optimize_extract_to_float(instr, result))
>           return;
> +      inst = bld.MOV(result, op[0]);
> +      inst->saturate = instr->dest.saturate;
> +      break;
>  
>     case nir_op_f2d:
>     case nir_op_i2d:
>     case nir_op_u2d:
> +      /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
> +       *
> +       *    "When source or destination is 64b (...), regioning in Align1
> +       *     must follow these rules:
> +       *
> +       *     1. Source and destination horizontal stride must be aligned to
> +       *        the same qword.
> +       *     (...)"
> +       *
> +       * This means that 32-bit to 64-bit conversions need to have the 32-bit
> +       * data elements aligned to 64-bit. This restriction does not apply to
> +       * BDW and later.
> +       */
> +      if (devinfo->is_cherryview) {
> +         fs_reg tmp = bld.vgrf(result.type, 1);
> +         tmp = subscript(tmp, op[0].type, 0);
> +         inst = bld.MOV(tmp, op[0]);
> +         inst->regs_written =
> +            inst->dst.component_size(bld.dispatch_width()) / REG_SIZE;

As we discussed on #intel-gfx, this line isn't necessary.
fs_inst::init() already initializes regs_written to:

    DIV_ROUND_UP(dst.component_size(exec_size), REG_SIZE);

where exec_size is initialized to dispatch_width in this case.

So, with 8 channels, the default calculation for component_size()
works out to:

    MAX2(channels * 2 [stride], 1) * type_sz(D) = 8 * 2 * 4 = 64

while your new one is:

    MAX2(channels * 1 [stride], 1) * type_sz(DF) = 8 * 1 * 8 = 64

So they're equivalent: both cover 64 bytes.
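
To make the arithmetic easy to check, here is a minimal standalone sketch
of the two component_size() expressions quoted above. It is not the actual
i965 code: the helper names, their signatures, and the REG_SIZE value of
32 bytes are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the backend definitions; illustration only.
constexpr unsigned REG_SIZE = 32; // bytes per register (assumed value)

constexpr unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

// Follows the formula quoted in the review:
//    MAX2(width * stride, 1) * type_sz
constexpr unsigned component_size(unsigned width, unsigned stride,
                                  unsigned type_sz)
{
   return std::max(width * stride, 1u) * type_sz;
}

// Register count derived from the component size, rounding up.
constexpr unsigned regs_written(unsigned width, unsigned stride,
                                unsigned type_sz)
{
   return div_round_up(component_size(width, stride, type_sz), REG_SIZE);
}

// A 4-byte type at stride 2 spans the same bytes as an 8-byte type at
// stride 1, for any channel count.
static_assert(component_size(8, 2, 4) == component_size(8, 1, 8),
              "SIMD8 sizes match");
static_assert(component_size(16, 2, 4) == component_size(16, 1, 8),
              "SIMD16 sizes match");
```

Under these assumptions both forms give 64 bytes for 8 channels, so the
register count derived from either one is the same.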

With that line removed, this patch is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

> +         inst = bld.MOV(result, tmp);
> +         inst->saturate = instr->dest.saturate;
> +         break;
> +      }
> +      /* fallthrough */
>     case nir_op_d2f:
>     case nir_op_d2i:
>     case nir_op_d2u:
> 
