[Mesa-dev] [PATCH 08/22] i965/fs: fix lower SIMD width for IVB/VLV's MOVE_INDIRECT

Francisco Jerez currojerez at riseup.net
Fri Jan 13 00:45:35 UTC 2017


Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>
> Previous to Broadwell, we have 8 registers for MOV_INDIRECT. But if
> IVB/VLV deal with DFs, we will duplicate the exec_size from 8 to 16.
>
> This patch limits the SIMD width to 4 in this case.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index cfce364..45d320d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4959,8 +4959,13 @@ get_lowered_simd_width(const struct gen_device_info *devinfo,
>        return MIN2(8, inst->exec_size);
>  
>     case SHADER_OPCODE_MOV_INDIRECT:
> -      /* Prior to Broadwell, we only have 8 address subregisters */
> -      return MIN3(devinfo->gen >= 8 ? 16 : 8,
> +      /* Prior to Broadwell, we only have 8 address subregisters. Special case
> +       * for IVB/VLV and DF types: set to 4 (exec_size will be later
> +       * duplicated).

The comment seems rather misleading: exec size doubling is unlikely to
have anything to do with this problem.

> +       */
> +      return MIN3(devinfo->gen >= 8 ? 16 : ((devinfo->gen == 7 &&
> +                                             !devinfo->is_haswell &&
> +                                             inst->exec_data_size() == 8) ? 4 : 8),
>                    2 * REG_SIZE / (inst->dst.stride * type_sz(inst->dst.type)),
>                    inst->exec_size);

I'm amazed that this works at all on HSW. According to the IVB and HSW
PRMs:

"2. When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode. In addition, the
 sources must be assembled from GRF registers each accessed by adjacent
 index registers in 1x1 regioning modes."

So for DF instructions the execution size is not limited by the number
of address registers you have available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

I think this should be something along the lines of:

|   const unsigned max_size = (devinfo->gen >= 8 ? 2 : 1) * REG_SIZE;
|   return MIN3(devinfo->gen >= 8 ? 16 : 8,
|               max_size / (inst->dst.stride * type_sz(inst->dst.type)),
|               inst->exec_size);
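
To make the arithmetic concrete, here is a rough standalone sketch of the
same calculation outside the driver. It is only an illustration: the
function name and its plain integer parameters are hypothetical stand-ins
for devinfo->gen, inst->dst.stride, type_sz(inst->dst.type) and
inst->exec_size, and REG_SIZE is assumed to be 32 bytes (one GRF).

```cpp
#include <algorithm>
#include <cassert>

// Assumption: one GRF register is 32 bytes, as REG_SIZE is in i965.
constexpr unsigned REG_SIZE = 32;

// Hypothetical helper mirroring the suggested MOV_INDIRECT lowering.
// dst_stride and type_size are assumed to be non-zero.
unsigned lowered_mov_indirect_width(int gen, unsigned dst_stride,
                                    unsigned type_size, unsigned exec_size)
{
    // Before Gen8 the destination is limited to a single register, so the
    // two-register indirect-source restriction from the PRM never applies.
    const unsigned max_size = (gen >= 8 ? 2 : 1) * REG_SIZE;

    // Prior to Broadwell there are only 8 address subregisters.
    const unsigned addr_limit = gen >= 8 ? 16u : 8u;

    return std::min({addr_limit,
                     max_size / (dst_stride * type_size),
                     exec_size});
}
```

Under these assumptions a SIMD8 DF MOV_INDIRECT (stride 1, 8-byte type)
comes out at 4 channels on Gen7, HSW included, and stays at 8 on Gen8,
while a SIMD16 float MOV_INDIRECT is lowered to 8 on Gen7 and left at 16
on Gen8.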

>  
> -- 
> 2.9.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev