[Mesa-dev] [PATCH v2 11/11] intel/nir: Enable nir_opt_find_array_copies
Caio Marcelo de Oliveira Filho
caio.oliveira at intel.com
Thu Aug 23 22:42:48 UTC 2018
On Sat, Jul 28, 2018 at 10:44:42PM -0700, Jason Ekstrand wrote:
> We have to be a bit careful with this one because we want it to run in
> the optimization loop but only in the first brw_nir_optimize call.
> Later calls assume that we've lowered away copy_deref instructions and
> we don't want to introduce any more.
>
> Shader-db results on Kaby Lake:
>
> total instructions in shared programs: 15176942 -> 15176942 (0.00%)
> instructions in affected programs: 0 -> 0
> helped: 0
> HURT: 0
>
> In spite of the lack of any shader-db improvement, this patch completely
> eliminates spilling in the Batman: Arkham City tessellation shaders.
> This is because we are now able to detect that the temporary array
> created by DXVK for storing TCS inputs is a copy of the input arrays and
> use indirect URB reads instead of making a copy of 4.5 KiB of input data
> and then indirecting on it with if-ladders.
> ---
> src/intel/compiler/brw_nir.c | 16 +++++++++-------
> src/intel/compiler/brw_nir.h | 3 ++-
> 2 files changed, 11 insertions(+), 8 deletions(-)
Provided the comment mentioned below is added, this patch is

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira at intel.com>

There's also an optional suggestion at the end.
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 5e9da9e1ef2..2417c0cd618 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -533,7 +533,7 @@ brw_nir_no_indirect_mask(const struct brw_compiler *compiler,
>
> nir_shader *
> brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
> - bool is_scalar)
> + bool is_scalar, bool allow_copies)
> {
> nir_variable_mode indirect_mask =
> brw_nir_no_indirect_mask(compiler, nir->info.stage);
> @@ -544,6 +544,8 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
> OPT(nir_split_array_vars, nir_var_local);
> OPT(nir_shrink_vec_array_vars, nir_var_local);
> OPT(nir_lower_vars_to_ssa);
> + if (allow_copies)
> + OPT(nir_opt_find_array_copies);
> OPT(nir_opt_copy_prop_vars);
>
> if (is_scalar) {
Adding a comment here with a subset of the first paragraph of your
commit message will reduce the chance that future changes to the
optimize loop mess things up.
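Something along these lines, for example (wording is just a sketch):

```c
/* nir_opt_find_array_copies may introduce new copy_deref
 * instructions, so it must only run on the first
 * brw_nir_optimize call.  Later calls assume copy_deref
 * has already been lowered away and must not reintroduce it.
 */
if (allow_copies)
   OPT(nir_opt_find_array_copies);
```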
> @@ -664,7 +666,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> nir_lower_isign64 |
> nir_lower_divmod64);
>
> - nir = brw_nir_optimize(nir, compiler, is_scalar);
> + nir = brw_nir_optimize(nir, compiler, is_scalar, true);
>
> /* This needs to be run after the first optimization pass but before we
> * lower indirect derefs away
> @@ -701,7 +703,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
> nir_lower_indirect_derefs(nir, indirect_mask);
>
> /* Get rid of split copies */
> - nir = brw_nir_optimize(nir, compiler, is_scalar);
> + nir = brw_nir_optimize(nir, compiler, is_scalar, false);
Optional: the call sites usually have an "is_scalar" variable around,
making that boolean parameter "self-documenting". This is not the
case for allow_copies. One idea is to make this last argument a flags
parameter and add an enum value like ALLOW_CREATE_DEREF_COPIES.
Thanks,
Caio