[Mesa-dev] [PATCH 7/8] i965: remove GLSL IR optimisation loop for gen7.5+

Fri Apr 21 07:33:13 UTC 2017

On Monday, April 17, 2017 10:52:24 PM PDT Timothy Arceri wrote:
> From: Timothy Arceri <timothy.arceri at collabora.com>
> 
> V2: leave copy propagation to avoid interpolation regressions
> 
> IVB is running into some spilling issues with the loop
> removed so we leave it there for gen7 and below for now.
> 
> Run time for shader-db on my machine goes from ~795 seconds to
> ~665 seconds.
> 
> shader-db results BDW:
> 
> total instructions in shared programs: 12969459 -> 12968891 (-0.00%)
> instructions in affected programs: 1463154 -> 1462586 (-0.04%)
> helped: 3622
> HURT: 3326
> 
> total cycles in shared programs: 246453572 -> 246504318 (0.02%)
> cycles in affected programs: 208842622 -> 208893368 (0.02%)
> helped: 24029
> HURT: 35407
> 
> total loops in shared programs: 2931 -> 2931 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
> 
> total spills in shared programs: 14560 -> 14498 (-0.43%)
> spills in affected programs: 2270 -> 2208 (-2.73%)
> helped: 17
> HURT: 2
> 
> total fills in shared programs: 19671 -> 19632 (-0.20%)
> fills in affected programs: 2060 -> 2021 (-1.89%)
> helped: 17
> HURT: 2
> 
> LOST:   17
> GAINED: 40
> 
> Most of the hurt shaders are 1-2 instructions, with what looks like a max of 7.
>
> I've looked at the worst cycles regressions and as far as I can tell it's just
> a scheduling difference.

Wow!  That's terrific - a max of 7 instructions I can definitely live with.

Hopefully Curro's scheduling work will help with the cycle counts.
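
For anyone checking the numbers above: the percentages are consistent with a plain relative change against the "before" value. A minimal sketch of that arithmetic follows; the helper names percent_change and round2 are hypothetical and are not taken from shader-db itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change from `before` to `after`, in percent.
// Negative means the count went down (e.g. fewer spills).
double percent_change(double before, double after)
{
    assert(before != 0.0); // a relative change is undefined against zero
    return (after - before) / before * 100.0;
}

// Hypothetical helper: round to two decimals, as in the report above.
double round2(double value)
{
    return std::round(value * 100.0) / 100.0;
}
```

With the spill counts quoted above, round2(percent_change(14560, 14498)) comes out at -0.43 and round2(percent_change(2270, 2208)) at -2.73, matching the report.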

> ---
>  src/mesa/drivers/dri/i965/brw_link.cpp | 27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp b/src/mesa/drivers/dri/i965/brw_link.cpp
> index 7c10a40..bb7ab4f 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -81,21 +81,20 @@ brw_lower_packing_builtins(struct brw_context *brw,
>  
>     lower_packing_builtins(ir, LOWER_PACK_HALF_2x16 | LOWER_UNPACK_HALF_2x16);
>  }
>  
>  static void
>  process_glsl_ir(struct brw_context *brw,
>                  struct gl_shader_program *shader_prog,
>                  struct gl_linked_shader *shader)
>  {
>     struct gl_context *ctx = &brw->ctx;
> -   const struct brw_compiler *compiler = brw->screen->compiler;
>     const struct gl_shader_compiler_options *options =
>        &ctx->Const.ShaderCompilerOptions[shader->Stage];
>  
>     /* Temporary memory context for any new IR. */
>     void *mem_ctx = ralloc_context(NULL);
>  
>     ralloc_adopt(mem_ctx, shader->ir);
>  
>     lower_blend_equation_advanced(shader);
>  
> @@ -125,34 +124,32 @@ process_glsl_ir(struct brw_context *brw,
>     if (brw->gen < 6)
>        lower_if_to_cond_assign(shader->Stage, shader->ir, 16);
>  
>     do_lower_texture_projection(shader->ir);
>     do_vec_index_to_cond_assign(shader->ir);
>     lower_vector_insert(shader->ir, true);
>     lower_offset_arrays(shader->ir);
>     lower_noise(shader->ir);
>     lower_quadop_vector(shader->ir, false);
>  
> -   bool progress;
> -   do {
> -      progress = false;
> -
> -      if (compiler->scalar_stage[shader->Stage]) {
> -         if (shader->Stage == MESA_SHADER_VERTEX ||
> -             shader->Stage == MESA_SHADER_FRAGMENT)
> -            brw_do_channel_expressions(shader->ir);
> -         brw_do_vector_splitting(shader->ir);
> -      }
> -
> -      progress = do_common_optimization(shader->ir, true, true,
> -                                        options, ctx->Const.NativeIntegers) || progress;
> -   } while (progress);
> +   /* TODO: IVB is failing to link with the GLSL IR opts removed for the
> +    * piglit test:
> +    *    piglit.spec.arb_gpu_shader_fp64.varying-packing.simple
> +    *
> +    * With the error "VS compile failed: no register to spill". If we can fix
> +    * this we should be able to remove this optimisation loop.
> +    */

If that's the only reason, we could probably drop the loop on Gen4-6 too.
Honestly, I'd probably just drop it entirely - fp64 is brand new for
Ivybridge, this is probably an unlikely real-world case, and ultimately
it's a problem with the vec4 backend's spilling support.

We do need to look into the couple of test regressions on Haswell
(which you think are existing enhanced layouts bugs).

Series is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

> +   if (brw->gen <= 7 && !brw->is_haswell) {
> +      while (do_common_optimization(shader->ir, true, true, options,
> +                             ctx->Const.NativeIntegers))
> +         ;
> +   }
>  
>     validate_ir_tree(shader->ir);
>  
>     /* Now that we've finished altering the linked IR, reparent any live IR back
>      * to the permanent memory context, and free the temporary one (discarding any
>      * junk we optimized away).
>      */
>     reparent_ir(shader->ir, shader->ir);
>     ralloc_free(mem_ctx);
>  
> 
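
The loop discussed in the patch is a run-until-no-progress (fixed-point) driver: passes are re-run until a full sweep changes nothing. A minimal standalone sketch of that pattern follows. Everything in it is an assumption for illustration: the IR is a toy vector of ints, and Pass, run_to_fixed_point, remove_zeros and fold_ones are hypothetical names, not Mesa functions.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an optimisation pass over a toy "IR":
// it returns true if it changed anything.
using Pass = std::function<bool(std::vector<int> &)>;

// Re-run every pass until one full sweep reports no progress.
// Returns the number of sweeps, including the final no-progress one.
int run_to_fixed_point(std::vector<int> &ir, const std::vector<Pass> &passes)
{
    int sweeps = 0;
    bool progress;
    do {
        progress = false;
        for (const Pass &pass : passes)
            progress = pass(ir) || progress; // always run the pass, then merge
        ++sweeps;
        assert(sweeps < 1000); // guard against passes that never converge
    } while (progress);
    return sweeps;
}

// Toy pass: drop zeros (stands in for dead-code elimination).
bool remove_zeros(std::vector<int> &ir)
{
    const auto old_size = ir.size();
    ir.erase(std::remove(ir.begin(), ir.end(), 0), ir.end());
    return ir.size() != old_size;
}

// Toy pass: rewrite 1 to 0, creating new work for remove_zeros.
bool fold_ones(std::vector<int> &ir)
{
    bool changed = false;
    for (int &value : ir) {
        if (value == 1) {
            value = 0;
            changed = true;
        }
    }
    return changed;
}
```

Starting from {1, 2, 0, 1} with the passes ordered remove_zeros then fold_ones, the driver needs three sweeps and leaves {2}. The last sweep does no useful work beyond confirming convergence, which is part of why such loops cost compile time.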
