[Mesa-dev] [PATCH] nir: add nir_instr_type_tex support to nir_lower_phis_to_scalar()

Fri Feb 22 22:05:12 UTC 2019

On 23/2/19 8:54 am, Jason Ekstrand wrote:
> On Fri, Feb 22, 2019 at 2:47 PM Timothy Arceri <tarceri at itsqueeze.com 
> <mailto:tarceri at itsqueeze.com>> wrote:
> 
> 
> 
>     On 23/2/19 6:31 am, Rob Clark wrote:
>      > On Fri, Feb 22, 2019 at 12:39 PM Jason Ekstrand
>     <jason at jlekstrand.net <mailto:jason at jlekstrand.net>> wrote:
>      >>
>      >> On Fri, Feb 22, 2019 at 9:51 AM Eric Anholt <eric at anholt.net
>     <mailto:eric at anholt.net>> wrote:
>      >>>
>      >>> Timothy Arceri <tarceri at itsqueeze.com
>     <mailto:tarceri at itsqueeze.com>> writes:
>      >>>
>      >>>> shader-db results i965 (SKL):
>      >>>>
>      >>>> total instructions in shared programs: 13219105 -> 13024761
>     (-1.47%)
>      >>>> instructions in affected programs: 1169457 -> 975113 (-16.62%)
>      >>>> helped: 599
>      >>>> HURT: 154
>      >>>>
>      >>>> total cycles in shared programs: 333968972 -> 324822073 (-2.74%)
>      >>>> cycles in affected programs: 130032440 -> 120885541 (-7.03%)
>      >>>> helped: 590
>      >>>> HURT: 216
>      >>>>
>      >>>> total spills in shared programs: 57947 -> 29130 (-49.73%)
>      >>>> spills in affected programs: 53364 -> 24547 (-54.00%)
>      >>>> helped: 351
>      >>>> HURT: 0
>      >>>>
>      >>>> total fills in shared programs: 51310 -> 25468 (-50.36%)
>      >>>> fills in affected programs: 44882 -> 19040 (-57.58%)
>      >>>> helped: 351
>      >>>> HURT: 0
>      >>>> ---
>      >>>>   src/compiler/nir/nir_lower_phis_to_scalar.c | 1 +
>      >>>>   1 file changed, 1 insertion(+)
>      >>>>
>      >>>> diff --git a/src/compiler/nir/nir_lower_phis_to_scalar.c
>     b/src/compiler/nir/nir_lower_phis_to_scalar.c
>      >>>> index 16001f73685..f6f702bca15 100644
>      >>>> --- a/src/compiler/nir/nir_lower_phis_to_scalar.c
>      >>>> +++ b/src/compiler/nir/nir_lower_phis_to_scalar.c
>      >>>> @@ -74,6 +74,7 @@ is_phi_src_scalarizable(nir_phi_src *src,
>      >>>>         /* A phi is scalarizable if we're going to lower it */
>      >>>>         return should_lower_phi(nir_instr_as_phi(src_instr),
>     state);
>      >>>>
>      >>>> +   case nir_instr_type_tex:
>      >>>>      case nir_instr_type_load_const:
>      >>>>      case nir_instr_type_ssa_undef:
>      >>>>         /* These are trivially scalarizable */
>      >>>
>      >>> Sounds promising, but I would definitely not describe
>     instr_type_tex as
>      >>> "trivially scalarizable" -- could you explain what's going on
>     with this
>      >>> patch?
> 
>     Basically it just turns:
> 
>     if ssa0 {
>         ...
>         vec4 ssa1 = txf .....
>     } else {
>          ...
>          vec4 ssa2 = ...
>     }
>     vec4 ss3 = phi ssa1, ssa2
> 
>     Into
> 
>     if ssa0 {
>         ...
>         vec4 ssa1 = txf .....
>         vec1 ssa2 = imov ssa1.x
>         vec1 ssa3 = imov ssa1.y
>         vec1 ssa4 = imov ssa1.z
>         vec1 ssa5 = imov ssa1.w
>     } else {
>          ...
>          vec4 ssa6 = ...
>          vec1 ssa7 = imov ssa6.x
>          vec1 ssa8 = imov ssa6.y
>          vec1 ssa9 = imov ssa6.z
>          vec1 ssa10 = imov ssa6.w
>     }
>     vec1 ss11 = phi ssa2, ssa7
>     vec1 ss12 = phi ssa3, ssa8
>     vec1 ss13 = phi ssa4, ssa9
>     vec1 ss14 = phi ssa5, ssa10
> 
> 
>     This allows a whole bunch more optimisation to take place as often not
>     all of the phi channels are actually used. If some cases large
>     chunks of
>     logic can be remove from the if branch that doesn't contain the texture
>     access.
> 
>      >>
>      >>
>      >> I think I can for Intel though I'm not sure how this affects
>     other drivers.
>      >>
>      >> On Intel hardware, we almost always have to combine all the
>     texture sources into one big message.  Since having more than one
>     source is very common, this means that we have to make a temporary
>     copy of the sources anyway.  Because we're copying them, having them
>     contiguous (a vector in NIR terms) doesn't actually gain us
>     anything.  We may as well let NIR scalarize them and give more
>     freedom to the register allocator and other NIR passes which may
>     need to clean things up.  We don't want to make the same choice for
>     destinations as they are required to be contiguous.
>      >>
>      >
>      > hmm, but this is abut the phi src (ie. the tex dest), not the tex
>     src, isn't it?
> 
> 
> Well, that's a pickle...  When I wrote this pass I did so to try and 
> explicitly keep from breaking up things that are known to be vectors in 
> the back-end such as textures.  The idea was to try and not break up 
> things like this:
> 
>   if (...) {
>      ssa_1 = tex()
> } else {
>      ssa_2 = tex()
> }
> ssa_3 = phi (ssa_1, ssa_2)
> 
> in the hopes that it would turn into
> 
> if (...) {
>      r1 = tex();
> } else {
>      r1 = tex();
> }
> 
> Clearly, that notion was mis-placed.  At this point, I really wonder 
> what the complexity is saving us.  Maybe it's not worth it at all?  
> Maybe we need to be more agressive and require all sources to not be 
> vectorizable or something?

Maybe checking if all components of the phi are used before converting 
to scalar would be useful? Not sure but I've sent a more conservative v2 
of this patch where a bunch of hurt is gone.

The remaining hurt is very small and comes from shaders like this:

  if (...) {
       r1 = tex();
  } else {
     if (...) {
       r1 = tex();
     } else {
       r1 = undefined;
     }
  }