[Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

Kenneth Graunke kenneth at whitecape.org
Fri Nov 16 08:16:56 UTC 2018


On Thursday, November 15, 2018 11:16:09 PM PST Francisco Jerez wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
> 
> > On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
> >> Anuj Phogat <anuj.phogat at gmail.com> writes:
> >> 
> >> > Use L3 configuration table specified in h/w specification.
> >> >
> >> > Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
> >> > Cc: Kenneth Graunke <kenneth at whitecape.org>
> >> > Cc: Francisco Jerez <currojerez at riseup.net>
> >> > Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
> >> > ---
> >> >  src/intel/common/gen_l3_config.c | 16 ++++++++++------
> >> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >> >
> >> > diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
> >> > index b977c6ab136..079608198bc 100644
> >> > --- a/src/intel/common/gen_l3_config.c
> >> > +++ b/src/intel/common/gen_l3_config.c
> >> > @@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
> >> >   */
> >> >  static const struct gen_l3_config icl_l3_configs[] = {
> >> >     /* SLM URB ALL DC  RO  IS   C   T */
> >> > -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
> >> > -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
> >> > -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
> >> > -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
> >> > -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
> >> > -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
> >> > +   {{  0, 32, 32,  0,  0,  0,  0,  0 }},
> >> 
> >> This configuration is inherently inefficient since it will always leave
> >> a third of the L3 cache unallocated.  According to the hardware docs
> >> it's only included for backwards compatibility.  I think we should
> >> remove it so we don't end up using it accidentally.
> >> 
> >> > +   {{  0, 32, 28,  0,  0,  0,  0,  0 }},
> >> > +   {{  0, 24,  0,  8, 28,  0,  0,  0 }},
> >> > +   {{  0, 16,  0,  0, 44,  0,  0,  0 }},
> >> > +   {{  0, 16, 12,  0,  0,  0,  0,  0 }},
> >> > +   {{  0, 16,  0,  0, 12,  0,  0,  0 }},
> >> 
> >> The configurations above won't work right now because we aren't setting
> >> up the command buffer and tile cache-related partitions in the L3
> >> control registers.  You either need to hook up the new partitions (and
> >> add array entries for them in gen_l3_config), or remove/comment out the
> >> five lines above.
> >> 
> >> > +   {{  0, 16, 80,  0,  0,  0,  0,  0 }},
> >> 
> >> From the results of the experiments we ran it seems like the last
> >> configuration above is busted due to some hardware bug.  It would make
> >> sense to remove or at least comment out the line so we don't use it
> >> accidentally until we get some better workaround from the hardware team.
> >> 
> >> > +   {{  0, 16, 48,  0,  0,  0,  0,  0 }},
> >> > +   {{  0, 16, 44,  0,  0,  0,  0,  0 }},
> >> 
> >> As before the above two configurations won't work due to the missing
> >> partitions introduced in ICL.  With these changes in place there's
> >> probably no need for PATCH 4 of this series.
> >> 
> >> > +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
> >> >     {{  0 }}
> >> >  };
> >> >  
> >> 
> >
> > So, one of the main motivations for dynamically reconfiguring the L3
> > on the fly was to add/remove the SLM partition as needed.  This isn't
> > required on Gen11+, as SLM is handled separately.  Furthermore, we just
> > use the "ALL" configs rather than manually partitioning things between
> > DC, RO, and so on.
> >
> > With that in mind, it seems like we basically always want to use the
> > last config (32/64) - or maybe the smaller URB one (16/80).  I am not
> > sure that we really need the ability to dynamically switch on the fly.
> >
> 
> There's no optimal choice between the two beforehand, that's why the
> hardware still provides the ability to reconfigure the L3 cache.
> Different workloads can benefit from different ratios of URB to the rest
> of the cache based on how bandwidth and geometry intensive they are.
> 
> > I had suggested to Anuj earlier to make brw_upload_initial_gpu_state()
> > program one of the two configs directly, then remove the gen7_l3_state
> > atom from the Gen11+ list.  We'd bypass all of this code entirely.  No
> > more lists of things we don't want, with manipulated weights to pick the
> > one thing we do want, and draw-time state flagging to re-select the same
> > config every time...
> >
> > It seems like it would be dramatically simpler.  This code is great for
> > Gen7+, I just don't think it makes sense for Gen11+.
> >
> > Curro, what do you think?
> >
> 
> It definitely still makes sense on Gen11+ AFAICT.  And we may still need
> to switch L3 partitions on the fly because of everybody's favorite ICL
> performance hardware feature.  I doubt that making the allocation static
> is a good plan.  What is your concern exactly?  Does it show up in your
> profiling logs at all?

Everybody's favorite performance hardware feature is not likely to ever
be implemented in i965.  Maybe in Iris.

We have this table of configs, and a set of weights.  But we don't use
DC/RO/IS/C/T weights on Gen8+ - only URB, ALL, and SLM.  On Gen11+, we
have exactly two weights.  These end up selecting between ~3 configs.
But the first one in the table underutilizes the cache, so we have to
drop or reorder it so the mechanism doesn't pick that one.

One of the two remaining configs currently doesn't appear to work,
for unknown reasons.  So patch 4 adjusts the URB weights to twice what
we use on other platforms - not because programs need/want more URB -
but for the specific purpose of dodging brokenness and causing the
selection mechanism to pick the single remaining desirable config.

That seems gross, IMO.  Weights should be based on the needs of bound
shaders.  Abusing the system in this manner makes me question whether
the mechanism is really what we want.  Having exactly one right answer
to the "which config should I use?" question makes me think we don't
want a mechanism at all...

Perhaps we'll get both configs working, and then will want to be able
to select between them.  I question whether the additional URB is truly
that valuable - how large are the actual gains? - considering that we
have to stall in order to reconfigure everything anyway...

*shrug*

--Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181116/98343626/attachment.sig>


More information about the mesa-dev mailing list