[Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

Francisco Jerez currojerez at riseup.net
Fri Nov 16 07:16:09 UTC 2018


Kenneth Graunke <kenneth at whitecape.org> writes:

> On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
>> Anuj Phogat <anuj.phogat at gmail.com> writes:
>> 
>> > Use L3 configuration table specified in h/w specification.
>> >
>> > Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
>> > Cc: Kenneth Graunke <kenneth at whitecape.org>
>> > Cc: Francisco Jerez <currojerez at riseup.net>
>> > Cc: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> > ---
>> >  src/intel/common/gen_l3_config.c | 16 ++++++++++------
>> >  1 file changed, 10 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
>> > index b977c6ab136..079608198bc 100644
>> > --- a/src/intel/common/gen_l3_config.c
>> > +++ b/src/intel/common/gen_l3_config.c
>> > @@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
>> >   */
>> >  static const struct gen_l3_config icl_l3_configs[] = {
>> >     /* SLM URB ALL DC  RO  IS   C   T */
>> > -   {{  0, 64, 64,  0,  0,  0,  0,  0 }},
>> > -   {{  0, 64,  0, 16, 48,  0,  0,  0 }},
>> > -   {{  0, 48,  0, 16, 64,  0,  0,  0 }},
>> > -   {{  0, 32,  0,  0, 96,  0,  0,  0 }},
>> > -   {{  0, 32, 96,  0,  0,  0,  0,  0 }},
>> > -   {{  0, 32,  0, 16, 80,  0,  0,  0 }},
>> > +   {{  0, 32, 32,  0,  0,  0,  0,  0 }},
>> 
>> This configuration is inherently inefficient since it will always leave
>> a third of the L3 cache unallocated.  According to the hardware docs
>> it's only included for backwards compatibility.  I think we should
>> remove it so we don't end up using it accidentally.
>> 
>> > +   {{  0, 32, 28,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 24,  0,  8, 28,  0,  0,  0 }},
>> > +   {{  0, 16,  0,  0, 44,  0,  0,  0 }},
>> > +   {{  0, 16, 12,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 16,  0,  0, 12,  0,  0,  0 }},
>> 
>> The configurations above won't work right now because we aren't setting
>> up the command buffer and tile cache-related partitions in the L3
>> control registers.  You either need to hook up the new partitions (and
>> add array entries for them in gen_l3_config), or remove/comment out the
>> five lines above.
>> 
>> > +   {{  0, 16, 80,  0,  0,  0,  0,  0 }},
>> 
>> From the results of the experiments we ran it seems like the last
>> configuration above is busted due to some hardware bug.  It would make
>> sense to remove or at least comment out the line so we don't use it
>> accidentally until we get some better workaround from the hardware team.
>> 
>> > +   {{  0, 16, 48,  0,  0,  0,  0,  0 }},
>> > +   {{  0, 16, 44,  0,  0,  0,  0,  0 }},
>> 
>> As before the above two configurations won't work due to the missing
>> partitions introduced in ICL.  With these changes in place there's
>> probably no need for PATCH 4 of this series.
>> 
>> > +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
>> >     {{  0 }}
>> >  };
>> >  
>> 
>
> So, one of the main motivations for dynamically reconfiguring the L3
> on the fly was to add/remove the SLM partition as needed.  This isn't
> required on Gen11+, as SLM is handled separately.  Furthermore, we just
> use the "ALL" configs rather than manually partitioning things between
> DC, RO, and so on.
>
> With that in mind, it seems like we basically always want to use the
> last config (32/64) - or maybe the smaller URB one (16/80).  I am not
> sure that we really need the ability to dynamically switch on the fly.
>

There's no optimal choice between the two beforehand, that's why the
hardware still provides the ability to reconfigure the L3 cache.
Different workloads can benefit from different ratios of URB to the rest
of the cache based on how bandwidth and geometry intensive they are.

> I had suggested to Anuj earlier to make brw_upload_initial_gpu_state()
> program one of the two configs directly, then remove the gen7_l3_state
> atom from the Gen11+ list.  We'd bypass all of this code entirely.  No
> more lists of things we don't want, with manipulated weights to pick the
> one thing we do want, and draw-time state flagging to re-select the same
> config every time...
>
> It seems like it would be dramatically simpler.  This code is great for
> Gen7+, I just don't think it makes sense for Gen11+.
>
> Curro, what do you think?
>

It definitely still makes sense on Gen11+ AFAICT.  And we may still need
to switch L3 partitions on the fly because of everybody's favorite ICL
performance hardware feature.  I doubt that making the allocation static
is a good plan.  What is your concern exactly?  Does it show up in your
profiling logs at all?

> --Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20181115/295aaee5/attachment-0001.sig>


More information about the mesa-dev mailing list