[Mesa-dev] [PATCH 05/36] i965: Import tables enumerating the set of validated L3 configurations.

Francisco Jerez currojerez at riseup.net
Wed Nov 18 04:38:44 PST 2015


Ben Widawsky <ben at bwidawsk.net> writes:

> On Sat, Nov 14, 2015 at 01:43:41PM -0800, Jordan Justen wrote:
>> From: Francisco Jerez <currojerez at riseup.net>
>> 
>> It should be possible to use additional L3 configurations other than
>> the ones listed in the tables of validated allocations ("BSpec »
>> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
>> » L3 Allocation and Programming"), but it seems sensible for now to
>> hard-code the tables in order to stick to the hardware docs.  Instead
>> of setting up the arbitrary L3 partitioning given as input, the
>> closest validated L3 configuration will be looked up in these tables
>> and used to program the hardware.
>> 
>> The included tables should work for Gen7-9.  Note that the quantities
>> are specified in ways rather than in KB, this is because the L3
>> control registers expect the value in ways, and because by doing that
>> we can re-use a single table for all GT variants of the same
>> generation (and in the case of IVB/HSW and CHV/SKL across different
>> generations) which generally have different L3 way sizes but allow the
>> same combinations of way allocations.
>> ---
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/gen7_l3_state.c  | 163 +++++++++++++++++++++++++++++
>>  2 files changed, 164 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>> 
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 5a88d66..91901ad 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -184,6 +184,7 @@ i965_FILES = \
>>  	gen7_cs_state.c \
>>  	gen7_disable.c \
>>  	gen7_gs_state.c \
>> +	gen7_l3_state.c \
>>  	gen7_misc_state.c \
>>  	gen7_sf_state.c \
>>  	gen7_sol_state.c \
>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> new file mode 100644
>> index 0000000..8f9ba5b
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> @@ -0,0 +1,163 @@
>> +/*
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "brw_context.h"
>> +#include "brw_defines.h"
>> +#include "brw_state.h"
>> +#include "intel_batchbuffer.h"
>> +
>> +/**
>> + * Chunk of L3 cache reserved for some specific purpose.
>> + */
>> +enum brw_l3_partition {
>> +   /** Shared local memory. */
>> +   L3P_SLM = 0,
>> +   /** Unified return buffer. */
>> +   L3P_URB,
>> +   /** Union of DC and RO. */
>> +   L3P_ALL,
>> +   /** Data cluster RW partition. */
>> +   L3P_DC,
>> +   /** Union of IS, C and T. */
>> +   L3P_RO,
>> +   /** Instruction and state cache. */
>> +   L3P_IS,
>> +   /** Constant cache. */
>> +   L3P_C,
>> +   /** Texture cache. */
>> +   L3P_T,
>> +   /** Number of supported L3 partitions. */
>> +   NUM_L3P
>> +};
>> +
>> +/**
>> + * L3 configuration represented as the number of ways allocated for each
>> + * partition.  \sa get_l3_way_size().
>> + */
>> +struct brw_l3_config {
>> +   unsigned n[NUM_L3P];
>> +};
>> +
>> +/**
>> + * IVB/HSW validated L3 configurations.
>> + */
>> +static const struct brw_l3_config ivb_l3_configs[] = {
>> +   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
>> +   {{  0, 32,  0, 16, 16,  0,  0,  0 }},
>> +   {{  0, 32,  0,  4,  0,  8,  4, 16 }},
>> +   {{  0, 28,  0,  8,  0,  8,  4, 16 }},
>> +   {{  0, 28,  0, 16,  0,  8,  4,  8 }},
>> +   {{  0, 28,  0,  8,  0, 16,  4,  8 }},
>> +   {{  0, 28,  0,  0,  0, 16,  4, 16 }},
>> +   {{  0, 32,  0,  0,  0, 16,  0, 16 }},
>> +   {{  0, 28,  0,  4, 32,  0,  0,  0 }},
>> +   {{ 16, 16,  0, 16, 16,  0,  0,  0 }},
>> +   {{ 16, 16,  0,  8,  0,  8,  8,  8 }},
>> +   {{ 16, 16,  0,  4,  0,  8,  4, 16 }},
>> +   {{ 16, 16,  0,  4,  0, 16,  4,  8 }},
>> +   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
>> +   {{ 0 }}
>> +};
>> +
>> +/**
>> + * VLV validated L3 configurations.
>> + */
>> +static const struct brw_l3_config vlv_l3_configs[] = {
>> +   {{  0, 80,  0,  0, 16,  0,  0,  0 }},
>> +   {{  0, 80,  0,  8,  8,  0,  0,  0 }},
>> +   {{  0, 64,  0, 16, 16,  0,  0,  0 }},
>> +   {{  0, 64,  0,  0, 32,  0,  0,  0 }},
>> +   {{  0, 60,  0,  4, 32,  0,  0,  0 }},
>> +   {{ 32, 32,  0, 16, 16,  0,  0,  0 }},
>> +   {{ 32, 40,  0,  8, 16,  0,  0,  0 }},
>> +   {{ 32, 40,  0, 16,  8,  0,  0,  0 }},
>> +   {{ 0 }}
>> +};
>> +
>> +/**
>> + * BDW validated L3 configurations.
>> + */
>> +static const struct brw_l3_config bdw_l3_configs[] = {
>> +   {{  0, 48, 48,  0,  0,  0,  0,  0 }},
>> +   {{  0, 48,  0, 16, 32,  0,  0,  0 }},
>> +   {{  0, 32,  0, 16, 48,  0,  0,  0 }},
>> +   {{  0, 32,  0,  0, 64,  0,  0,  0 }},
>> +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
>> +   {{ 24, 16, 48,  0,  0,  0,  0,  0 }},
>> +   {{ 24, 16,  0, 16, 32,  0,  0,  0 }},
>> +   {{ 24, 16,  0, 32, 16,  0,  0,  0 }},
>> +   {{ 0 }}
>> +};
>> +
>> +/**
>> + * CHV/SKL validated L3 configurations.
>> + */
>> +static const struct brw_l3_config chv_l3_configs[] = {
>> +   {{  0, 48, 48,  0,  0,  0,  0,  0 }},
>> +   {{  0, 48,  0, 16, 32,  0,  0,  0 }},
>> +   {{  0, 32,  0, 16, 48,  0,  0,  0 }},
>> +   {{  0, 32,  0,  0, 64,  0,  0,  0 }},
>> +   {{  0, 32, 64,  0,  0,  0,  0,  0 }},
>> +   {{ 32, 16, 48,  0,  0,  0,  0,  0 }},
>> +   {{ 32, 16,  0, 16, 32,  0,  0,  0 }},
>> +   {{ 32, 16,  0, 32, 16,  0,  0,  0 }},
>> +   {{ 0 }}
>> +};
>> +
>> +/**
>> + * Return a zero-terminated array of validated L3 configurations for the
>> + * specified device.
>> + */
>> +static const struct brw_l3_config *
>> +get_l3_configs(const struct brw_device_info *devinfo)
>> +{
>> +   switch (devinfo->gen) {
>> +   case 7:
>> +      return (devinfo->is_baytrail ? vlv_l3_configs : ivb_l3_configs);
>> +
>> +   case 8:
>> +      return (devinfo->is_cherryview ? chv_l3_configs : bdw_l3_configs);
>> +
>> +   case 9:
>> +      return chv_l3_configs;
>> +
>> +   default:
>> +      unreachable("Not implemented");
>> +   }
>> +}
>> +
>> +/**
>> + * Return the size of an L3 way in KB.
>> + */
>> +static unsigned
>> +get_l3_way_size(const struct brw_device_info *devinfo)
>> +{
>> +   if (devinfo->is_baytrail)
>> +      return 2;
>
> Assuming there are 96 ways, your table above is correct, but I am having trouble
> verifying.
>
>> +
>> +   else if (devinfo->is_cherryview)
>> +      return 4;
>
> I don't think this is right. Cherryview is 192k per bank, and 96 ways (see note
> below about the confusion vs. 64). So I think it should be 2, right? 
>
CHV has 384 KB of L3 and 96 ways, which gives 4 KB of L3 per way (see
"BXML » GT » GTI » vol1i L3-URB DevBDW » [Register] L3 Control Register
[BDW+]").  Note that for a given generation the number of L3 ways
available is independent of the number of L3 banks present on the
chip.  Adding banks only scales the amount of memory kept track of as a
single way, which is why it makes sense to store the L3 configuration
tables in units of ways instead of KB.
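
As a rough illustration of the arithmetic above (a sketch only;
l3_way_size_kb() is a hypothetical helper that is not part of the patch),
the way size is just the total L3 size divided by the number of ways:

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: amount of L3 in KB that is
 * kept track of as a single way, given the total L3 size and way count. */
static unsigned
l3_way_size_kb(unsigned total_l3_kb, unsigned num_ways)
{
   /* The sizes quoted in this thread all divide evenly. */
   assert(num_ways != 0 && total_l3_kb % num_ways == 0);
   return total_l3_kb / num_ways;
}
```

With the CHV figures this is l3_way_size_kb(384, 96) == 4.  Keeping the
way count at 96 and doubling the memory to 768 KB (the BDW GT2 figure
mentioned further down) doubles the way size to 8 KB.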

> That actually makes your table for CHV correct. As it stands now it
> looks off by a factor of 2 to me.

I don't know what you mean by it being off.  If you multiply the values in
the CHV table by the way size of 4 KB you get the same table you can
find in the hardware docs ("BSpec » 3D-Media-GPGPU Engine » L3 Cache and
URB [IVB+] » L3 Cache and URB [CHV] » L3 Allocation and Programming
[CHV]").
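
A minimal sketch of that conversion, assuming the same eight-entry layout
as struct brw_l3_config in the patch (struct l3_config and
l3_config_to_kb() are illustrative names, not driver code):

```c
#include <assert.h>

enum { NUM_L3P = 8 };   /* same number of partitions as in the patch */

/* Illustrative stand-in for struct brw_l3_config. */
struct l3_config {
   unsigned n[NUM_L3P];
};

/* Scale every partition from ways to KB for a given way size. */
static struct l3_config
l3_config_to_kb(struct l3_config cfg, unsigned way_size_kb)
{
   assert(way_size_kb != 0);
   for (unsigned i = 0; i < NUM_L3P; i++)
      cfg.n[i] *= way_size_kb;
   return cfg;
}
```

For example, the CHV row { 32, 16, 0, 16, 32, 0, 0, 0 } with a 4 KB way
size becomes 128 KB of SLM, 64 KB of URB, 64 KB of DC and 128 KB of RO.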

>> +
>> +   else
>> +      return 2 << devinfo->gt;
>> +}
>
>
> Here is how I understand it using HSW as an example. Assuming the most basic and
> clear facts are actually correct:
> HSW has 64 ways per slice. That is actually clear.

Nope, HSW and IVB have 64 ways in total, regardless of the number of
subslices or L3 banks present on the chip.
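
One way to see this from the table itself is to add up the ways of a row.
The sketch below uses illustrative names and copies only the first and
last rows of ivb_l3_configs from the patch.  Summing all entries works
here because each row only populates one level of the ALL/RO hierarchy,
so no way is counted twice:

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_L3P = 8 };   /* same number of partitions as in the patch */

/* Illustrative stand-in for struct brw_l3_config. */
struct l3_config {
   unsigned n[NUM_L3P];
};

/* First and last rows of the IVB/HSW table from the patch. */
static const struct l3_config ivb_rows[] = {
   {{  0, 32,  0,  0, 32,  0,  0,  0 }},
   {{ 16, 16,  0,  0, 32,  0,  0,  0 }},
};

/* Total number of ways allocated by a configuration. */
static unsigned
l3_total_ways(const struct l3_config *cfg)
{
   unsigned total = 0;

   assert(cfg != NULL);
   for (size_t i = 0; i < NUM_L3P; i++)
      total += cfg->n[i];
   return total;
}
```

Both rows add up to 64 ways, with or without SLM enabled.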

> HSW has 256K L3 per slice. That is also clear.
> HSW has 2 L3 banks per slice. That is also clear.
>

HSW has two banks (256KB) of L3 per subslice (1 subslice on GT1, 2 on
GT2, 4 on GT3).

> Therefore on IVB/HSW I assume your function is meant to return the way size per
> bank as opposed to way size per slice. If you had named that function
> get_l3_way_size_per_bank(), it may have saved me 20 minutes of proving my wrong
> assumption that you were doing this by slice.

The function calculates the overall amount of memory available per way.
It's neither per slice nor per bank, it's the actual amount of memory in
KB.  See how it's used later on to implement update_urb_size() in
"[PATCH 10/36] i965: Implement L3 state atom.".
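
To make the connection with the "2 << devinfo->gt" expression explicit,
here is a small sketch that derives the same number from the HSW figures
given in this thread (256 KB of L3 per subslice, 1/2/4 subslices on
GT1/GT2/GT3, 64 ways overall).  Both function names are made up for the
example:

```c
#include <assert.h>

/* Way size in KB as computed by the fallback branch of the patch. */
static unsigned
way_size_from_gt(unsigned gt)
{
   return 2u << gt;
}

/* The same quantity derived from the HSW figures quoted in this thread:
 * 256 KB of L3 per subslice, 1/2/4 subslices on GT1/GT2/GT3 and 64 ways
 * in total.  Only meant to be valid for gt in 1..3. */
static unsigned
hsw_way_size_from_subslices(unsigned gt)
{
   assert(gt >= 1 && gt <= 3);
   const unsigned subslices = 1u << (gt - 1);
   const unsigned total_l3_kb = 256 * subslices;
   return total_l3_kb / 64;
}
```

Both give 4, 8 and 16 KB for GT1, GT2 and GT3 respectively, which is why
the result depends on the GT number but on nothing else.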

> The reason I made that assumption that it is per slice is because the
> only input in your calculation is gt, which I first think of as slice
> count, not banks.

If it were per subslice it wouldn't depend on the GT number at all ;).

> Maybe that's just me, and I don't want to impose my
> world view onto you. Similarly, the tables are per bank.  (comment or
> changing the function name definitely wouldn't hurt IMO). Anyway, I
> feel pretty good about IVB/HSW.
>

The tables are global: they give the total number of ways allocated for
each partition (which is independent of the number of banks, so it
wouldn't make sense to store them per bank).

> So that's cool, and it makes sense, but then I get lost on BDW because there,
> it's not clear how many ways the cache actually has. "Upto 64 ways tagged for
> L3$, remaining is treated as memory." Since it seems like each bank is 192K, if
> it's 64 ways, then you get 3K per way (48 sets), if it's 96 ways, you get 2K per
> way (32 sets). I believe 96 ways is much more likely, I'm just trying to
> determine how you came to the conclusion. 

It's 96.  The way size is documented in "BXML » GT » GTI » vol1i L3-URB
DevBDW » [Register] L3 Control Register [BDW+]" (8 KB on BDW GT2).  To
get the overall number of ways, divide the total L3 size (768 KB on GT2)
by the way size, which gives you 96.

> (SKL and CHV seems to have 96 ways, but then there is also a note "On
> Gen9, RW can use the entire 64 ways allocated to L3, and RO can also
> use the entire 64 ways.")
>
Yes, SKL has 96 ways too.  The restriction is that neither the RW nor the
RO partition can use more than 64 ways (AFAIK because otherwise the number
of ways couldn't be a power of two, which is required for the hashing to
work correctly).  These restrictions are already taken into account in
the tables I took from the "L3 Allocation and Programming" sections.
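
If one wanted to double-check a table row against that restriction, a
sketch could look like the following.  Treating L3P_DC as the RW
partition and the sum of L3P_RO, L3P_IS, L3P_C and L3P_T as the RO
partition is an assumption made for this example, and
l3_config_within_limits() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same ordering as enum brw_l3_partition in the patch. */
enum l3_partition {
   L3P_SLM = 0, L3P_URB, L3P_ALL, L3P_DC,
   L3P_RO, L3P_IS, L3P_C, L3P_T, NUM_L3P
};

/* Illustrative stand-in for struct brw_l3_config. */
struct l3_config {
   unsigned n[NUM_L3P];
};

/* Hypothetical check: neither the RW nor the RO partition may use more
 * than 64 ways.  Mapping RW to DC and RO to RO + IS + C + T is an
 * assumption of this sketch. */
static bool
l3_config_within_limits(const struct l3_config *cfg)
{
   assert(cfg != NULL);
   const unsigned rw = cfg->n[L3P_DC];
   const unsigned ro = cfg->n[L3P_RO] + cfg->n[L3P_IS] +
                       cfg->n[L3P_C] + cfg->n[L3P_T];
   return rw <= 64 && ro <= 64;
}
```

The CHV/SKL row { 0, 32, 0, 0, 64, 0, 0, 0 } passes this check, while a
made-up row giving 80 ways to RO would not.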

> Also FWIW, I think we have some problems with SKL and KBL GT4 since they can't
> actually use the full amount of L3 IIRC. At least for the URB size this is true.
>

Hmm, I think the problem is that GT4 parts only have one slice more than
GT3 parts, so you get 1.5x as much L3 space rather than 2x.  There's
also the limitation on the total URB size imposed by FF units, which
wasn't taken into account by the i965 driver when I implemented this
series, but it is now AFAICT, so this series would cause a regression on
SKL GT4.  I'll fix that and resend.
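
Purely to illustrate the 1.5x versus 2x mismatch (this is arithmetic
only, not the fix, and both function names are invented for the
example), compare what the power-of-two expression from the patch yields
for gt == 4 with the GT3 value scaled by 3/2:

```c
#include <assert.h>

/* Way size in KB as computed by the fallback branch of the patch. */
static unsigned
way_size_pow2(unsigned gt)
{
   return 2u << gt;
}

/* Hypothetical: the GT3 way size scaled by 1.5, which is what the
 * "1.5x as much L3 space" reasoning above would suggest for GT4. */
static unsigned
way_size_gt4_scaled(void)
{
   const unsigned gt3 = way_size_pow2(3);
   assert(gt3 % 2 == 0);
   return gt3 * 3 / 2;
}
```

The first gives 32 KB per way for gt == 4, the second 24 KB, so under
that reasoning the unmodified expression would overestimate the space
available on GT4.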

>
> With the exception of CHV, and making some assumptions because the docs are
> kinda sucky, it all looks right to me.