[Mesa-dev] [PATCH 05/36] i965: Import tables enumerating the set of validated L3 configurations.
Ben Widawsky
ben at bwidawsk.net
Tue Nov 17 17:27:51 PST 2015
On Sat, Nov 14, 2015 at 01:43:41PM -0800, Jordan Justen wrote:
> From: Francisco Jerez <currojerez at riseup.net>
>
> It should be possible to use additional L3 configurations other than
> the ones listed in the tables of validated allocations ("BSpec »
> 3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
> » L3 Allocation and Programming"), but it seems sensible for now to
> hard-code the tables in order to stick to the hardware docs. Instead
> of setting up the arbitrary L3 partitioning given as input, the
> closest validated L3 configuration will be looked up in these tables
> and used to program the hardware.
>
> The included tables should work for Gen7-9. Note that the quantities
> are specified in ways rather than in KB, this is because the L3
> control registers expect the value in ways, and because by doing that
> we can re-use a single table for all GT variants of the same
> generation (and in the case of IVB/HSW and CHV/SKL across different
> generations) which generally have different L3 way sizes but allow the
> same combinations of way allocations.
> ---
> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
> src/mesa/drivers/dri/i965/gen7_l3_state.c | 163 +++++++++++++++++++++++++++++
> 2 files changed, 164 insertions(+)
> create mode 100644 src/mesa/drivers/dri/i965/gen7_l3_state.c
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> index 5a88d66..91901ad 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -184,6 +184,7 @@ i965_FILES = \
> gen7_cs_state.c \
> gen7_disable.c \
> gen7_gs_state.c \
> + gen7_l3_state.c \
> gen7_misc_state.c \
> gen7_sf_state.c \
> gen7_sol_state.c \
> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> new file mode 100644
> index 0000000..8f9ba5b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "brw_state.h"
> +#include "intel_batchbuffer.h"
> +
> +/**
> + * Chunk of L3 cache reserved for some specific purpose.
> + */
> +enum brw_l3_partition {
> + /** Shared local memory. */
> + L3P_SLM = 0,
> + /** Unified return buffer. */
> + L3P_URB,
> + /** Union of DC and RO. */
> + L3P_ALL,
> + /** Data cluster RW partition. */
> + L3P_DC,
> + /** Union of IS, C and T. */
> + L3P_RO,
> + /** Instruction and state cache. */
> + L3P_IS,
> + /** Constant cache. */
> + L3P_C,
> + /** Texture cache. */
> + L3P_T,
> + /** Number of supported L3 partitions. */
> + NUM_L3P
> +};
> +
> +/**
> + * L3 configuration represented as the number of ways allocated for each
> + * partition. \sa get_l3_way_size().
> + */
> +struct brw_l3_config {
> + unsigned n[NUM_L3P];
> +};
> +
> +/**
> + * IVB/HSW validated L3 configurations.
> + */
> +static const struct brw_l3_config ivb_l3_configs[] = {
> + {{ 0, 32, 0, 0, 32, 0, 0, 0 }},
> + {{ 0, 32, 0, 16, 16, 0, 0, 0 }},
> + {{ 0, 32, 0, 4, 0, 8, 4, 16 }},
> + {{ 0, 28, 0, 8, 0, 8, 4, 16 }},
> + {{ 0, 28, 0, 16, 0, 8, 4, 8 }},
> + {{ 0, 28, 0, 8, 0, 16, 4, 8 }},
> + {{ 0, 28, 0, 0, 0, 16, 4, 16 }},
> + {{ 0, 32, 0, 0, 0, 16, 0, 16 }},
> + {{ 0, 28, 0, 4, 32, 0, 0, 0 }},
> + {{ 16, 16, 0, 16, 16, 0, 0, 0 }},
> + {{ 16, 16, 0, 8, 0, 8, 8, 8 }},
> + {{ 16, 16, 0, 4, 0, 8, 4, 16 }},
> + {{ 16, 16, 0, 4, 0, 16, 4, 8 }},
> + {{ 16, 16, 0, 0, 32, 0, 0, 0 }},
> + {{ 0 }}
> +};
> +
> +/**
> + * VLV validated L3 configurations.
> + */
> +static const struct brw_l3_config vlv_l3_configs[] = {
> + {{ 0, 80, 0, 0, 16, 0, 0, 0 }},
> + {{ 0, 80, 0, 8, 8, 0, 0, 0 }},
> + {{ 0, 64, 0, 16, 16, 0, 0, 0 }},
> + {{ 0, 64, 0, 0, 32, 0, 0, 0 }},
> + {{ 0, 60, 0, 4, 32, 0, 0, 0 }},
> + {{ 32, 32, 0, 16, 16, 0, 0, 0 }},
> + {{ 32, 40, 0, 8, 16, 0, 0, 0 }},
> + {{ 32, 40, 0, 16, 8, 0, 0, 0 }},
> + {{ 0 }}
> +};
> +
> +/**
> + * BDW validated L3 configurations.
> + */
> +static const struct brw_l3_config bdw_l3_configs[] = {
> + {{ 0, 48, 48, 0, 0, 0, 0, 0 }},
> + {{ 0, 48, 0, 16, 32, 0, 0, 0 }},
> + {{ 0, 32, 0, 16, 48, 0, 0, 0 }},
> + {{ 0, 32, 0, 0, 64, 0, 0, 0 }},
> + {{ 0, 32, 64, 0, 0, 0, 0, 0 }},
> + {{ 24, 16, 48, 0, 0, 0, 0, 0 }},
> + {{ 24, 16, 0, 16, 32, 0, 0, 0 }},
> + {{ 24, 16, 0, 32, 16, 0, 0, 0 }},
> + {{ 0 }}
> +};
> +
> +/**
> + * CHV/SKL validated L3 configurations.
> + */
> +static const struct brw_l3_config chv_l3_configs[] = {
> + {{ 0, 48, 48, 0, 0, 0, 0, 0 }},
> + {{ 0, 48, 0, 16, 32, 0, 0, 0 }},
> + {{ 0, 32, 0, 16, 48, 0, 0, 0 }},
> + {{ 0, 32, 0, 0, 64, 0, 0, 0 }},
> + {{ 0, 32, 64, 0, 0, 0, 0, 0 }},
> + {{ 32, 16, 48, 0, 0, 0, 0, 0 }},
> + {{ 32, 16, 0, 16, 32, 0, 0, 0 }},
> + {{ 32, 16, 0, 32, 16, 0, 0, 0 }},
> + {{ 0 }}
> +};
> +
> +/**
> + * Return a zero-terminated array of validated L3 configurations for the
> + * specified device.
> + */
> +static const struct brw_l3_config *
> +get_l3_configs(const struct brw_device_info *devinfo)
> +{
> + switch (devinfo->gen) {
> + case 7:
> + return (devinfo->is_baytrail ? vlv_l3_configs : ivb_l3_configs);
> +
> + case 8:
> + return (devinfo->is_cherryview ? chv_l3_configs : bdw_l3_configs);
> +
> + case 9:
> + return chv_l3_configs;
> +
> + default:
> + unreachable("Not implemented");
> + }
> +}
> +
> +/**
> + * Return the size of an L3 way in KB.
> + */
> +static unsigned
> +get_l3_way_size(const struct brw_device_info *devinfo)
> +{
> + if (devinfo->is_baytrail)
> + return 2;
Assuming there are 96 ways, your VLV table above is correct, but I am having
trouble verifying that.
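For what it's worth, the "96 ways" reading can at least be checked against the
table itself. The following is a minimal sketch, not part of the patch: the
struct and helper names (l3_config, l3_config_total_ways,
l3_table_uses_all_ways) are made up for illustration, and the table is copied
from the quoted vlv_l3_configs. It assumes that summing all eight columns does
not double count, which holds here because no row fills an aggregate partition
(ALL, RO) together with its sub-partitions.

```c
#include <stdbool.h>

#define NUM_L3P 8 /* same number of partitions as enum brw_l3_partition */

/* Hypothetical stand-in for struct brw_l3_config from the patch. */
struct l3_config {
   unsigned n[NUM_L3P];
};

/* VLV table copied from the patch; the all-zero entry terminates it. */
static const struct l3_config vlv_l3_configs[] = {
   {{  0, 80,  0,  0, 16,  0,  0,  0 }},
   {{  0, 80,  0,  8,  8,  0,  0,  0 }},
   {{  0, 64,  0, 16, 16,  0,  0,  0 }},
   {{  0, 64,  0,  0, 32,  0,  0,  0 }},
   {{  0, 60,  0,  4, 32,  0,  0,  0 }},
   {{ 32, 32,  0, 16, 16,  0,  0,  0 }},
   {{ 32, 40,  0,  8, 16,  0,  0,  0 }},
   {{ 32, 40,  0, 16,  8,  0,  0,  0 }},
   {{ 0 }}
};

/* Sum the ways allocated across all partitions of one configuration. */
static unsigned
l3_config_total_ways(const struct l3_config *cfg)
{
   unsigned total = 0;

   for (unsigned i = 0; i < NUM_L3P; i++)
      total += cfg->n[i];

   return total;
}

/*
 * Walk a zero-terminated table and report whether every configuration
 * allocates exactly `expected` ways. An entry whose ways sum to zero is
 * treated as the terminator.
 */
static bool
l3_table_uses_all_ways(const struct l3_config *table, unsigned expected)
{
   for (const struct l3_config *cfg = table;
        l3_config_total_ways(cfg) != 0; cfg++) {
      if (l3_config_total_ways(cfg) != expected)
         return false;
   }

   return true;
}
```

Calling l3_table_uses_all_ways(vlv_l3_configs, 96) returns true, so the table
is at least self-consistent with 96 ways. It does not tell us whether the
hardware really has 96 ways.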
> +
> + else if (devinfo->is_cherryview)
> + return 4;
I don't think this is right. Cherryview has 192K per bank and 96 ways (see the
note below about the confusion vs. 64), so I think this should return 2, right?
That would actually make your table for CHV correct. As it stands now, it looks
off by a factor of 2 to me.
> +
> + else
> + return 2 << devinfo->gt;
> +}
Here is how I understand it, using HSW as an example and assuming the most basic
and clear facts are actually correct:
HSW has 64 ways per slice. That is actually clear.
HSW has 256K L3 per slice. That is also clear.
HSW has 2 L3 banks per slice. That is also clear.
Therefore on IVB/HSW I assume your function is meant to return the way size per
bank as opposed to the way size per slice. If you had named that function
get_l3_way_size_per_bank(), it might have saved me 20 minutes of disproving my
wrong assumption that you were doing this by slice. I assumed it was per slice
because the only input in your calculation is gt, which I first think of as
slice count, not banks. Maybe that's just me, and I don't want to impose my
world view onto you. Similarly, the tables are per bank. (A comment or a change
to the function name definitely wouldn't hurt IMO.) Anyway, I feel pretty good
about IVB/HSW.
So that's cool, and it makes sense, but then I get lost on BDW because there,
it's not clear how many ways the cache actually has. "Upto 64 ways tagged for
L3$, remaining is treated as memory." Since it seems like each bank is 192K, if
it's 64 ways, then you get 3K per way (48 sets); if it's 96 ways, you get 2K per
way (32 sets). I believe 96 ways is much more likely; I'm just trying to
determine how you came to that conclusion. (SKL and CHV seem to have 96 ways,
but then there is also a note "On Gen9, RW can use the entire 64 ways allocated
to L3, and RO can also use the entire 64 ways.")
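To spell out the arithmetic behind those two candidates, here is a small
sketch. The helper names (way_size_kb, sets_per_way) are made up for
illustration and are not in the patch. The 64-byte cache line is my assumption;
it is what makes the 48 and 32 set counts come out.

```c
/* Size of one way in KB, given the bank size in KB and the number of ways. */
static unsigned
way_size_kb(unsigned bank_size_kb, unsigned n_ways)
{
   return bank_size_kb / n_ways;
}

/*
 * Number of sets in one way, given the way size in KB and the cache line
 * size in bytes (assumed to be 64 bytes in the discussion above).
 */
static unsigned
sets_per_way(unsigned way_kb, unsigned line_bytes)
{
   return way_kb * 1024 / line_bytes;
}
```

With a 192K bank, way_size_kb(192, 64) gives 3 and way_size_kb(192, 96) gives 2,
and sets_per_way() with a 64-byte line then gives 48 and 32 respectively. Only
the 2K result is consistent with tables whose rows add up to 96 ways.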
Also FWIW, I think we have some problems with SKL and KBL GT4 since they can't
actually use the full amount of L3 IIRC. At least for the URB size this is true.
With the exception of CHV, and making some assumptions because the docs are
kinda sucky, it all looks right to me.