[Mesa-dev] [PATCH 3/5] intel/dev: Add devinfo cs_scratch_ids_per_subslice field
Jordan Justen
jordan.l.justen at intel.com
Wed Mar 7 20:36:02 UTC 2018
On 2018-03-07 00:47:12, Kenneth Graunke wrote:
> On Wednesday, March 7, 2018 12:16:28 AM PST Jordan Justen wrote:
> > Suggested-by: Kenneth Graunke <kenneth at whitecape.org>
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > ---
> > src/intel/dev/gen_device_info.c | 13 ++++++++-----
> > src/intel/dev/gen_device_info.h | 8 ++++++++
> > 2 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c
> > index 1773009d33c..de018c6a692 100644
> > --- a/src/intel/dev/gen_device_info.c
> > +++ b/src/intel/dev/gen_device_info.c
> > @@ -284,11 +284,13 @@ static const struct gen_device_info gen_device_info_byt = {
> > },
> > };
> >
> > -#define HSW_FEATURES \
> > - GEN7_FEATURES, \
> > - .is_haswell = true, \
> > - .supports_simd16_3src = true, \
> > - .has_resource_streamer = true
> > +#define HSW_FEATURES \
> > + GEN7_FEATURES, \
> > + .is_haswell = true, \
> > + .supports_simd16_3src = true, \
> > + .has_resource_streamer = true, \
> > + /* WaCSScratchSize:hsw */ \
> > + .cs_scratch_ids_per_subslice = 16 * 8
> >
> > static const struct gen_device_info gen_device_info_hsw_gt1 = {
> > HSW_FEATURES, .gt = 1,
> > @@ -476,6 +478,7 @@ static const struct gen_device_info gen_device_info_chv = {
> > .max_gs_threads = 80,
> > .max_wm_threads = 128,
> > .max_cs_threads = 6 * 7,
> > + .cs_scratch_ids_per_subslice = 8 * 7,
> > .urb = {
> > .size = 192,
> > .min_entries = {
> > diff --git a/src/intel/dev/gen_device_info.h b/src/intel/dev/gen_device_info.h
> > index b8044d00032..fa6b38f19ec 100644
> > --- a/src/intel/dev/gen_device_info.h
> > +++ b/src/intel/dev/gen_device_info.h
> > @@ -148,6 +148,14 @@ struct gen_device_info
> > */
> > unsigned max_cs_threads;
> >
> > + /**
> > + * The range of scratch addresses may differ from the number of EUs
> > + * available for compute programs. This requires allocating a larger
> > + * scratch buffer. For affected hardware, this will be set. If this is not
> > + * set, then max_cs_threads should be used instead.
> > + */
> > + unsigned cs_scratch_ids_per_subslice;
> > +
> > struct {
> > /**
> > * Hardware default URB size.
> >
>
> This works, but it's not quite what I had in mind. I was thinking you
> would add the new field, then add this hunk to gen_get_device_info():
>
> if (devinfo->is_haswell) {
> /* WaCSScratchSize:hsw
> *
> * Haswell's scratch space address calculation appears to be sparse
> * rather than tightly packed. The Thread ID has bits indicating
> * which subslice, EU within a subslice, and thread within an EU it
> * is. There's a maximum of two slices and two subslices, so these
> * can be stored with a single bit. Even though there are only 10 EUs
> * per subslice, this is stored in 4 bits, so there's an effective
> * maximum value of 16 EUs. Similarly, although there are only 7
> * threads per EU, this is stored in a 3 bit number, giving an
> * effective maximum value of 8 threads per EU.
> *
> * This means that we need to use 16 * 8 instead of 10 * 7 for the
> * number of threads per subslice.
> */
> devinfo->cs_scratch_ids_per_subslice = 16 * 8;
> } else if (devinfo->is_cherryview) {
> /* For Cherryview, it appears that the scratch addresses for the 6 EU
> * devices may still generate compute scratch addresses covering the
> * same range as 8 EU.
> */
> devinfo->cs_scratch_ids_per_subslice = 8 * 7;
> } else {
> devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads;
> }
Originally, I had initialized them in the structures, and then had:
if (devinfo->cs_scratch_ids_per_subslice == 0)
devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads;
I used it directly, like you mentioned below.
Then I got concerned about how the kernel params might update
max_cs_threads, and should that cause cs_scratch_ids_per_subslice to
be updated? So, I changed it to be treated as an override.
But, I could change the code that updates max_cs_threads to also
update cs_scratch_ids_per_subslice.
> This way, the field is always meaningful, and so in patches 4-5 you can
> just use devinfo->cs_scratch_ids_per_subslice directly rather than doing
>
> const unsigned scratch_ids_per_subslice =
> devinfo->cs_scratch_ids_per_subslice ?
> devinfo->cs_scratch_ids_per_subslice : devinfo->max_cs_threads;
>
> This also has the side effect of preserving the comment, which I think
> is valuable.
Well, I guess I thought that the comment might be a bit overwrought.
:) I mean, it's good to know that there's a special circumstance, but
maybe it's not too bad if you have to find the full details in the
commit history? :)
Nevertheless, I'll keep it.
-Jordan
> There's already some precedent for this, as
> gen_get_device_info() already whacks max_wm_threads precisely to deal
> with scratch IDs being somewhat sparse.
>
> --Ken
More information about the mesa-dev
mailing list