[Mesa-dev] [PATCH 1/9] i965/fs: Rewrite assign_constant_locations
Pohjolainen, Topi
topi.pohjolainen at gmail.com
Fri Dec 8 08:27:23 UTC 2017
On Thu, Dec 07, 2017 at 10:57:31AM -0800, Jason Ekstrand wrote:
> On Thu, Dec 7, 2017 at 9:57 AM, Pohjolainen, Topi <
> topi.pohjolainen at gmail.com> wrote:
>
> > On Thu, Dec 07, 2017 at 09:25:08AM -0800, Jason Ekstrand wrote:
> > > On Thu, Dec 7, 2017 at 9:10 AM, Pohjolainen, Topi <
> > > topi.pohjolainen at gmail.com> wrote:
> > >
> > > > On Wed, Dec 06, 2017 at 08:34:19PM -0800, Jason Ekstrand wrote:
> > > > > This rewires the logic for assigning uniform locations to work in
> > terms
> > > > > of "complex alignments". The basic idea is that, as we walk the
> > list of
> > > > > instructions, we keep track of the alignment and continuity
> > requirements
> > > > > of each slot and assert that the alignments all match up. We then
> > use
> > > > > those alignments in the compaction stage to ensure that everything
> > gets
> > > > > placed at a properly aligned register. The old mechanism handled
> > > > > alignments by special-casing each of the bit sizes and placing 64-bit
> > > > > values first followed by 32-bit values.
> > > > >
> > > > > The old scheme had the advantage of never leaving a hole since all
> > the
> > > > > 64-bit values could be tightly packed and so could the 32-bit values.
> > > > > However, the new scheme has no type size special cases so it handles
> > not
> > > > > only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
> > > > > types as the need arises.
> > > > >
> > > > > Tested-by: Jose Maria Casanova Crespo <jmcasanova at igalia.com>
> > > > > ---
> > > > > src/intel/compiler/brw_fs.cpp | 307 ++++++++++++++++++++++++------
> > > > ------------
> > > > > 1 file changed, 174 insertions(+), 133 deletions(-)
> > > > >
> > > > > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.
> > > > cpp
> > > > > index 93bb6b4..41260b4 100644
> > > > > --- a/src/intel/compiler/brw_fs.cpp
> > > > > +++ b/src/intel/compiler/brw_fs.cpp
> > > > > @@ -1906,62 +1906,6 @@ fs_visitor::compact_virtual_grfs()
> > > > > return progress;
> > > > > }
> > > > >
> > > > > -static void
> > > > > -set_push_pull_constant_loc(unsigned uniform, int *chunk_start,
> > > > > - unsigned *max_chunk_bitsize,
> > > > > - bool contiguous, unsigned bitsize,
> > > > > - const unsigned target_bitsize,
> > > > > - int *push_constant_loc, int
> > > > *pull_constant_loc,
> > > > > - unsigned *num_push_constants,
> > > > > - unsigned *num_pull_constants,
> > > > > - const unsigned max_push_components,
> > > > > - const unsigned max_chunk_size,
> > > > > - bool allow_pull_constants,
> > > > > - struct brw_stage_prog_data
> > *stage_prog_data)
> > > > > -{
> > > > > - /* This is the first live uniform in the chunk */
> > > > > - if (*chunk_start < 0)
> > > > > - *chunk_start = uniform;
> > > > > -
> > > > > - /* Keep track of the maximum bit size access in contiguous
> > uniforms
> > > > */
> > > > > - *max_chunk_bitsize = MAX2(*max_chunk_bitsize, bitsize);
> > > > > -
> > > > > - /* If this element does not need to be contiguous with the next,
> > we
> > > > > - * split at this point and everything between chunk_start and u
> > > > forms a
> > > > > - * single chunk.
> > > > > - */
> > > > > - if (!contiguous) {
> > > > > - /* If bitsize doesn't match the target one, skip it */
> > > > > - if (*max_chunk_bitsize != target_bitsize) {
> > > > > - /* FIXME: right now we only support 32 and 64-bit accesses
> > */
> > > > > - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
> > > > > - *max_chunk_bitsize = 0;
> > > > > - *chunk_start = -1;
> > > > > - return;
> > > > > - }
> > > > > -
> > > > > - unsigned chunk_size = uniform - *chunk_start + 1;
> > > > > -
> > > > > - /* Decide whether we should push or pull this parameter. In
> > the
> > > > > - * Vulkan driver, push constants are explicitly exposed via
> > the
> > > > API
> > > > > - * so we push everything. In GL, we only push small arrays.
> > > > > - */
> > > > > - if (!allow_pull_constants ||
> > > > > - (*num_push_constants + chunk_size <= max_push_components
> > &&
> > > > > - chunk_size <= max_chunk_size)) {
> > > > > - assert(*num_push_constants + chunk_size <=
> > > > max_push_components);
> > > > > - for (unsigned j = *chunk_start; j <= uniform; j++)
> > > > > - push_constant_loc[j] = (*num_push_constants)++;
> > > > > - } else {
> > > > > - for (unsigned j = *chunk_start; j <= uniform; j++)
> > > > > - pull_constant_loc[j] = (*num_pull_constants)++;
> > > > > - }
> > > > > -
> > > > > - *max_chunk_bitsize = 0;
> > > > > - *chunk_start = -1;
> > > > > - }
> > > > > -}
> > > > > -
> > > > > static int
> > > > > get_subgroup_id_param_index(const brw_stage_prog_data *prog_data)
> > > > > {
> > > > > @@ -1977,6 +1921,98 @@ get_subgroup_id_param_index(const
> > > > brw_stage_prog_data *prog_data)
> > > > > }
> > > > >
> > > > > /**
> > > > > + * Struct for handling complex alignments.
> > > > > + *
> > > > > + * A complex alignment is stored as multiplier and an offset. A
> > value
> > > > is
> > > > > + * considered to be aligned if it is congruent to the offset modulo
> > the
> > > > > + * multiplier.
> > > > > + */
> > > > > +struct cplx_align {
> > > > > + unsigned mul:4;
> > > > > + unsigned offset:4;
> > > > > +};
> > > > > +
> > > > > +#define CPLX_ALIGN_MAX_MUL 8
> > > > > +
> > > > > +static void
> > > > > +cplx_align_assert_sane(struct cplx_align a)
> > > > > +{
> > > > > + assert(a.mul > 0 && util_is_power_of_two(a.mul));
> > > > > + assert(a.offset < a.mul);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * Combines two alignments to produce a least multiple of sorts.
> > > > > + *
> > > > > + * The returned alignment is the smallest (in terms of multiplier)
> > such
> > > > that
> > > > > + * anything aligned to both a and b will be aligned to the new
> > > > alignment.
> > > > > + * This function will assert-fail if a and b are not compatible,
> > i.e.
> > > > if the
> > > > > + * offset parameters are such that no common alignment is possible.
> > > > > + */
> > > > > +static struct cplx_align
> > > > > +cplx_align_combine(struct cplx_align a, struct cplx_align b)
> > > > > +{
> > > > > + cplx_align_assert_sane(a);
> > > > > + cplx_align_assert_sane(b);
> > > > > +
> > > > > + /* Assert that the alignments agree. */
> > > > > + assert((a.offset & (b.mul - 1)) == (b.offset & (a.mul - 1)));
> > > > > +
> > > > > + return a.mul > b.mul ? a : b;
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * Apply a complex alignment
> > > > > + *
> > > > > + * This function will return the smallest number greater than or
> > equal
> > > > to
> > > > > + * offset that is aligned to align.
> > > > > + */
> > > > > +static unsigned
> > > > > +cplx_align_apply(struct cplx_align align, unsigned offset)
> > > > > +{
> > > >
> > > > Should we assert that offset >= 4?
> > > >
> > >
> > > Not here, no. I intentionally want this code to be as generic as
> > > possible. We may have reason to re-use it (or something similar)
> > elsewhere
> > > and I don't want to build in lots of push-constant assumptions.
> > >
> > >
> > > > > + return ALIGN(offset - align.offset, align.mul) + align.offset;
> > > >
> > > > Soon I'm going to continue with glsl mediump support and thought
> > better to
> > > > read these patches. I stumbled on something (probably missing something
> > > > again):
> > > >
> > >
> > > :-)
> > >
> > >
> > > > To me it looks there can ever be only three different instances of
> > > > cplx_align:
> > > >
> > >
> > > Today, yes, but there may come a day when we want to re-arrange with a
> > > finer granularity than dwords.
> > >
> > >
> > > > {8, 0}, {8, 4} and {4, 0} and that "offset" argument here is always
> > > > multiple
> > > > of 4.
> > > >
> > > > In case of align == {8, 0} "offset" gets aligned to 8, and in case of
> > > > align == {4, 0} cplx_align_apply() is nop.
> > > >
> > > > But in case of {8, 4} and "offset" not multiple of 8, "offset" - 4 is
> > > > multiple of 8 and returned value becomes ALIGN(offset - 4, 8) + 4 which
> > > > in turn isn't aligned by 8.
> > > >
> > >
> > > Correct, and it shouldn't be. Let's walk through the calculation with a
> > > more complex alignment: {8, 6} (no, that isn't possible today but the
> > > calculations should work with it) and some concrete numbers: 6, 8, 10,
> > 12,
> > > 14, 16
> > >
> > > 6 -> ALIGN(6 - 6, 8) + 6 = 6
> > > 8 -> ALIGN(8 - 6, 8) + 6 = 14
> > > 10 -> ALIGN(10 - 6, 8) + 6 = 14
> > > 12 -> ALIGN(12 - 6, 8) + 6 = 14
> > > 14 -> ALIGN(14 - 6, 8) + 6 = 14
> > > 16 -> ALIGN(16 - 6, 8) + 6 = 22
> > >
> > > As you can see, the result of each is the next number that is 6 more
> > than a
> > > multiple of 8. There is also a question about negative numbers. Let's
> > > look at one more calculation, this time with more detail about the guts
> > of
> > > ALIGN():
> > >
> > > 4 -> ALIGN(4 - 6, 8) + 6 = (((4 - 6) + (8 - 1)) & ~(8-1)) + 6 = (5 & ~7)
> > +
> > > 6 = 6
> > >
> > > That one's a bit more complicated but I think you see what's going on.
> > If
> > > it makes things more clear, we could do something such as
> > >
> > > return (offset + (align.mul - align.offset - 1)) & ~(align.mul - 1)) +
> > > align.offset;
> >
> > Big thanks for the good explanation. I was also wondering why I didn't
> > fully
> > get the documentation of cplx_align. I kept asking myself "but offset is
> > always modulo of mul". Now the comment there makes sense :) Some of your
> > reply
> > here would probably go nicely there.
> >
>
> Any preference whether it goes in the documentation of struct cplx_align or
> cplx_align_apply? I'm leaning towards cplx_align
Sounds good.
>
>
> > I'll read the whole thing again to see if there was anything else I wanted
> > to
> > ask.
I have a few more questions. Replying again to the patch directly as things
are getting too indented here.
More information about the mesa-dev
mailing list