[Mesa-dev] [PATCH 3/3] i965/fs: Compute q-values for register allocation manually
Jason Ekstrand
jason at jlekstrand.net
Sat Oct 4 10:04:32 PDT 2014
On Oct 3, 2014 11:06 PM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
>
> On Fri, Oct 3, 2014 at 10:03 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> > Previously, we were allowing the register allocation code to do the
> > computation for us in ra_set_finalize. However, the runtime for this
> > computation is O(c^4 * g) where c is the number of classes and g is the
> > number of GRF registers. However, these q-values are directly
computable
> > based on the way we lay out our register classes so there is no need for
> > the aweful runtime algorithm.
> >
> > We were doing ok until commit 7210583eb where we bumped the number of
> > register classes from 11 to 16. While startup times don't normally
matter,
> > this caused piglit to take 4 times as long to run on Bay Trail. This
patch
> > should make generating the ra_set much faster and melt the piglit run
> > times.
>
> Did you check that you get the same q-values that we calculated
> previously? Bugs in this code are subtle and might not cause obvious
> problems.
I've double and tripple-checked my calculations but no, I haven't actually
done that test. Should be easy enough to hack up though so I'll try it
some time next week.
--Jason
>
> >
> > Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com>
> > ---
> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 52
++++++++++++++++++++++-
> > 1 file changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > index fd34941..15fa0f0 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > @@ -152,6 +152,13 @@ brw_alloc_reg_set(struct intel_screen *screen, int
reg_width)
> > int *classes = ralloc_array(screen, int, class_count);
> > int aligned_pairs_class = -1;
> >
> > + /* Allocate space for q values. We allocate class_count + 1
because we
> > + * want to leave room for the aligned pairs class if we have it. */
> > + unsigned int **q_values = ralloc_array(screen, unsigned int *,
> > + class_count + 1);
> > + for (int i = 0; i < class_count + 1; ++i)
> > + q_values[i] = ralloc_array(q_values, unsigned int, class_count +
1);
> > +
> > /* Now, add the registers to their classes, and add the conflicts
> > * between them and the base GRF registers (and also each other).
> > */
> > @@ -162,8 +169,41 @@ brw_alloc_reg_set(struct intel_screen *screen, int
reg_width)
> > int class_reg_count;
> > if (devinfo->gen <= 5 && reg_width == 2) {
> > class_reg_count = (base_reg_count - (class_sizes[i] - 1)) / 2;
> > +
> > + /* See comment below. The only difference here is that we are
> > + * dealing with pairs of registers instead of single
registers.
> > + * Registers of odd sizes simply get rounded up. */
> > + for (int j = 0; j < class_count; j++)
> > + q_values[i][j] = (class_sizes[i] + 1) / 2 +
> > + (class_sizes[j] + 1) / 2 - 1;
> > } else {
> > class_reg_count = base_reg_count - (class_sizes[i] - 1);
> > +
> > + /* From register_allocate.c:
> > + *
> > + * q(B,C) (indexed by C, B is this register class) in
> > + * Runeson/Nyström paper. This is "how many registers of B
could
> > + * the worst choice register from C conflict with".
> > + *
> > + * If we just let the register allocation algorithm compute
these
> > + * values, is extremely expensive. However, since all of our
> > + * registers are laid out, we can very easily compute them
> > + * ourselves. View the register from C as fixed starting at
GRF n
> > + * somwhere in the middle, and the register from B as sliding
back
> > + * and forth. Then the first register to conflict from B is
the
> > + * one starting at n - class_size[B] + 1 and the last
register to
> > + * conflict will start at n + class_size[B] - 1. Therefore,
the
> > + * number of conflicts from B is class_size[B] +
class_size[C] - 1.
> > + *
> > + * +-+-+-+-+-+-+ +-+-+-+-+-+-+
> > + * B | | | | | |n| --> | | | | | | |
> > + * +-+-+-+-+-+-+ +-+-+-+-+-+-+
> > + * +-+-+-+-+-+
> > + * C |n| | | | |
> > + * +-+-+-+-+-+
> > + */
> > + for (int j = 0; j < class_count; j++)
> > + q_values[i][j] = class_sizes[i] + class_sizes[j] - 1;
> > }
> > classes[i] = ra_alloc_reg_class(regs);
> >
> > @@ -208,7 +248,7 @@ brw_alloc_reg_set(struct intel_screen *screen, int
reg_width)
> > /* Add a special class for aligned pairs, which we'll put delta_x/y
> > * in on gen5 so that we can do PLN.
> > */
> > - if (devinfo->has_pln && devinfo->gen < 6) {
> > + if (devinfo->has_pln && reg_width == 1 && devinfo->gen < 6) {
> > aligned_pairs_class = ra_alloc_reg_class(regs);
> >
> > for (int i = 0; i < pairs_reg_count; i++) {
> > @@ -216,9 +256,17 @@ brw_alloc_reg_set(struct intel_screen *screen, int
reg_width)
> > ra_class_add_reg(regs, aligned_pairs_class, pairs_base_reg
+ i);
> > }
> > }
> > +
> > + for (int i = 0; i < class_count; i++) {
> > + q_values[class_count][i] = (class_sizes[i] + 1) / 2;
> > + q_values[i][class_count] = class_sizes[i] + 1;
> > + }
> > + q_values[class_count][class_count] = 1;
> > }
> >
> > - ra_set_finalize(regs, NULL);
> > + ra_set_finalize(regs, q_values);
> > +
> > + ralloc_free(q_values);
> >
> > screen->wm_reg_sets[index].regs = regs;
> > for (unsigned i = 0; i <
ARRAY_SIZE(screen->wm_reg_sets[index].classes); i++)
> > --
> > 2.1.0
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20141004/be4f06bb/attachment.html>
More information about the mesa-dev
mailing list