[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write

Sat Sep 20 14:58:30 PDT 2014

On Sep 20, 2014 2:52 PM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
>
> On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> >
> >
> > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com>
wrote:
> >>
> >> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
> >> wrote:
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
> >> > ++++++++++++++++++++++-
> >> >  src/mesa/drivers/dri/i965/intel_screen.h          |  5 +++++
> >> >  2 files changed, 28 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > index b42f1e5..41e8855 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen,
int
> >> > reg_width)
> >> >     /* Compute the total number of registers across all classes. */
> >> >     int ra_reg_count = 0;
> >> >     for (int i = 0; i < class_count; i++) {
> >> > +      screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
> >> >        ra_reg_count += base_reg_count - (class_sizes[i] - 1);
> >> >     }
> >> > +   screen->wm_reg_sets[index].class_first_reg[class_count] =
> >> > ra_reg_count;
> >> >
> >> >     uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t,
> >> > ra_reg_count);
> >> >     struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
> >> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
> >> >     }
> >> >
> >> >     setup_payload_interference(g, payload_node_count,
> >> > first_payload_node);
> >> > -   if (brw->gen >= 7)
> >> > +   if (brw->gen >= 7) {
> >> >        setup_mrf_hack_interference(g, first_mrf_hack_node);
> >> >
> >> > +      foreach_in_list(fs_inst, inst, &instructions) {
> >> > +         /* When we do send-from-GRF for FB writes, we need to
ensure
> >> > that
> >> > +          * the last write instruction sends from a high register.
> >> > This is
> >> > +          * because the vertex fetcher wants to start filling the
low
> >> > +          * payload registers while the pixel data port is still
> >> > working on
> >> > +          * writing out the memory.  If we don't do this, we get
> >> > rendering
> >> > +          * artifacts.
> >> > +          *
> >> > +          * We could just do "something high".  Instead, we just
pick
> >> > the
> >> > +          * highest register that works.
> >> > +          */
> >> > +         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
> >> > +            int size = virtual_grf_sizes[inst->src[0].reg];
> >> > +            int reg =
screen->wm_reg_sets[rsi].class_first_reg[size] -
> >> > 1;
> >> > +            ra_set_node_reg(g, inst->src[0].reg, reg);
> >> > +            break;
> >> > +         }
> >> > +      }
> >> > +   }
> >> > +
> >> >     if (dispatch_width > 8) {
> >> >        /* In 16-wide dispatch we have an issue where a compressed
> >> >         * instruction is actually two instructions executed
> >> > simultaneiously.
> >> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
> >> > b/src/mesa/drivers/dri/i965/intel_screen.h
> >> > index 945f6f5..8bc6abd 100644
> >> > --- a/src/mesa/drivers/dri/i965/intel_screen.h
> >> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> >> > @@ -90,6 +90,11 @@ struct intel_screen
> >> >        int classes[16];
> >> >
> >> >        /**
> >> > +       * Array of the first ra reg in eahch ra class.
> >>
> >> Typo.
> >
> >
> > Thanks
> >
> >>
> >>
> >> > +       */
> >> > +      int class_first_reg[17];
> >> > +
> >> > +      /**
> >> >         * Mapping for register-allocated objects in *regs to the
first
> >> >         * GRF for that object.
> >> >         */
> >> > --
> >> > 2.1.0
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> >> I think it would be less confusing if instead of introducing a
> >> "class_first_reg" array you added a class_last_reg array and then did
> >> class_last_reg[size - 1] - iiuc, you're relying on the fact that the
> >> first reg of one class minus one is the last reg of the previous
> >> class, which isn't obvious and it takes a while to realize what's
> >> going on.
> >
> >
> > That's fair.  I can change that easily enough.  It doesn't really matter
> > that much.  I just kind of liked the fact that  last_reg[n] -
last_reg[n-1]
> > == class_size.  But it doesn't matter and what you suggested would be
far
> > more obvious.
> >
> >>
> >> BTW, just so you know, I don't think it's generally a good idea to set
> >> more than one node to be pre-allocated to the same register, since
> >> things will blow up if those nodes interfere. It's probably OK here
> >> since I can't think of a way that an input register preallocated to
> >> the highest register would interfere with the input to the fb write,
> >
> >
> > Yes, I'm aware of the problem.  However, the only things we preallocate
are
> > payload registers which only ever go at the beginning and the final FB
write
> > which goes at the end.  As long as we don't start preallocating more
things,
> > I think we'll be fine.  Also, isn't there some cap on how many payload
> > registers we can have?  If so then that already solves our problem.
>
> Yeah, that's what I meant. I'm not sure if the varying payload
> registers can fill up the entire register space (and hence one gets
> allocated to the same register as the fb write load_payload
> destination), but I guess it doesn't matter too much.
>
> >
> >>
> >> but in the future we'll probably want to think about how to handle
> >> this in a more sophisticated way. This is sort of on a tangent, but
> >> with SSA register allocation we should probably allocate load_payload
> >> destinations before anything else and then treat load_payload as a
> >> series of writes to preallocated registers, since spilling the result
> >> of a load_payload seems like a silly thing to do and then we don't
> >> have to deal with allocating a contiguous set of registers which
> >
> >
> > If only that were true...  The problem is that we will *always* have the
> > problem of contiguous registers.  Once we add fp64 to the mix, we will
have
> > registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned
single
> > words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16.  Sure, we can
> > probably preallocate payload registers, but we'll never get away from
> > handling multiple sizes.
>
> True, there's still that problem. And don't forget that we want to be
> able to allocate scalar/uniform registers, so now you have registers
> of size 1, 8, and 16...

Yeah. I think right now we always do those with pulls so its not a problem
yet.  We may want to do that in the future though.

> I think Periera's puzzle-solving allocator
> (http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf)
> handles the case where the register classes are aligned, so if we make
> SIMD16 registers aligned like we currently do it might work. It's
> definitely a difficult problem though, and it'll take a lot of work to
> solve.

Thanks for the link. I'll give it a look.
--Jason
>
> >
> >>
> >> breaks some nice properties of SSA register allocation. Then, we could
> >> have a simple heuristic to allocate the source of an fb write as high
> >> as possible, or just allocate all load_payload instructions as high as
> >> possible.
> >
> >
> > I've actually been thinking about the register allocation problem quite
a
> > bit lately.  One of the things I've got on my todo list is to do some
> > research and figure out if we can be doing it better.  There may be a
better
> > way to handle the multiple sizes problem than we're doing currently.
> > Unfortunately, I can't really say much now because I haven't dug into it
> > very far yet.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140920/ce04d1ae/attachment-0001.html>