[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write
Jason Ekstrand
jason at jlekstrand.net
Sat Sep 20 14:58:30 PDT 2014
On Sep 20, 2014 2:52 PM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
>
> On Sat, Sep 20, 2014 at 4:56 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:
> >
> >
> > On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com>
wrote:
> >>
> >> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
> >> wrote:
> >> > ---
> >> > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
> >> > ++++++++++++++++++++++-
> >> > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++
> >> > 2 files changed, 28 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > index b42f1e5..41e8855 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> >> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen,
int
> >> > reg_width)
> >> > /* Compute the total number of registers across all classes. */
> >> > int ra_reg_count = 0;
> >> > for (int i = 0; i < class_count; i++) {
> >> > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
> >> > ra_reg_count += base_reg_count - (class_sizes[i] - 1);
> >> > }
> >> > + screen->wm_reg_sets[index].class_first_reg[class_count] =
> >> > ra_reg_count;
> >> >
> >> > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t,
> >> > ra_reg_count);
> >> > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
> >> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
> >> > }
> >> >
> >> > setup_payload_interference(g, payload_node_count,
> >> > first_payload_node);
> >> > - if (brw->gen >= 7)
> >> > + if (brw->gen >= 7) {
> >> > setup_mrf_hack_interference(g, first_mrf_hack_node);
> >> >
> >> > + foreach_in_list(fs_inst, inst, &instructions) {
> >> > + /* When we do send-from-GRF for FB writes, we need to
ensure
> >> > that
> >> > + * the last write instruction sends from a high register.
> >> > This is
> >> > + * because the vertex fetcher wants to start filling the
low
> >> > + * payload registers while the pixel data port is still
> >> > working on
> >> > + * writing out the memory. If we don't do this, we get
> >> > rendering
> >> > + * artifacts.
> >> > + *
> >> > + * We could just do "something high". Instead, we just
pick
> >> > the
> >> > + * highest register that works.
> >> > + */
> >> > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
> >> > + int size = virtual_grf_sizes[inst->src[0].reg];
> >> > + int reg =
screen->wm_reg_sets[rsi].class_first_reg[size] -
> >> > 1;
> >> > + ra_set_node_reg(g, inst->src[0].reg, reg);
> >> > + break;
> >> > + }
> >> > + }
> >> > + }
> >> > +
> >> > if (dispatch_width > 8) {
> >> > /* In 16-wide dispatch we have an issue where a compressed
> >> > * instruction is actually two instructions executed
> >> > simultaneiously.
> >> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
> >> > b/src/mesa/drivers/dri/i965/intel_screen.h
> >> > index 945f6f5..8bc6abd 100644
> >> > --- a/src/mesa/drivers/dri/i965/intel_screen.h
> >> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> >> > @@ -90,6 +90,11 @@ struct intel_screen
> >> > int classes[16];
> >> >
> >> > /**
> >> > + * Array of the first ra reg in eahch ra class.
> >>
> >> Typo.
> >
> >
> > Thanks
> >
> >>
> >>
> >> > + */
> >> > + int class_first_reg[17];
> >> > +
> >> > + /**
> >> > * Mapping for register-allocated objects in *regs to the
first
> >> > * GRF for that object.
> >> > */
> >> > --
> >> > 2.1.0
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev at lists.freedesktop.org
> >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> >> I think it would be less confusing if instead of introducing a
> >> "class_first_reg" array you added a class_last_reg array and then did
> >> class_last_reg[size - 1] - iiuc, you're relying on the fact that the
> >> first reg of one class minus one is the last reg of the previous
> >> class, which isn't obvious and it takes a while to realize what's
> >> going on.
> >
> >
> > That's fair. I can change that easily enough. It doesn't really matter
> > that much. I just kind of liked the fact that last_reg[n] -
last_reg[n-1]
> > == class_size. But it doesn't matter and what you suggested would be
far
> > more obvious.
> >
> >>
> >> BTW, just so you know, I don't think it's generally a good idea to set
> >> more than one node to be pre-allocated to the same register, since
> >> things will blow up if those nodes interfere. It's probably OK here
> >> since I can't think of a way that an input register preallocated to
> >> the highest register would interfere with the input to the fb write,
> >
> >
> > Yes, I'm aware of the problem. However, the only things we preallocate
are
> > payload registers which only ever go at the beginning and the final FB
write
> > which goes at the end. As long as we don't start preallocating more
things,
> > I think we'll be fine. Also, isn't there some cap on how many payload
> > registers we can have? If so then that already solves our problem.
>
> Yeah, that's what I meant. I'm not sure if the varying payload
> registers can fill up the entire register space (and hence one gets
> allocated to the same register as the fb write load_payload
> destination), but I guess it doesn't matter too much.
>
> >
> >>
> >> but in the future we'll probably want to think about how to handle
> >> this in a more sophisticated way. This is sort of on a tangent, but
> >> with SSA register allocation we should probably allocate load_payload
> >> destinations before anything else and then treat load_payload as a
> >> series of writes to preallocated registers, since spilling the result
> >> of a load_payload seems like a silly thing to do and then we don't
> >> have to deal with allocating a contiguous set of registers which
> >
> >
> > If only that were true... The problem is that we will *always* have the
> > problem of contiguous registers. Once we add fp64 to the mix, we will
have
> > registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned
single
> > words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16. Sure, we can
> > probably preallocate payload registers, but we'll never get away from
> > handling multiple sizes.
>
> True, there's still that problem. And don't forget that we want to be
> able to allocate scalar/uniform registers, so now you have registers
> of size 1, 8, and 16...
Yeah. I think right now we always do those with pulls so its not a problem
yet. We may want to do that in the future though.
> I think Periera's puzzle-solving allocator
> (http://compilers.cs.ucla.edu/fernando/publications/papers/PhdDiss.pdf)
> handles the case where the register classes are aligned, so if we make
> SIMD16 registers aligned like we currently do it might work. It's
> definitely a difficult problem though, and it'll take a lot of work to
> solve.
Thanks for the link. I'll give it a look.
--Jason
>
> >
> >>
> >> breaks some nice properties of SSA register allocation. Then, we could
> >> have a simple heuristic to allocate the source of an fb write as high
> >> as possible, or just allocate all load_payload instructions as high as
> >> possible.
> >
> >
> > I've actually been thinking about the register allocation problem quite
a
> > bit lately. One of the things I've got on my todo list is to do some
> > research and figure out if we can be doing it better. There may be a
better
> > way to handle the multiple sizes problem than we're doing currently.
> > Unfortunately, I can't really say much now because I haven't dug into it
> > very far yet.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140920/ce04d1ae/attachment-0001.html>
More information about the mesa-dev
mailing list