[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write

Jason Ekstrand jason at jlekstrand.net
Sat Sep 20 13:56:11 PDT 2014


On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:

> On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
> ++++++++++++++++++++++-
> >  src/mesa/drivers/dri/i965/intel_screen.h          |  5 +++++
> >  2 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > index b42f1e5..41e8855 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int
> reg_width)
> >     /* Compute the total number of registers across all classes. */
> >     int ra_reg_count = 0;
> >     for (int i = 0; i < class_count; i++) {
> > +      screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
> >        ra_reg_count += base_reg_count - (class_sizes[i] - 1);
> >     }
> > +   screen->wm_reg_sets[index].class_first_reg[class_count] =
> ra_reg_count;
> >
> >     uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
> >     struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
> > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
> >     }
> >
> >     setup_payload_interference(g, payload_node_count,
> first_payload_node);
> > -   if (brw->gen >= 7)
> > +   if (brw->gen >= 7) {
> >        setup_mrf_hack_interference(g, first_mrf_hack_node);
> >
> > +      foreach_in_list(fs_inst, inst, &instructions) {
> > +         /* When we do send-from-GRF for FB writes, we need to ensure
> that
> > +          * the last write instruction sends from a high register.
> This is
> > +          * because the vertex fetcher wants to start filling the low
> > +          * payload registers while the pixel data port is still
> working on
> > +          * writing out the memory.  If we don't do this, we get
> rendering
> > +          * artifacts.
> > +          *
> > +          * We could just do "something high".  Instead, we just pick
> the
> > +          * highest register that works.
> > +          */
> > +         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
> > +            int size = virtual_grf_sizes[inst->src[0].reg];
> > +            int reg = screen->wm_reg_sets[rsi].class_first_reg[size] -
> 1;
> > +            ra_set_node_reg(g, inst->src[0].reg, reg);
> > +            break;
> > +         }
> > +      }
> > +   }
> > +
> >     if (dispatch_width > 8) {
> >        /* In 16-wide dispatch we have an issue where a compressed
> >         * instruction is actually two instructions executed
> simultaneiously.
> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
> b/src/mesa/drivers/dri/i965/intel_screen.h
> > index 945f6f5..8bc6abd 100644
> > --- a/src/mesa/drivers/dri/i965/intel_screen.h
> > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> > @@ -90,6 +90,11 @@ struct intel_screen
> >        int classes[16];
> >
> >        /**
> > +       * Array of the first ra reg in eahch ra class.
>
> Typo.
>

Thanks


>
> > +       */
> > +      int class_first_reg[17];
> > +
> > +      /**
> >         * Mapping for register-allocated objects in *regs to the first
> >         * GRF for that object.
> >         */
> > --
> > 2.1.0
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> I think it would be less confusing if instead of introducing a
> "class_first_reg" array you added a class_last_reg array and then did
> class_last_reg[size - 1] - iiuc, you're relying on the fact that the
> first reg of one class minus one is the last reg of the previous
> class, which isn't obvious and it takes a while to realize what's
> going on.
>

That's fair.  I can change that easily enough.  It doesn't really matter
that much.  I just kind of liked the fact that  last_reg[n] - last_reg[n-1]
== class_size.  But it doesn't matter and what you suggested would be far
more obvious.


> BTW, just so you know, I don't think it's generally a good idea to set
> more than one node to be pre-allocated to the same register, since
> things will blow up if those nodes interfere. It's probably OK here
> since I can't think of a way that an input register preallocated to
> the highest register would interfere with the input to the fb write,
>

Yes, I'm aware of the problem.  However, the only things we preallocate are
payload registers which only ever go at the beginning and the final FB
write which goes at the end.  As long as we don't start preallocating more
things, I think we'll be fine.  Also, isn't there some cap on how many
payload registers we can have?  If so then that already solves our problem.


> but in the future we'll probably want to think about how to handle
> this in a more sophisticated way. This is sort of on a tangent, but
> with SSA register allocation we should probably allocate load_payload
> destinations before anything else and then treat load_payload as a
> series of writes to preallocated registers, since spilling the result
> of a load_payload seems like a silly thing to do and then we don't
> have to deal with allocating a contiguous set of registers which
>

If only that were true...  The problem is that we will *always* have the
problem of contiguous registers.  Once we add fp64 to the mix, we will have
registers of size 1 and 2 in SIMD8 and registers of size 1 (Unsigned single
words), 2 (dwords, floats, etc) and 4 (doubles) in SIMD16.  Sure, we can
probably preallocate payload registers, but we'll never get away from
handling multiple sizes.


> breaks some nice properties of SSA register allocation. Then, we could
> have a simple heuristic to allocate the source of an fb write as high
> as possible, or just allocate all load_payload instructions as high as
> possible.
>

I've actually been thinking about the register allocation problem quite a
bit lately.  One of the things I've got on my todo list is to do some
research and figure out if we can be doing it better.  There may be a
better way to handle the multiple sizes problem than we're doing
currently.  Unfortunately, I can't really say much now because I haven't
dug into it very far yet.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140920/781c5304/attachment-0001.html>


More information about the mesa-dev mailing list