[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write

Sat Sep 20 15:41:27 PDT 2014

On Saturday, September 20, 2014 01:56:11 PM Jason Ekstrand wrote:
> On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> 
> > On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
> > wrote:
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
> > ++++++++++++++++++++++-
> > >  src/mesa/drivers/dri/i965/intel_screen.h          |  5 +++++
> > >  2 files changed, 28 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > index b42f1e5..41e8855 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int
> > reg_width)
> > >     /* Compute the total number of registers across all classes. */
> > >     int ra_reg_count = 0;
> > >     for (int i = 0; i < class_count; i++) {
> > > +      screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
> > >        ra_reg_count += base_reg_count - (class_sizes[i] - 1);
> > >     }
> > > +   screen->wm_reg_sets[index].class_first_reg[class_count] =
> > ra_reg_count;
> > >
> > >     uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
> > >     struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
> > > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
> > >     }
> > >
> > >     setup_payload_interference(g, payload_node_count,
> > first_payload_node);
> > > -   if (brw->gen >= 7)
> > > +   if (brw->gen >= 7) {
> > >        setup_mrf_hack_interference(g, first_mrf_hack_node);
> > >
> > > +      foreach_in_list(fs_inst, inst, &instructions) {
> > > +         /* When we do send-from-GRF for FB writes, we need to ensure
> > that
> > > +          * the last write instruction sends from a high register.
> > This is
> > > +          * because the vertex fetcher wants to start filling the low
> > > +          * payload registers while the pixel data port is still
> > working on
> > > +          * writing out the memory.  If we don't do this, we get
> > rendering
> > > +          * artifacts.
> > > +          *
> > > +          * We could just do "something high".  Instead, we just pick
> > the
> > > +          * highest register that works.
> > > +          */
> > > +         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
> > > +            int size = virtual_grf_sizes[inst->src[0].reg];
> > > +            int reg = screen->wm_reg_sets[rsi].class_first_reg[size] -
> > 1;
> > > +            ra_set_node_reg(g, inst->src[0].reg, reg);
> > > +            break;
> > > +         }
> > > +      }
> > > +   }
> > > +
> > >     if (dispatch_width > 8) {
> > >        /* In 16-wide dispatch we have an issue where a compressed
> > >         * instruction is actually two instructions executed
> > simultaneiously.
> > > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
> > b/src/mesa/drivers/dri/i965/intel_screen.h
> > > index 945f6f5..8bc6abd 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_screen.h
> > > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> > > @@ -90,6 +90,11 @@ struct intel_screen
> > >        int classes[16];
> > >
> > >        /**
> > > +       * Array of the first ra reg in eahch ra class.
> >
> > Typo.
> >
> 
> Thanks
> 
> 
> >
> > > +       */
> > > +      int class_first_reg[17];
> > > +
> > > +      /**
> > >         * Mapping for register-allocated objects in *regs to the first
> > >         * GRF for that object.
> > >         */
> > > --
> > > 2.1.0
> > >
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
> > I think it would be less confusing if instead of introducing a
> > "class_first_reg" array you added a class_last_reg array and then did
> > class_last_reg[size - 1] - iiuc, you're relying on the fact that the
> > first reg of one class minus one is the last reg of the previous
> > class, which isn't obvious and it takes a while to realize what's
> > going on.
> >
> 
> That's fair.  I can change that easily enough.  It doesn't really matter
> that much.  I just kind of liked the fact that  last_reg[n] - last_reg[n-1]
> == class_size.  But it doesn't matter and what you suggested would be far
> more obvious.
> 
> 
> > BTW, just so you know, I don't think it's generally a good idea to set
> > more than one node to be pre-allocated to the same register, since
> > things will blow up if those nodes interfere. It's probably OK here
> > since I can't think of a way that an input register preallocated to
> > the highest register would interfere with the input to the fb write,
> >
> 
> Yes, I'm aware of the problem.  However, the only things we preallocate are
> payload registers which only ever go at the beginning and the final FB
> write which goes at the end.  As long as we don't start preallocating more
> things, I think we'll be fine.  Also, isn't there some cap on how many
> payload registers we can have?  If so then that already solves our problem.

Here are the hardware limits (IVB PRM Volume 2 Part 1):
- At most 64 registers of push constants [3DSTATE_CONSTANT(Body)]
- At most 32 varyings [SF Output Attributes], at two registers each [pg 349]:
  64 registers of varyings.
- At most 64 registers of other payload data in SIMD32, or
  at most 33 registers of other payload data in SIMD16.

Of course, that's a completely pessimal case...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140920/9b4f5cdb/attachment.sig>