[Mesa-dev] [PATCH 41/41] SQUASH: i965/fs: Force a high register for the final FB write
Kenneth Graunke
kenneth at whitecape.org
Sat Sep 20 15:41:27 PDT 2014
On Saturday, September 20, 2014 01:56:11 PM Jason Ekstrand wrote:
> On Sat, Sep 20, 2014 at 12:44 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>
> > On Sat, Sep 20, 2014 at 1:23 PM, Jason Ekstrand <jason at jlekstrand.net>
> > wrote:
> > > ---
> > > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 24
> > ++++++++++++++++++++++-
> > > src/mesa/drivers/dri/i965/intel_screen.h | 5 +++++
> > > 2 files changed, 28 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > index b42f1e5..41e8855 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
> > > @@ -116,8 +116,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int
> > reg_width)
> > > /* Compute the total number of registers across all classes. */
> > > int ra_reg_count = 0;
> > > for (int i = 0; i < class_count; i++) {
> > > + screen->wm_reg_sets[index].class_first_reg[i] = ra_reg_count;
> > > ra_reg_count += base_reg_count - (class_sizes[i] - 1);
> > > }
> > > + screen->wm_reg_sets[index].class_first_reg[class_count] =
> > ra_reg_count;
> > >
> > > uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
> > > struct ra_regs *regs = ra_alloc_reg_set(screen, ra_reg_count);
> > > @@ -469,9 +471,29 @@ fs_visitor::assign_regs(bool allow_spilling)
> > > }
> > >
> > > setup_payload_interference(g, payload_node_count,
> > first_payload_node);
> > > - if (brw->gen >= 7)
> > > + if (brw->gen >= 7) {
> > > setup_mrf_hack_interference(g, first_mrf_hack_node);
> > >
> > > + foreach_in_list(fs_inst, inst, &instructions) {
> > > + /* When we do send-from-GRF for FB writes, we need to ensure
> > that
> > > + * the last write instruction sends from a high register.
> > This is
> > > + * because the vertex fetcher wants to start filling the low
> > > + * payload registers while the pixel data port is still
> > working on
> > > + * writing out the memory. If we don't do this, we get
> > rendering
> > > + * artifacts.
> > > + *
> > > + * We could just do "something high". Instead, we just pick
> > the
> > > + * highest register that works.
> > > + */
> > > + if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
> > > + int size = virtual_grf_sizes[inst->src[0].reg];
> > > + int reg = screen->wm_reg_sets[rsi].class_first_reg[size] -
> > 1;
> > > + ra_set_node_reg(g, inst->src[0].reg, reg);
> > > + break;
> > > + }
> > > + }
> > > + }
> > > +
> > > if (dispatch_width > 8) {
> > > /* In 16-wide dispatch we have an issue where a compressed
> > > * instruction is actually two instructions executed
> > simultaneiously.
> > > diff --git a/src/mesa/drivers/dri/i965/intel_screen.h
> > b/src/mesa/drivers/dri/i965/intel_screen.h
> > > index 945f6f5..8bc6abd 100644
> > > --- a/src/mesa/drivers/dri/i965/intel_screen.h
> > > +++ b/src/mesa/drivers/dri/i965/intel_screen.h
> > > @@ -90,6 +90,11 @@ struct intel_screen
> > > int classes[16];
> > >
> > > /**
> > > + * Array of the first ra reg in eahch ra class.
> >
> > Typo.
> >
>
> Thanks
>
>
> >
> > > + */
> > > + int class_first_reg[17];
> > > +
> > > + /**
> > > * Mapping for register-allocated objects in *regs to the first
> > > * GRF for that object.
> > > */
> > > --
> > > 2.1.0
> > >
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
> > I think it would be less confusing if instead of introducing a
> > "class_first_reg" array you added a class_last_reg array and then did
> > class_last_reg[size - 1] - iiuc, you're relying on the fact that the
> > first reg of one class minus one is the last reg of the previous
> > class, which isn't obvious and it takes a while to realize what's
> > going on.
> >
>
> That's fair. I can change that easily enough. It doesn't really matter
> that much. I just kind of liked the fact that last_reg[n] - last_reg[n-1]
> == class_size. But it doesn't matter and what you suggested would be far
> more obvious.
>
>
> > BTW, just so you know, I don't think it's generally a good idea to set
> > more than one node to be pre-allocated to the same register, since
> > things will blow up if those nodes interfere. It's probably OK here
> > since I can't think of a way that an input register preallocated to
> > the highest register would interfere with the input to the fb write,
> >
>
> Yes, I'm aware of the problem. However, the only things we preallocate are
> payload registers which only ever go at the beginning and the final FB
> write which goes at the end. As long as we don't start preallocating more
> things, I think we'll be fine. Also, isn't there some cap on how many
> payload registers we can have? If so then that already solves our problem.
Here are the hardware limits (IVB PRM Volume 2 Part 1):
- At most 64 registers of push constants [3DSTATE_CONSTANT(Body)]
- At most 32 varyings [SF Output Attributes], at two registers each [pg 349]:
64 registers of varyings.
- At most 64 registers of other payload data in SIMD32, or
at most 33 registers of other payload data in SIMD16.
Of course, that's a completely pessimal case...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140920/9b4f5cdb/attachment.sig>
More information about the mesa-dev
mailing list