[Mesa-dev] [PATCH resend 3/7] i965: Enable hardware-generated binding tables on render path.

Tue Jun 9 01:08:24 PDT 2015

On Thursday, June 04, 2015 02:03:21 AM Kenneth Graunke wrote:
> On Thursday, June 04, 2015 11:38:52 AM Abdiel Janulgue wrote:
> > 
> > On 06/02/2015 10:54 AM, Kenneth Graunke wrote:
> > > On Monday, June 01, 2015 03:14:26 PM Abdiel Janulgue wrote:
> [snip]
> > I'm the one who is being funny here. After looking harder and then doing
> > some archaeological digs in my previous RS enabling efforts. I came to
> > the conclusion you are right. The reason the hardware *seems* to enforce
> > this arbitrary offset is that I skipped out the "disable RS on state
> > base address update" workaround. Now that I reintroduced it back,
> > hardware works completely fine even from offset zero. I'll update the
> > code in v2.
> 
> Great!  Scratch one mystery :)
> 
> Thanks, Abdiel.

(To bring the mailing list up to speed: Abdiel mentioned on IRC tonight
that the offset is actually still necessary.  Some Piglit tests worked
with the offset removed, but real applications didn't.  I then noticed
that even "shader_runner glsl-fs-texture2d.shader_test" breaks when
hw_bt_start_offset = 0, even with the workaround Abdiel mentions.)

Abdiel,

I think I figured out why this is necessary.  In gen[78]_disable_stages,
we issue 3DSTATE_BINDING_TABLE_POINTERS_HS/DS packets with a "pointer"
value of 0.

In the software binding table case, this points to the start of the
batch buffer, which is harmless because the disabled HS/DS won't read
any surfaces.

However, the hardware binding table case is different: upon receiving
a 3DSTATE_BINDING_TABLE_POINTERS_XS packet, the hardware *writes* the
current on-die binding table to the given offset.  That table holds a
maximum of 256 16-bit surface state pointers.

My theory is that if we program legitimate binding tables at offset 0,
they get clobbered when gen7_disable_stages says that the HS/DS binding
tables should be written to offset 0.

By starting at an offset of 256 * sizeof(uint16_t), we are essentially
allocating a "dummy" binding table of maximum size.
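
To spell out that arithmetic, here is a rough sketch.  The helper names
and the HW_BT_MAX_ENTRIES macro are made up for illustration (they are
not existing Mesa symbols); the 256-entry, 16-bit figures come from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption taken from the message: the hardware writes up to 256
 * 16-bit surface state pointers per binding table. */
#define HW_BT_MAX_ENTRIES 256u

/* Hypothetical helper: size in bytes of a table with n_entries entries. */
static size_t
hw_bt_table_bytes(unsigned n_entries)
{
   assert(n_entries <= HW_BT_MAX_ENTRIES);
   return n_entries * sizeof(uint16_t);
}

/* Hypothetical helper: the first byte offset that a full-size table
 * written at offset 0 cannot reach. */
static size_t
hw_bt_start_offset_bytes(void)
{
   return hw_bt_table_bytes(HW_BT_MAX_ENTRIES);
}
```

With 2-byte entries, the reserved "dummy" region is 512 bytes, so real
tables would begin at byte offset 512.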

Each of the three things I tried fixed the problem on its own:

1. Remove 3DSTATE_BINDING_TABLE_POINTERS_HS/DS from gen7_disable_stages.

   We never tell the HW to write out HS/DS tables, so the PS table at
   offset 0 doesn't get clobbered.

2. Change those packets to use offset 16000 (something large).

   We write out useless HS/DS tables, but to an unused spot in the
   buffer, so they don't trash anything.

3. Move the gen7_disable_stages atom immediately after the
   gen7_hw_binding_tables atom in the list.

   Instead of writing VS/PS tables then clobbering them with HS/DS,
   we reverse the order: write garbage HS/DS tables, then clobber them
   with the (actually useful) PS table.
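
The theory and the three experiments can be mimicked with a toy model.
This is only a sketch of the clobbering theory, not driver code: the
pool, the tags, and every model_* name are invented for illustration,
and it assumes the hardware dumps a full 256-entry table at the offset
given in each pointers packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BT_ENTRIES   256u
#define MODEL_POOL_ENTRIES 16384u  /* arbitrary pool size for the model */

/* Tags standing in for table contents. */
enum { TAG_PS = 0x1111, TAG_HSDS = 0xdead };

static uint16_t model_pool[MODEL_POOL_ENTRIES];

/* Model of a binding-table-pointers packet: under the assumed behaviour
 * the hardware writes a full table starting at byte_offset. */
static void
model_emit_bt_pointers(size_t byte_offset, uint16_t tag)
{
   size_t first = byte_offset / sizeof(uint16_t);
   assert(first + MODEL_BT_ENTRIES <= MODEL_POOL_ENTRIES);
   for (size_t i = 0; i < MODEL_BT_ENTRIES; i++)
      model_pool[first + i] = tag;
}

/* True if every entry of the table at byte_offset still carries tag. */
static bool
model_table_intact(size_t byte_offset, uint16_t tag)
{
   size_t first = byte_offset / sizeof(uint16_t);
   for (size_t i = 0; i < MODEL_BT_ENTRIES; i++) {
      if (model_pool[first + i] != tag)
         return false;
   }
   return true;
}

/* Replays one scenario and reports whether the PS table survives.
 * emit_hsds = false models experiment 1, a large hsds_offset models
 * experiment 2, and hsds_first = true models experiment 3. */
static bool
model_ps_survives(size_t ps_offset, size_t hsds_offset,
                  bool hsds_first, bool emit_hsds)
{
   memset(model_pool, 0, sizeof(model_pool));
   if (emit_hsds && hsds_first)
      model_emit_bt_pointers(hsds_offset, TAG_HSDS);
   model_emit_bt_pointers(ps_offset, TAG_PS);
   if (emit_hsds && !hsds_first)
      model_emit_bt_pointers(hsds_offset, TAG_HSDS);
   return model_table_intact(ps_offset, TAG_PS);
}
```

In this model, model_ps_survives(0, 0, false, true) reproduces the
failure, while dropping the HS/DS packets, pointing them at offset
16000, emitting them first, or starting the PS table at offset 512 all
leave the PS table intact.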

This brought up a question: how does the hardware know how large a
table to write?  Does it always write out all 256 entries?

It certainly seems to, as far as I can tell.  But that would mean that
when increasing brw->hw_bt_pool.next_offset, we always need to add
256 * sizeof(uint16_t), even if the table only has a few useful entries.
I'm a bit confused, because we're not doing that today...so shouldn't
something have broken?
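
For concreteness, if the hardware really does write all 256 entries
every time, a conservative allocator would look something like the
sketch below.  The struct is a hypothetical stand-in for
brw->hw_bt_pool (only the next_offset field name comes from the driver;
size and the sketch_* names are assumptions), and this is not what the
code does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BT_MAX_ENTRIES 256u

/* Hypothetical stand-in for brw->hw_bt_pool. */
struct sketch_bt_pool {
   size_t next_offset;  /* byte offset of the next free table slot */
   size_t size;         /* total pool size in bytes (assumed field) */
};

/* Reserves room for one table and returns its byte offset, or
 * (size_t)-1 if the pool is full.  The pool always advances by a
 * full-size table, however few entries the caller actually needs. */
static size_t
sketch_bt_alloc(struct sketch_bt_pool *pool, unsigned n_entries)
{
   const size_t stride = SKETCH_BT_MAX_ENTRIES * sizeof(uint16_t);

   assert(n_entries <= SKETCH_BT_MAX_ENTRIES);
   (void) n_entries;  /* only checked above; the stride is fixed */

   if (pool->next_offset + stride > pool->size)
      return (size_t) -1;

   size_t offset = pool->next_offset;
   pool->next_offset += stride;
   return offset;
}
```

Under this scheme a 4-entry table still consumes 512 bytes of the pool,
which is the cost the question above is really about.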

--Ken