[Mesa-dev] [PATCH resend 3/7] i965: Enable hardware-generated binding tables on render path.

Tue Jun 9 10:29:43 PDT 2015

On Tuesday, June 09, 2015 03:01:00 PM Abdiel Janulgue wrote:
> 
> Thanks a lot Ken! Your help in debugging this pointed me in the right
> direction! I had some further observations below:
> 
> On 06/09/2015 11:08 AM, Kenneth Graunke wrote:
> 
> > 
> > (To bring the mailing list up to speed: Abdiel mentioned on IRC tonight
> > that this is actually still necessary---some Piglit tests worked with
> > the offset removed, but real applications didn't work.  I then noticed
> > that even "shader_runner glsl-fs-texture2d.shader_test" breaks when
> > hw_bt_start_offset = 0 - even with the workaround Abdiel mentions).
> > 
> > Abdiel,
> > 
> > I think I figured out why this is necessary.  In gen[78]_disable_stages,
> > we issue 3DSTATE_BINDING_TABLE_POINTERS_HS/DS packets with a "pointer"
> > value of 0.
> > 
> > In the software binding table case, this points to the start of the
> > batch buffer, which is harmless because the disabled HS/DS won't read
> > any surfaces.
> > 
> > However, the hardware binding table case is different: upon receiving
> > a 3DSTATE_BINDING_TABLE_POINTERS_XS packet, the hardware *writes* the
> > current on-die binding table to the given offset.  This is a maximum
> > of 256 16-bit surface state pointers.
> > 
> > My theory is that if we program legitimate binding tables at offset 0,
> > they get clobbered when gen7_disable_stages says that the HS/DS binding
> > tables should be written to offset 0.
> > 
> > By starting at an offset of 256 * sizeof(uint16_t), we are essentially
> > allocating a "dummy" binding table of maximum size.
> > 
> > Three things I tried fixed the problem:
> > 1. Remove 3DSTATE_BINDING_TABLE_POINTERS_HS/DS from gen7_disable_stages.
> > 
> >    We never tell the HW to write out HS/DS tables, so the PS table at
> >    offset 0 doesn't get clobbered.
> 
> I vaguely remember somehow sometime ago that removing the "nullify
> binding table pointer" resulted in some random hangs. But I can't
> remember which exact test or game I used. So don't take my word for it.
> 
> > 
> > 2. Change those packets to use offset 16000 (something large).
> > 
> >    We write out useless HS/DS tables, but to an unused spot in the
> >    buffer, so they don't trash anything.
> > 
> > 3. Move the gen7_disable_stages atom immediately after the
> >    gen7_hw_binding_tables atom in the list.
> > 
> >    Instead of writing VS/PS tables then clobbering them with HS/DS,
> >    we reverse the order: write garbage HS/DS tables, then clobber them
> >    with the (actually useful) PS table.
> > 
> 
> > This brought up a question: how does the hardware know how large of a
> > table to write?  Does it always write out all 256 entries?
> > 
> > It certainly seems to, as far as I can tell.  But that would mean that
> > when increasing brw->hw_bt_pool.next_offset, we always need to add
> > 256 * sizeof(uint16_t), even if the table only has a few useful entries.
> > I'm a bit confused, because we're not doing that today...so shouldn't
> > something have broken?
> 
> I did further digging and found out that the answer is both yes and no.
> Yes because the hardware is actually able to write to the buffer even
> without specifying the full 256 entry size. I confirmed this by mapping
> out the contents of the valid binding table pool after every flush and
> the entries are correct as long as the offsets used are aligned to
> 64-bytes. It also doesn't overwrite the entries written starting from
> the next 3DSTATE_BINDING_TABLE_POINTERS_XS offset.
> 
> And no, because for some reason, garbage hardware binding table pointers
> (written by gen7_disable in this case) seem to require a full 256-entry
> allocation when stating from offset zero. If another
> 3DSTATE_BINDING_TABLE_POINTERS_XS offset touches an offset within that
> "garbage" range then the hardware gets angry. This is the reason why
> anything less than 256 * sizeof(uint16_t) didn't work earlier.
> 
> To summarize, this is what currently happened when we used offset zero
> with the current code:
> 
> btp:    valid  -> garbage  ->  valid
> offset: 0      -> 0        ->  64
> 
> Notice that the hangs here are caused because the valid binding table
> pointer is overwritten with the garbage binding table pointer. This is
> same conclusion you mentioned earlier. Also, the other "valid" binding
> table offset is less than 256 * sizeof(uint16_t).
> 
> Because of the same reason, using a start offset for the valid binding
> table pointer which is less than 256 * sizeof(uint16_t) hangs too:
> 
> btp:    valid   -> garbage  ->  valid
> offset: 128     -> 0        ->  128 + 64
> 
> 
> However doing this instead works:
> 
> btp:    valid -> garbage -> valid
> offset: 0     -> 64      -> 64
> 
> It's kinda like step 3 you mentioned above except we don't reverse the
> order of the atoms but specify brw->hw_bt_pool.next_offset to the
> 3DSTATE_BINDING_TABLE_POINTERS_HS/DS offsets in gen7_disable instead of
> zero. We don't have to allocate space for the garbage hardware binding
> table because it gets overwritten with the valid pointer anyway. This
> approach worked perfectly as well.
> 
> -Abdiel

It sounds like perhaps the hardware is always writing out 256-byte
entries.  However, once they're written, we're free to *overwrite*
the unused portions when writing the next table.

Your suggestion sounds like a reasonable solution.  gen7_disable_stages
is going away soon anyway - Chris is going to wire up tessellation for
real.

Don't forget to fix the possible buffer overflow - I think every time we
emit a binding table, we should make sure there's 256 * sizeof(uint16_t)
bytes left in the buffer.  If not, flush (and reset the offset).

--Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150609/025ddf2e/attachment.sig>