[Mesa-dev] [RFC PATCH] i965/gs: add snb support

Mon Feb 24 11:59:41 PST 2014

On 24 February 2014 00:36, Ilia Mirkin <imirkin at alum.mit.edu> wrote:

> Before you read any further, this is nowhere close to working. However
> it's in
> a state where I think most of the structure is there, albeit with a lot of
> XXX
> comments. And I haven't actually implemented the new opcodes I've added.
>
> I was hoping one or two Intel people could take a look at this and let me
> know
> of any pitfalls I'm likely to run into. I've already gotten a lot of help
> and
> advice from Ken, but wanted to put something out publicly.
>
> Any and all feedback much appreciated!
>
> Not-Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>

I'm really excited to see you're working on this!  A lot of people are
going to be really happy to see geometry shader support added to Gen6.

AFAICT, your approach is good.  I plan to look at it in more detail later
today or sometime tomorrow.  But before I do, here are some general
comments:

1. For long-term readability, it might be useful to have 3 classes:
vec4_gs_visitor (for stuff that's common to all gs visitors),
gen6_gs_visitor (for the stuff that's specific to gen6 geometry shaders),
and gen7plus_gs_visitor (for the stuff that's specific to gen7+ geometry
shaders).  What you currently have is a little harder to read, because it
means that to understand gen6 geometry shaders, you have to keep in mind
that some of the code in vec4_gs_visitor is never going to be executed
because it will always be overridden by gen6_gs_visitor.

2. Stream output (a.k.a. transform feedback): As you've probably already
figured out, transform feedback on gen6 is tricky because the hardware
doesn't have a dedicated pipeline stage for it; you have to emit code in
the geometry shader to do it.  This is a problem because the GL spec says
that two things are supposed to happen in between the GS stage and
transform feedback: dangling vertex removal (ignoring line strips
containing just a single vertex and ignoring triangle strips containing
less than 2 vertices), and expanding line/triangle strips into individual
line segments/triangles.  In Gen6, these operations happen after the GS
stage, so the code you emit in the geometry shader to do transform feedback
needs to simulate them.

Back when I first implemented transform feedback for Gen6, supporting
user-defined geometry shaders was the furthest thing from my mind, so I
implemented the transform feedback code in raw assembly rather than
re-using the vec4_visitor/vec4_generator infrastructure (see
gen6_sol_program()).  Unfortunately that's going to make it hard to re-use
that code in user-defined geometry shaders.  Also, when I implemented that
code, I was able to make some major simplifying assumptions because there
was no user-defined geometry shader--that meant that the number of vertices
to process was known at compile time, and I didn't have to worry about
dangling vertex removal or expanding line/triangle strips into individual
line segments/triangles (because the hardware automatically performs those
operations before invoking the GS).  You're not going to have that luxury.
My recommendation is: feel free to use gen6_sol_program() as a reference
point, but don't bother trying to re-use it--the simplifying assumptions
are just going to make it too different from what you want.  I'd recommend
leaving gen6_sol_program() around to handle the case where there's no
user-defined geometry shader, and then you can write fresh code for
handling transform feedback when there *is* a user-defined geometry shader.

Before you go too far on the implementation, I'd also recommend writing
some new piglit tests in tests/spec/glsl-1.50/execution/geometry/ to verify
that dangling vertex removal and strip expansion are handled properly for
transform feedback data.  If you don't have gs-supporting hardware at your
disposal to validate those tests against, I (or one of the Intel folks)
would be happy to validate the tests for you.  Note in particular that in
order to expand a triangle strip to triangles, you have to be careful with
the order to make sure that both the provoking vertex and the triangle
orientation are preserved (it's not necessary to follow
ARB_provoking_vertex, though; you can just assume the GL default of
LAST_VERTEX_CONVENTION applies).  For example, if the pre-expansion vertex
order is ABCDEF, then the only correct post-expansion order is
ABC,CBD,CDE,EDF.  I'm not sure whether Gen7+ hardware gets this order
exactly right in all circumstances, but it seems like it should be easy to
get it right on Gen6, because the order is entirely determined by software
:)

I'd also recommend adding some piglit tests to make sure that transform
feedback overflow conditions are handled properly when a geometry shader is
in use, since overflow handling on Gen6 is (partially) the responsibility
of the shader code you generate, and you're not going to be able to
piggy-back on the overflow handling in gen6_sol_program().

One other note: check out this text from the Gen6 programmer's reference
manual (page 172 of Vol 2 part 1): "The final contents of Stream Output
buffers must follow the strict pipeline ordering of vertices. Given this
ordering requirement, it will be necessary to run the GS stage in a
single-threaded fashion (Maximum Number of Threads == 1). Otherwise
concurrent GS threads might append vertices to the output buffer out of
order."  I believe this actually incorrect for gen6--I think it only
applies to Gen5 and earlier, and was left in the documentation for Gen6 by
accident.  Here's how I think it actually works in Gen6.  There are two
modes: either (a) the SVBI (the pointer that indicates where we are writing
to in the transform feedback buffer) is incremented automatically by the
hardware when spawning a GS thread, or (b) the SVBI is incremented manually
by the GS thread when issuing the FF_SYNC message.  The choice of mode (a)
or (b) is selected by the GEN6_GS_SVBI_POSTINCREMENT_ENABLE flag in
3DSTATE_GS.  The existing gen6_sol_program() logic uses mode (a).  But that
only works because the correct increment is known before the GS thread
starts executing.  You're going to have to use mode (b), since the correct
increment depends on how many vertices the user-provided GS program
outputs.  (Actually, it's even more complicated than that, because it has
to account for dangling vertex removal and expansion of triangle and line
strips).  So here's what I think you're going to have to do:

- Ignore the SVBI values that are provided to the GS thread when it is
spawned (in fact, don't even bother setting the GEN6_GS_SVBI_PAYLOAD_ENABLE
bit--that way you won't have to waste register space for them).  These
values are garbage when using mode (b) because lots of GS threads may be
spawned before the first one gets a chance to start writing transform
feedback output.
- Before issuing the FF_SYNC message, compute the number of transform
feedback vertices you're going to output, based on the pattern of calls to
EmitVertex() and EndPrimitive().  To compute this correctly you're going to
have to think about dangling vertex removal and line/triangle strip
expansion.
- In emit_thread_end(), when sending the FF_SYNC message, tell the hardware
how many vertices you're going to output--this determines how much it will
post-increment the SVBI pointer.
- In the response to the FF_SYNC message, the hardware tells you the value
the SVBI pointer had just before it post-incremented it.  This tells you
where to store the transform feedback output.
- Now, at the same time you're sending your vertices down the pipeline
using URB writes, you can also output them to the transform feedback buffer
using the SVBI pointer you got back from the FF_SYNC message.  This is
where you do the actual dangling vertex removal and line/triangle strip
expansion.

Note that you only need to do dangling vertex removal and line/triangle
strip transformation to the transform feedback data.  It's ok (and probably
easiest) for the data you deliver down the pipeline using URB writes to
have dangling vertices and unexpanded triangle/line strips.

Of course, I'm working from the same code and docs that you are, so I could
be mistaken about some of this :)

Once again, thanks for working on this!  I'll try to look over your code
sometime today or tomorrow and make more comments.

Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140224/3126775f/attachment.html>