[Mesa-dev] [RFC PATCH] i965/gs: add snb support

Tue Feb 25 09:47:48 PST 2014

On 2014-02-24 20:59, Paul Berry wrote:
> On 24 February 2014 00:36, Ilia Mirkin <imirkin at alum.mit.edu
> <mailto:imirkin at alum.mit.edu>> wrote:
>
>     Before you read any further, this is nowhere close to working.
>     However it's in
>     a state where I think most of the structure is there, albeit with a
>     lot of XXX
>     comments. And I haven't actually implemented the new opcodes I've added.
>
>     I was hoping one or two Intel people could take a look at this and
>     let me know
>     of any pitfalls I'm likely to run into. I've already gotten a lot of
>     help and
>     advice from Ken, but wanted to put something out publicly.
>
>     Any and all feedback much appreciated!
>
>     Not-Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu
>     <mailto:imirkin at alum.mit.edu>>
>
>
> I'm really excited to see you're working on this!  A lot of people are
> going to be really happy to see geometry shader support added to Gen6.
>
> AFAICT, your approach is good.  I plan to look at it in more detail
> later today or sometime tomorrow.  But before I do, here are some
> general comments:
>
> 1. For long-term readability, it might be useful to have 3 classes:
> vec4_gs_visitor (for stuff that's common to all gs visitors),
> gen6_gs_visitor (for the stuff that's specific to gen6 geometry
> shaders), and gen7plus_gs_visitor (for the stuff that's specific to
> gen7+ geometry shaders).  What you currently have is a little harder to
> read, because it means that to understand gen6 geometry shaders, you
> have to keep in mind that some of the code in vec4_gs_visitor is never
> going to be executed because it will always be overridden by
> gen6_gs_visitor.
>
> 2. Stream output (a.k.a. transform feedback): As you've probably already
> figured out, transform feedback on gen6 is tricky because the hardware
> doesn't have a dedicated pipeline stage for it; you have to emit code in
> the geometry shader to do it.  This is a problem because the GL spec
> says that two things are supposed to happen in between the GS stage and
> transform feedback: dangling vertex removal (ignoring line strips
> containing just a single vertex and ignoring triangle strips containing
> less than 2 vertices), and expanding line/triangle strips into
> individual line segments/triangles.  In Gen6, these operations happen
> after the GS stage, so the code you emit in the geometry shader to do
> transform feedback needs to simulate them.
>
> Back when I first implemented transform feedback for Gen6, supporting
> user-defined geometry shaders was the furthest thing from my mind, so I
> implemented the transform feedback code in raw assembly rather than
> re-using the vec4_visitor/vec4_generator infrastructure (see
> gen6_sol_program()).  Unfortunately that's going to make it hard to
> re-use that code in user-defined geometry shaders.  Also, when I
> implemented that code, I was able to make some major simplifying
> assumptions because there was no user-defined geometry shader--that
> meant that the number of vertices to process was known at compile time,
> and I didn't have to worry about dangling vertex removal or expanding
> line/triangle strips into individual line segments/triangles (because
> the hardware automatically performs those operations before invoking the
> GS).  You're not going to have that luxury.  My recommendation is: feel
> free to use gen6_sol_program() as a reference point, but don't bother
> trying to re-use it--the simplifying assumptions are just going to make
> it too different from what you want.  I'd recommend leaving
> gen6_sol_program() around to handle the case where there's no
> user-defined geometry shader, and then you can write fresh code for
> handling transform feedback when there *is* a user-defined geometry shader.
>
> Before you go too far on the implementation, I'd also recommend writing
> some new piglit tests in tests/spec/glsl-1.50/execution/geometry/ to
> verify that dangling vertex removal and strip expansion are handled
> properly for transform feedback data.  If you don't have gs-supporting
> hardware at your disposal to validate those tests against, I (or one of
> the Intel folks) would be happy to validate the tests for you.  Note in
> particular that in order to expand a triangle strip to triangles, you
> have to be careful with the order to make sure that both the provoking
> vertex and the triangle orientation are preserved (it's not necessary to
> follow ARB_provoking_vertex, though; you can just assume the GL default
> of LAST_VERTEX_CONVENTION applies).  For example, if the pre-expansion
> vertex order is ABCDEF, then the only correct post-expansion order is
> ABC,CBD,CDE,EDF.  I'm not sure whether Gen7+ hardware gets this order
> exactly right in all circumstances, but it seems like it should be easy
> to get it right on Gen6, because the order is entirely determined by
> software :)
>
> I'd also recommend adding some piglit tests to make sure that transform
> feedback overflow conditions are handled properly when a geometry shader
> is in use, since overflow handling on Gen6 is (partially) the
> responsibility of the shader code you generate, and you're not going to
> be able to piggy-back on the overflow handling in gen6_sol_program().
I have two old piglit tests for arb geometry shaders for these cases 
I've never gotten around to submit (see attachments).
If you're interested I can rebase them and port them to 1.5 geometry 
shaders.

Fabian
>
> One other note: check out this text from the Gen6 programmer's reference
> manual (page 172 of Vol 2 part 1): "The final contents of Stream Output
> buffers must follow the strict pipeline ordering of vertices. Given this
> ordering requirement, it will be necessary to run the GS stage in a
> single-threaded fashion (Maximum Number of Threads == 1). Otherwise
> concurrent GS threads might append vertices to the output buffer out of
> order."  I believe this actually incorrect for gen6--I think it only
> applies to Gen5 and earlier, and was left in the documentation for Gen6
> by accident.  Here's how I think it actually works in Gen6.  There are
> two modes: either (a) the SVBI (the pointer that indicates where we are
> writing to in the transform feedback buffer) is incremented
> automatically by the hardware when spawning a GS thread, or (b) the SVBI
> is incremented manually by the GS thread when issuing the FF_SYNC
> message.  The choice of mode (a) or (b) is selected by the
> GEN6_GS_SVBI_POSTINCREMENT_ENABLE flag in 3DSTATE_GS.  The existing
> gen6_sol_program() logic uses mode (a).  But that only works because the
> correct increment is known before the GS thread starts executing.
> You're going to have to use mode (b), since the correct increment
> depends on how many vertices the user-provided GS program outputs.
> (Actually, it's even more complicated than that, because it has to
> account for dangling vertex removal and expansion of triangle and line
> strips).  So here's what I think you're going to have to do:
>
> - Ignore the SVBI values that are provided to the GS thread when it is
> spawned (in fact, don't even bother setting the
> GEN6_GS_SVBI_PAYLOAD_ENABLE bit--that way you won't have to waste
> register space for them).  These values are garbage when using mode (b)
> because lots of GS threads may be spawned before the first one gets a
> chance to start writing transform feedback output.
> - Before issuing the FF_SYNC message, compute the number of transform
> feedback vertices you're going to output, based on the pattern of calls
> to EmitVertex() and EndPrimitive().  To compute this correctly you're
> going to have to think about dangling vertex removal and line/triangle
> strip expansion.
> - In emit_thread_end(), when sending the FF_SYNC message, tell the
> hardware how many vertices you're going to output--this determines how
> much it will post-increment the SVBI pointer.
> - In the response to the FF_SYNC message, the hardware tells you the
> value the SVBI pointer had just before it post-incremented it.  This
> tells you where to store the transform feedback output.
> - Now, at the same time you're sending your vertices down the pipeline
> using URB writes, you can also output them to the transform feedback
> buffer using the SVBI pointer you got back from the FF_SYNC message.
> This is where you do the actual dangling vertex removal and
> line/triangle strip expansion.
>
> Note that you only need to do dangling vertex removal and line/triangle
> strip transformation to the transform feedback data.  It's ok (and
> probably easiest) for the data you deliver down the pipeline using URB
> writes to have dangling vertices and unexpanded triangle/line strips.
>
> Of course, I'm working from the same code and docs that you are, so I
> could be mistaken about some of this :)
>
> Once again, thanks for working on this!  I'll try to look over your code
> sometime today or tomorrow and make more comments.
>
> Paul
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arb_geometry_shader4-Draw-more-vertices-than-allowed.patch
Type: text/x-patch
Size: 5397 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140225/ca172145/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arb_geometry_shader4-Test-primitive-assembly.patch
Type: text/x-patch
Size: 21304 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140225/ca172145/attachment-0003.bin>