[Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

Kenneth Graunke kenneth at whitecape.org
Fri Oct 25 22:13:11 CEST 2013

On 10/21/2013 05:55 PM, Eric Anholt wrote:
> This interface means synchronizing with the GPU, which sucks when we
> have the ability to actually do DTFB in the hardware pipeline (Indirect
> Parameter Enable of 3DPRIMITIVE).

It's not that simple.

The 3DPRIMITIVE indirect registers require you to specify a vertex
count.  That should be the number of vertices actually written to the SO
buffer, which may be less than you asked for due to overflow.

As far as I can tell, the Gen7 SOL stage has no mechanism to give you
the number of vertices written to the SOL buffer.  There is
SO_NUM_PRIMS_WRITTEN(0-3), which gives you the number of primitives
actually written.

For POINTS, this works since each primitive is a single vertex.  But for
LINES and TRIANGLES, you need to multiply this count by 2 or 3 vertices
per primitive.
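
To spell out the arithmetic, here is a minimal sketch of the conversion
as it would look on the CPU.  The enum and function names are made up
for illustration (Mesa itself uses GLenum primitive modes), and only the
three modes mentioned above are handled:

```c
#include <stdint.h>

/* Hypothetical enum for illustration only; not a Mesa type. */
enum tfb_prim_mode { TFB_POINTS, TFB_LINES, TFB_TRIANGLES };

/* Number of vertices the SOL stage writes for each primitive. */
static unsigned
tfb_vertices_per_prim(enum tfb_prim_mode mode)
{
   switch (mode) {
   case TFB_POINTS:    return 1;
   case TFB_LINES:     return 2;
   case TFB_TRIANGLES: return 3;
   }
   return 0; /* unknown mode */
}

/* Convert a primitives-written count into a vertex count. */
static uint64_t
tfb_vertex_count(uint64_t prims_written, enum tfb_prim_mode mode)
{
   return prims_written * tfb_vertices_per_prim(mode);
}
```

This multiplication is trivial on the CPU; the difficulty is doing it on
the GPU without a readback.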

Haswell has an MI_MATH command which might be usable for this.  But on
Ivybridge, I don't know how to do this other than writing a shader
program that reads the count from the buffer, does the multiplication,
and writes the result back out, then drawing a single point to run it.
We would then MI_LOAD_REGISTER_MEM the result into the indirect vertex
count register.  That might work, but is it better?

The other complexity is PauseTransformFeedback and switching.  The
vertex count is the number of vertices actually written between
Begin/End on a single object.  If you have two objects, you might do:

Begin A, draw, Pause A, Begin B, draw, End B, Resume A, draw, End A.

But there is only one SO_NUM_PRIMS_WRITTEN register, which is intended
to be free running.  If you leave it free running, you need to take
snapshots at Begin/End/Pause/Resume and subtract deltas to get the
actual number of primitives written...then do the multiplication above.
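
A rough sketch of that snapshot bookkeeping, assuming the counter value
has already been read back somehow.  The struct and function names are
hypothetical, not existing Mesa code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-object bookkeeping; not Mesa's actual struct. */
struct tfb_counter {
   uint64_t prims_written;   /* total accumulated while active */
   uint64_t start_snapshot;  /* counter value at last Begin/Resume */
   bool active;
};

/* Begin: reset the total and start a new interval at "counter". */
static void
tfb_begin(struct tfb_counter *obj, uint64_t counter)
{
   obj->prims_written = 0;
   obj->start_snapshot = counter;
   obj->active = true;
}

/* Resume: keep the total and start a new interval at "counter". */
static void
tfb_resume(struct tfb_counter *obj, uint64_t counter)
{
   obj->start_snapshot = counter;
   obj->active = true;
}

/* Pause or End: add the delta for the interval that just finished. */
static void
tfb_stop(struct tfb_counter *obj, uint64_t counter)
{
   if (!obj->active)
      return;
   obj->prims_written += counter - obj->start_snapshot;
   obj->active = false;
}
```

For the A/B sequence above, if A writes 5 primitives, B writes 7, and A
then writes 3 more, A ends with a total of 8 and B with 7, whatever
value the free running counter started at.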

We could violate the free running assumption and set
SO_NUM_PRIMS_WRITTEN to 0 on Begin, and save/restore it on Pause/Resume.
Then the value at End would be the final value, and we wouldn't have to
deal with deltas, which would be simpler.  I'm open to trying that if
people would prefer it.
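
As a sketch of what the reset and save/restore variant amounts to, with
a plain variable standing in for the hardware register.  All names here
are hypothetical, and a real driver would emit register load/store
commands rather than touch a C variable:

```c
#include <stdint.h>

/* Stand-in for the SO_NUM_PRIMS_WRITTEN register (assumption: one
 * stream only, modelled as a plain variable). */
static uint64_t fake_so_num_prims_written;

/* Hypothetical per-object storage for the saved register value. */
struct tfb_saved {
   uint64_t prims_written;
};

/* Begin: zero the register so it counts only this object's prims. */
static void
tfb_begin_reset(struct tfb_saved *obj)
{
   obj->prims_written = 0;
   fake_so_num_prims_written = 0;
}

/* Pause: save the register into the object. */
static void
tfb_pause_save(struct tfb_saved *obj)
{
   obj->prims_written = fake_so_num_prims_written;
}

/* Resume: restore the register from the object. */
static void
tfb_resume_restore(const struct tfb_saved *obj)
{
   fake_so_num_prims_written = obj->prims_written;
}

/* End: the register already holds the final count; just store it. */
static void
tfb_end_save(struct tfb_saved *obj)
{
   obj->prims_written = fake_so_num_prims_written;
}
```

No subtraction is needed anywhere, which is the appeal: the only
arithmetic left is the multiplication by vertices per primitive.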

Maybe I am fundamentally missing something here, but it seems far from
obvious to me how to use draw indirect to do this properly.  On
Ivybridge, doing it on the GPU sounds very complex and heavyweight.
Haswell could probably do it if we adopt the save/restore approach.

> We could mostly use the hw pipelined
> version only, as long as we had core contexts (meaning that we don't
> need vertex start/count to figure out how much user vertex array data to
> upload).

Right, so we'd need this for that case, anyway.

> But, given that we have sw primitive restart on some lame hardware that
> we want to support this on, we've got to have this path anyway.

Where by "lame hardware" you mean Ivybridge.

