[Mesa-dev] i965: possible hardware bug affecting multi-stream support

Iago Toral itoral at igalia.com
Thu Jun 26 03:26:23 PDT 2014


While testing various scenarios for multi-stream support in geometry
shaders I came across one that I think might be a hardware bug, or at
the very least, a hardware limitation that makes it difficult to
implement correct behavior according to ARB_transform_feedback3.

The problematic scenario is triggered with this setup:
  - Enable transform feedback.
  - Do not associate any varyings with one particular stream (let's call
    this stream X).
  - Have the GS emit a vertex to stream X.

ARB_transform_feedback3 clarifies expected behavior in this case:

"If the set of varyings selected for transform feedback does not include
any belonging to the specified stream, nothing will be recorded when
primitives are emitted to that stream, and the corresponding vertex
count will be zero."

However, we get two possible outcomes with this setup:

1) If the vertex emitted to that stream is not the last vertex emitted
by the GS, then the primitive count for that stream is incorrect (the
query returns 0), but everything else works fine.

I think this behavior is expected as per the IvyBridge documentation:

"8.3 Stream Output Function:
If a stream has no SO_DECL state defined (NumEntries is 0), incoming
objects targeting that stream are effectively ignored. As there is no
attempt to perform stream output, overflow detection is neither required
nor performed."

This means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
primitive count in this case. We could still use CL_INVOCATION_COUNT for
stream 0, but that would not fix the problem for other streams.

2) If the vertex emitted to that stream is the last vertex emitted by
the GS, then transform feedback does not work for any stream (no values
are recorded in the TF buffers) and primitive queries for all streams
return 0. Rendering is okay though: stream 0 outputs are rendered
properly and outputs from other streams are discarded. This, I think, is
a hardware problem.

With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
stream X like this:

Buffer Selects (Stream X) = 0
Num Entries (Stream X) = 0

that is, that stream writes to no buffers and has no declarations to
write, which is correct.

Now comes the funny part: simply forcing Num Entries (Stream X) = 1, even
if there are no declarations, makes TF and primitive queries work again
for all streams but X. For stream X, the primitive count is okay, but TF
is not (which is kind of expected, since we are not configuring it
properly). Moreover, if I also force Buffer Selects (Stream X) = N (where
N is the index of a disabled TF buffer), then TF also works as
expected for stream X (primitives generated is okay, TF primitives
written is 0, and no TF data for that stream is written).

It looks like the hardware does not like setups where there are streams
that have 0 varyings to record after all, even less so if the last
vertex we emit is sent to such a stream.

Based on the above, there is a workaround for this, but I think it is
pretty ugly, so I would like to know other people's thoughts on whether
it is worth implementing. It would involve the following:

In upload_3dstate_streamout() we make sure we disable all transform
feedback buffers that are not going to record information (currently a
TF buffer is enabled as long as the user has called
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We
can tell whether a buffer is not going to be written by inspecting its
BufferStride: it should be 0 for buffers that won't get written. I think
this is probably good to do in any case.
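
The buffer-enable check described above can be sketched in isolation
as follows. This is only an illustration, not driver code: the helper
name so_buffer_enable_mask() and its plain-array parameters are
hypothetical stand-ins for xfb_obj->Buffers[] and the linked
BufferStride[] values, and the returned bitmask stands in for the
SO_BUFFER_ENABLE(i) bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SO_BUFFERS 4  /* same bound as the buffer loop in the driver */

/* Hypothetical helper: returns a bitmask with bit i set when TF buffer i
 * is both bound and has a non-zero stride, i.e. it will actually be
 * written. Bound buffers with a stride of 0 are left disabled.
 */
static unsigned
so_buffer_enable_mask(const bool bound[NUM_SO_BUFFERS],
                      const unsigned stride[NUM_SO_BUFFERS])
{
   unsigned mask = 0;

   assert(bound != NULL && stride != NULL);
   for (int i = 0; i < NUM_SO_BUFFERS; i++) {
      if (bound[i] && stride[i] > 0)
         mask |= 1u << i;
   }
   return mask;
}
```

With buffers 0, 1 and 3 bound but only 0 and 3 having a non-zero
stride, only bits 0 and 3 end up set, so buffer 1 stays disabled even
though the application bound it.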

Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect
a stream with no varyings bound to it (so num decls is 0) *and* there
are disabled TF buffers, we silently set num decls for that stream to 1
and set its buffer_mask to write to one of the disabled buffers (it
won't actually write because they are disabled).
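
As a standalone sketch of that logic (again only an illustration, with
hypothetical helper names and plain arrays instead of the driver's
structures, and assuming MAX_VERTEX_STREAMS is 4):

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SO_BUFFERS 4  /* same bound as the buffer loop in the driver */
#define NUM_STREAMS    4  /* assumed value of MAX_VERTEX_STREAMS */

/* Hypothetical helper: index of the first TF buffer whose stride is 0
 * (i.e. a buffer that will not be written), or -1 if there is none.
 */
static int
find_disabled_buffer(const unsigned stride[NUM_SO_BUFFERS])
{
   for (int i = 0; i < NUM_SO_BUFFERS; i++) {
      if (stride[i] == 0)
         return i;
   }
   return -1;
}

/* Hypothetical helper: for every stream with no declarations, pretend it
 * has one and point it at a disabled buffer. Returns the number of
 * streams that were patched; does nothing if every buffer is in use.
 */
static int
patch_empty_streams(const unsigned stride[NUM_SO_BUFFERS],
                    unsigned decls[NUM_STREAMS],
                    unsigned buffer_mask[NUM_STREAMS])
{
   int patched = 0;
   int disabled_buffer;

   assert(stride != NULL && decls != NULL && buffer_mask != NULL);
   disabled_buffer = find_disabled_buffer(stride);
   if (disabled_buffer < 0)
      return 0;

   for (int i = 0; i < NUM_STREAMS; i++) {
      if (decls[i] == 0) {
         decls[i] = 1;
         buffer_mask[i] = 1u << disabled_buffer;
         patched++;
      }
   }
   return patched;
}
```

Note the early return when no buffer has a stride of 0: in that case
the streams are left untouched, which is the limitation of this
approach.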

I have a patch for this [1] and it seems to fix the problem (although it
only works as long as we have disabled TF buffers available).

Opinions? Is there any other alternative to work around this issue?

The problem is particularly annoying because I think it hits a very
likely scenario: an application using stream 0 for rendering only (no
TF) and using other streams to capture TF.


[1] Patch:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index d2c3ae3..1450dde 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
          max_decls = decls[stream_id];
+   /* We need to inspect if we have streams for which we don't have any
+    * varyings to record. The hardware does not handle this scenario well
+    * and for TF to work in this case we need to configure such streams to
+    * have at least one decl and write to some disabled buffer.
+    */
+   int disabled_buffer = -1;
+   for (int i = 0; i < 4; i++) {
+      if (linked_xfb_info->BufferStride[i] == 0) {
+         disabled_buffer = i;
+         break;
+      }
+   }
+   if (disabled_buffer >= 0) {
+      for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
+         if (decls[i] == 0) {
+            decls[i] = 1;
+            buffer_mask[i] = 1 << disabled_buffer;
+         }
+      }
+   }
    BEGIN_BATCH(max_decls * 2 + 3);
    OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (max_decls * 2 + 1));
@@ -250,9 +271,10 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
         dw1 |= SO_REORDER_TRAILING;
       for (i = 0; i < 4; i++) {
-        if (xfb_obj->Buffers[i]) {
-           dw1 |= SO_BUFFER_ENABLE(i);
-        }
+         if (xfb_obj->Buffers[i] && 
+             xfb_obj->shader_program->LinkedTransformFeedback.BufferStride[i] > 0) {
+            dw1 |= SO_BUFFER_ENABLE(i);
+         }
