[Mesa-dev] i965: possible hardware bug affecting multi-stream support

Chris Forbes chrisf at ijw.co.nz
Thu Jun 26 13:08:45 PDT 2014


As an alternative -- we know if we have this scenario at link time --
could we perhaps just not do anything in EmitStreamVertex if there are
no varyings captured to that stream?
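
A minimal sketch of that idea, assuming the linker can hand the GS
compiler a list of captured varyings tagged with their stream. The
names below (struct xfb_output, captured_stream_mask,
should_emit_stream_vertex) are hypothetical and are not existing Mesa
interfaces. Stream 0 is always kept in this sketch, on the assumption
that it still has to reach the rasterizer.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTEX_STREAMS 4

/* Hypothetical stand-in for the linked transform feedback info:
 * one entry per captured varying, recording the stream it belongs to.
 */
struct xfb_output {
   unsigned stream_id;
};

/* Build a bitmask with bit N set if at least one captured varying
 * belongs to stream N. This could be computed once at link time.
 */
unsigned
captured_stream_mask(const struct xfb_output *outputs, unsigned num_outputs)
{
   unsigned mask = 0;

   for (unsigned i = 0; i < num_outputs; i++) {
      if (outputs[i].stream_id < MAX_VERTEX_STREAMS)
         mask |= 1u << outputs[i].stream_id;
   }
   return mask;
}

/* Decide whether code should be generated for EmitStreamVertex(stream).
 * Assumption: stream 0 is never skipped because it is the only stream
 * that gets rendered.
 */
bool
should_emit_stream_vertex(unsigned mask, unsigned stream)
{
   assert(stream < MAX_VERTEX_STREAMS);
   return stream == 0 || (mask & (1u << stream)) != 0;
}
```

With varyings captured only to streams 1 and 3, the mask is 0xA and an
emit to stream 2 would be dropped. This does not say anything about
what the primitive queries for the skipped stream would then report.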

On Thu, Jun 26, 2014 at 10:26 PM, Iago Toral <itoral at igalia.com> wrote:
> Hello,
>
> while testing various scenarios for multi-stream support in geometry
> shaders I came across one that I think might be a hardware bug, or at
> the very least, a hardware limitation that makes it hard to implement
> the behavior required by ARB_transform_feedback3.
>
> The problematic scenario is triggered by this setup:
>   - Enable transform feedback.
>   - Do not associate any varyings with one particular stream (let's call
> this stream X).
>   - Have the GS emit a vertex to stream X.
>
> ARB_transform_feedback3 clarifies expected behavior in this case:
>
> "If the set of varyings selected for transform feedback does not include
> any belonging to the specified stream, nothing will be recorded when
> primitives are emitted to that stream, and the corresponding vertex
> count will be zero."
>
> However, we get two possible outcomes with this setup:
>
> 1) If the vertex emitted to that stream is not the last vertex emitted
> by the GS, then the primitive count for that stream is incorrect (it
> returns 0), but everything else works fine.
>
> I think this behavior is expected as per the IvyBridge documentation:
>
> "8.3 Stream Output Function:
> ...
> If a stream has no SO_DECL state defined (NumEntries is 0), incoming
> objects targeting that stream are effectively ignored. As there is no
> attempt to perform stream output, overflow detection is neither required
> nor performed."
>
> Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
> primitive count in this case. We could still use CL_INVOCATION_COUNT for
> stream 0, but that would not fix the problem for other streams.
>
> 2) If the vertex emitted to that stream is the last vertex emitted by
> the GS, then transform feedback does not work for any stream (no values
> are recorded in the TF buffers) and primitive queries for all streams
> return 0. Rendering is okay though: stream 0 outputs are rendered
> properly and outputs from other streams are discarded. This, I think, is
> a hardware problem.
>
> With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
> stream X like this:
>
> Buffer Selects (Stream X) = 0
> Num Entries (Stream X) = 0
>
> that is, that stream writes to no buffers and has no declarations to
> write, which is correct.
>
> Now comes the funny part: simply forcing Num Entries (Stream X) = 1,
> even if there are no declarations, makes TF and primitive queries work
> again for all streams but X. For stream X, the primitive count is ok
> but TF is not (which is kind of expected, since we are not configuring
> it properly). Moreover, if I also force Buffer Selects (Stream X) = N
> (where N is the index of a disabled TF buffer), then TF also works as
> expected for stream X (primitives generated is okay, TF primitives
> written is 0, and no TF data for that stream is written).
>
> It looks like the hardware does not like setups where there are streams
> that have 0 varyings to record after all, even less so if the last
> vertex we emit is sent to such a stream.
>
> Based on the above, there is a workaround for this, but I think it is
> pretty ugly, so I would like to know other people's thoughts on whether
> it is worth implementing. It would involve the following:
>
> In upload_3dstate_streamout() we make sure we disable all transform
> feedback buffers that are not going to record information (currently a
> TF buffer is enabled as long as the user has called
> glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We
> can tell that a buffer is not going to be written by inspecting its
> BufferStride: it should be 0 for buffers that won't get written. I think
> this is probably good to do in any case.
>
> Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect
> a stream with no varyings bound to it (so num decls is 0) *and* there
> are disabled TF buffers, we silently set num decls for that stream to 1
> and set its buffer_mask to write to one of the disabled buffers (it
> won't actually write because they are disabled).
>
> I have a patch for this [1] and it seems to fix the problem (although
> it only works as long as we have disabled TF buffers available).
>
> Opinions? Is there any other alternative to work around this issue?
>
> The problem is particularly annoying because I think it hits a very
> likely scenario: an application using stream 0 for rendering only (no
> TF) and using other streams to capture TF.
>
> Iago
>
> [1] Patch:
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> index d2c3ae3..1450dde 100644
> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> @@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
>           max_decls = decls[stream_id];
>     }
>
> +   /* We need to inspect if we have streams for which we don't have any
> +    * varyings to record. The hardware does not handle this scenario well
> +    * and for TF to work in this case we need to configure such streams to
> +    * have at least one decl and write to some disabled buffer.
> +    */
> +   int disabled_buffer = -1;
> +   for (int i = 0; i < 4; i++) {
> +      if (linked_xfb_info->BufferStride[i] == 0) {
> +         disabled_buffer = i;
> +         break;
> +      }
> +   }
> +   if (disabled_buffer >= 0) {
> +      for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
> +         if (decls[i] == 0) {
> +            decls[i] = 1;
> +            buffer_mask[i] = 1 << disabled_buffer;
> +         }
> +      }
> +   }
> +
>     BEGIN_BATCH(max_decls * 2 + 3);
>     OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (max_decls * 2 + 1));
>
> @@ -250,9 +271,10 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
>          dw1 |= SO_REORDER_TRAILING;
>
>        for (i = 0; i < 4; i++) {
> -        if (xfb_obj->Buffers[i]) {
> -           dw1 |= SO_BUFFER_ENABLE(i);
> -        }
> +         if (xfb_obj->Buffers[i] &&
> +             xfb_obj->shader_program->LinkedTransformFeedback.BufferStride[i] > 0) {
> +            dw1 |= SO_BUFFER_ENABLE(i);
> +         }
>        }
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
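
For experimenting with the quoted workaround outside the driver, here
is a small standalone model of its decl-list part. It is only a
sketch: find_disabled_buffer and patch_empty_streams are hypothetical
names, MAX_SO_BUFFERS is a made-up constant for the four buffers the
patch loops over, and plain arrays stand in for the driver state. The
return value makes the stated limitation visible: when no buffer has a
zero stride, streams without decls are left untouched.

```c
#include <assert.h>

#define MAX_VERTEX_STREAMS 4
#define MAX_SO_BUFFERS 4   /* assumption: four TF buffers, as in the patch */

/* Return the index of the first TF buffer with a zero stride (one that
 * will never be written), or -1 if every buffer is in use.
 */
int
find_disabled_buffer(const unsigned stride[MAX_SO_BUFFERS])
{
   assert(stride != 0);
   for (int i = 0; i < MAX_SO_BUFFERS; i++) {
      if (stride[i] == 0)
         return i;
   }
   return -1;
}

/* Give every stream that has no declarations one dummy declaration
 * that targets a disabled buffer. Returns the number of streams that
 * still have zero declarations because no disabled buffer exists.
 */
int
patch_empty_streams(unsigned decls[MAX_VERTEX_STREAMS],
                    unsigned buffer_mask[MAX_VERTEX_STREAMS],
                    const unsigned stride[MAX_SO_BUFFERS])
{
   int disabled = find_disabled_buffer(stride);
   int unpatched = 0;

   for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
      if (decls[i] != 0)
         continue;
      if (disabled >= 0) {
         decls[i] = 1;
         buffer_mask[i] = 1u << disabled;
      } else {
         unpatched++;
      }
   }
   return unpatched;
}
```

A nonzero return value corresponds to the case where all four buffers
are being written and the workaround cannot be applied.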

