[Mesa-dev] i965: possible hardware bug affecting multi-stream support

Chris Forbes chrisf at ijw.co.nz
Thu Jun 26 23:17:01 PDT 2014


You're right, that's a bad idea and doesn't work.

On Fri, Jun 27, 2014 at 6:09 PM, Iago Toral <itoral at igalia.com> wrote:
> If by not doing anything you mean not processing or removing the
> ir_emit_vertex instructions for that stream this would have two problems
> at least:
>
> 1) We won't get correct results for GL_PRIMITIVES_GENERATED in that
> stream (it will always be 0). This may be a minor problem.
>
> 2) If that stream is stream 0 and rendering is enabled then we lose
> rendering output, which would be a major problem.
>
> So I think this is not a good thing to do.
>
> Iago
>
> On Fri, 2014-06-27 at 08:08 +1200, Chris Forbes wrote:
>> As an alternative -- we know if we have this scenario at link time --
>> could we perhaps just not do anything in EmitStreamVertex if there are
>> no varyings captured to that stream?
>>
>> On Thu, Jun 26, 2014 at 10:26 PM, Iago Toral <itoral at igalia.com> wrote:
>> > Hello,
>> >
>> > while testing various scenarios for multi-stream support in geometry
>> > shaders I came across one that I think might be a hardware bug, or at
>> > the very least, a hardware limitation that makes it difficult to
>> > implement the correct behavior according to ARB_transform_feedback3.
>> >
>> > The problematic scenario is triggered with this setup:
>> >   - Enable transform feedback.
>> >   - Do not associate any varyings with one particular stream (let's call
>> > this stream X).
>> >   - Have the GS emit a vertex to stream X.
>> >
>> > ARB_transform_feedback3 clarifies expected behavior in this case:
>> >
>> > "If the set of varyings selected for transform feedback does not include
>> > any belonging to the specified stream, nothing will be recorded when
>> > primitives are emitted to that stream, and the corresponding vertex
>> > count will be zero."
>> >
>> > However, we get two possible outcomes with this setup:
>> >
>> > 1) If the vertex emitted to that stream is not the last vertex emitted
>> > by the GS, then primitive count for that stream is incorrect (returns
>> > 0), but everything else works ok.
>> >
>> > I think this behavior is expected as per the IvyBridge documentation:
>> >
>> > "8.3 Stream Output Function:
>> > ...
>> > If a stream has no SO_DECL state defined (NumEntries is 0), incoming
>> > objects targeting that stream are effectively ignored. As there is no
>> > attempt to perform stream output, overflow detection is neither required
>> > nor performed."
>> >
>> > Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
>> > primitive count in this case. We could still use CL_INVOCATION_COUNT for
>> > stream 0, but that would not fix the problem for other streams.
>> >
>> > 2) If the vertex emitted to that stream is the last vertex emitted by
>> > the GS, then transform feedback does not work for any stream (no values
>> > are recorded in the TF buffers) and primitive queries for all streams
>> > return 0. Rendering is okay though: stream 0 outputs are rendered
>> > properly and outputs from other streams are discarded. This, I think, is
>> > a hardware problem.
>> >
>> > With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
>> > stream X like this:
>> >
>> > Buffer Selects (Stream X) = 0
>> > Num Entries (Stream X) = 0
>> >
>> > that is, that stream writes to no buffers and has no declarations to
>> > write, which is correct.
>> >
>> > Now comes the funny part: simply forcing Num Entries (Stream X) = 1, even
>> > if there are no declarations, makes TF and primitive queries work again
>> > for all streams but X. For stream X, the primitive count is ok, but TF
>> > is not (which is kind of expected, since we are not configuring it
>> > properly). Moreover, if I also force Buffer Selects (Stream X) = N
>> > (where N is the index of a disabled TF buffer), then TF also works as
>> > expected for Stream X (primitives generated is okay, TF primitives
>> > written is 0, and no TF data for that stream is written).
>> >
>> > It looks like the hardware does not like setups where there are streams
>> > that have 0 varyings to record after all, even less so if the last
>> > vertex we emit is sent to such a stream.
>> >
>> > Based on the above, there is a workaround for this, but I think it is
>> > pretty ugly, so I would like to know other people's thoughts on whether
>> > it is worth implementing. It would involve the following:
>> >
>> > In upload_3dstate_streamout() we make sure we disable all transform
>> > feedback buffers that are not going to record information (currently a
>> > TF buffer is activated as long as the user has called
>> > glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We
>> > can know if a buffer is not going to be written by inspecting its
>> > BufferStride: it should be 0 for buffers that won't get written. I think
>> > this is probably good to do in any case.
>> >
>> > Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect
>> > a stream with no varyings bound to it (so num decls is 0) *and* there
>> > are disabled TF buffers, we silently set num decls for that stream to 1
>> > and set its buffer_mask to write to one of the disabled buffers (it
>> > won't actually write because they are disabled).
>> >
>> > I have a patch for this [1] and it seems to fix the problem (although it
>> > only works as long as we have disabled TF buffers available).
>> >
>> > Opinions? Is there any other alternative to work around this issue?
>> >
>> > The problem is particularly annoying because I think it hits a very
>> > likely scenario: an application using stream 0 for rendering only (no
>> > TF) and using other streams to capture TF.
>> >
>> > Iago
>> >
>> > [1] Patch:
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
>> > index d2c3ae3..1450dde 100644
>> > --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
>> > +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
>> > @@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
>> >           max_decls = decls[stream_id];
>> >     }
>> >
>> > +   /* We need to inspect if we have streams for which we don't have any
>> > +    * varyings to record. The hardware does not handle this scenario well
>> > +    * and for TF to work in this case we need to configure such streams to
>> > +    * have at least one decl and write to some disabled buffer.
>> > +    */
>> > +   int disabled_buffer = -1;
>> > +   for (int i = 0; i < 4; i++) {
>> > +      if (linked_xfb_info->BufferStride[i] == 0) {
>> > +         disabled_buffer = i;
>> > +         break;
>> > +      }
>> > +   }
>> > +   if (disabled_buffer >= 0) {
>> > +      for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
>> > +         if (decls[i] == 0) {
>> > +            decls[i] = 1;
>> > +            buffer_mask[i] = 1 << disabled_buffer;
>> > +         }
>> > +      }
>> > +   }
>> > +
>> >     BEGIN_BATCH(max_decls * 2 + 3);
>> >     OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (max_decls * 2 + 1));
>> >
>> > @@ -250,9 +271,10 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
>> >          dw1 |= SO_REORDER_TRAILING;
>> >
>> >        for (i = 0; i < 4; i++) {
>> > -        if (xfb_obj->Buffers[i]) {
>> > -           dw1 |= SO_BUFFER_ENABLE(i);
>> > -        }
>> > +         if (xfb_obj->Buffers[i] &&
>> > +             xfb_obj->shader_program->LinkedTransformFeedback.BufferStride[i] > 0) {
>> > +            dw1 |= SO_BUFFER_ENABLE(i);
>> > +         }
>> >        }
>> >
>> >
>>
>
>
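
For anyone following the thread, the two state tweaks described in the quoted
message can be sketched as standalone C, outside of the driver. This is only
an illustration under assumptions: NUM_TF_BUFFERS mirrors the literal 4 used
in the quoted patch, NUM_STREAMS stands in for MAX_VERTEX_STREAMS (its value
is assumed here), and all function names are hypothetical, not Mesa functions.
It does not model the hardware; it only shows the bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TF_BUFFERS 4 /* the literal 4 in the quoted patch */
#define NUM_STREAMS 4    /* assumed stand-in for MAX_VERTEX_STREAMS */

/* Hypothetical helper: enable a TF buffer only if it is bound *and* the
 * linked program will actually write to it (nonzero stride). */
uint32_t
enabled_buffer_mask(const bool bound[NUM_TF_BUFFERS],
                    const unsigned stride[NUM_TF_BUFFERS])
{
   uint32_t mask = 0;
   for (int i = 0; i < NUM_TF_BUFFERS; i++) {
      if (bound[i] && stride[i] > 0)
         mask |= 1u << i;
   }
   return mask;
}

/* Hypothetical helper: index of the first buffer that will never be
 * written (stride 0), or -1 if every buffer is in use. */
int
find_disabled_buffer(const unsigned stride[NUM_TF_BUFFERS])
{
   for (int i = 0; i < NUM_TF_BUFFERS; i++) {
      if (stride[i] == 0)
         return i;
   }
   return -1;
}

/* Hypothetical helper: give every stream with zero decls a single decl
 * that targets a disabled buffer. Returns the number of streams patched;
 * does nothing if no disabled buffer is available. */
int
patch_empty_streams(unsigned decls[NUM_STREAMS],
                    uint32_t buffer_mask[NUM_STREAMS],
                    const unsigned stride[NUM_TF_BUFFERS])
{
   int disabled = find_disabled_buffer(stride);
   if (disabled < 0)
      return 0;
   assert(disabled < NUM_TF_BUFFERS);

   int patched = 0;
   for (int s = 0; s < NUM_STREAMS; s++) {
      if (decls[s] == 0) {
         decls[s] = 1;
         buffer_mask[s] = 1u << disabled;
         patched++;
      }
   }
   return patched;
}
```

With strides {16, 0, 8, 0}, buffer 1 is the first disabled buffer, so any
stream that has zero decls ends up with one decl aimed at buffer 1. If all
four strides are nonzero, nothing is patched, which matches the caveat in the
quoted message that the workaround needs a disabled TF buffer to be available.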

