[Mesa-dev] i965: possible hardware bug affecting multi-stream support
Iago Toral
itoral at igalia.com
Thu Jun 26 23:09:17 PDT 2014
If by not doing anything you mean not processing, or removing, the
ir_emit_vertex instructions for that stream, this would have at least
two problems:
1) We won't get correct results for GL_PRIMITIVES_GENERATED in that
stream (it will always be 0). This may be a minor problem.
2) If that stream is stream 0 and rendering is enabled then we lose
rendering output, which would be a major problem.
So I think this is not a good thing to do.
Iago
On Fri, 2014-06-27 at 08:08 +1200, Chris Forbes wrote:
> As an alternative -- we know if we have this scenario at link time --
> could we perhaps just not do anything in EmitStreamVertex if there are
> no varyings captured to that stream?
> On Thu, Jun 26, 2014 at 10:26 PM, Iago Toral <itoral at igalia.com> wrote:
> > Hello,
> >
> > while testing various scenarios for multi-stream support in geometry
> > shaders I came across one that I think might be a hardware bug, or at
> > the very least, a hardware limitation that makes it difficult to
> > implement correct behavior according to ARB_transform_feedback3.
> >
> > The problematic scenario is triggered with this setup:
> > - Enable transform feedback.
> > - Do not associate any varyings with one particular stream (let's call
> > this stream X).
> > - Have the GS emit a vertex to stream X.
> >
> > ARB_transform_feedback3 clarifies expected behavior in this case:
> >
> > "If the set of varyings selected for transform feedback does not include
> > any belonging to the specified stream, nothing will be recorded when
> > primitives are emitted to that stream, and the corresponding vertex
> > count will be zero."
> >
> > However, we get two possible outcomes with this setup:
> >
> > 1) If the vertex emitted to that stream is not the last vertex emitted
> > by the GS, then primitive count for that stream is incorrect (returns
> > 0), but everything else works ok.
> >
> > I think this behavior is expected as per the IvyBridge documentation:
> >
> > "8.3 Stream Output Function:
> > ...
> > If a stream has no SO_DECL state defined (NumEntries is 0), incoming
> > objects targeting that stream are effectively ignored. As there is no
> > attempt to perform stream output, overflow detection is neither required
> > nor performed."
> >
> > Which means that we can't use SO_PRIMITIVE_STORAGE_NEEDED for the
> > primitive count in this case. We could still use CL_INVOCATION_COUNT for
> > stream 0, but that would not fix the problem for other streams.
> >
> > 2) If the vertex emitted to that stream is the last vertex emitted by
> > the GS, then transform feedback does not work for any stream (no values
> > are recorded in the TF buffers) and primitive queries for all streams
> > return 0. Rendering is okay though: stream 0 outputs are rendered
> > properly and outputs from other streams are discarded. This, I think, is
> > a hardware problem.
> >
> > With this setup, we are configuring the 3DSTATE_SO_DECL_LIST command for
> > stream X like this:
> >
> > Buffer Selects (Stream X) = 0
> > Num Entries (Stream X) = 0
> >
> > that is, that stream writes to no buffers and has no declarations to
> > write, which is correct.
> >
> > Now comes the funny part: simply forcing Num Entries (Stream X) = 1, even
> > if there are no declarations, makes TF and primitive queries work again
> > for all streams but X. For stream X, the primitive count is okay but TF
> > is not (which is kind of expected, since we are not configuring it
> > properly). Moreover, if I also force Buffer Selects (Stream X) = N (where
> > N is the index of a disabled TF buffer), then TF also works as
> > expected for stream X (primitives generated is okay, TF primitives
> > written is 0, and no TF data for that stream is written).
> >
> > It looks like the hardware does not like setups where there are streams
> > that have 0 varyings to record after all, even less so if the last
> > vertex we emit is sent to such a stream.
> >
> > Based on the above, there is a workaround for this, but I think it is
> > pretty ugly, so I would like to know other people's thoughts on whether
> > it is worth implementing. It would involve the following:
> >
> > In upload_3dstate_streamout() we make sure we disable all transform
> > feedback buffers that are not going to record information (currently a
> > TF buffer is enabled as long as the user has called
> > glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, index, bufferName)). We
> > can tell whether a buffer is going to be written by inspecting its
> > BufferStride: it should be 0 for buffers that won't get written. I think
> > this is probably good to do in any case.
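
The buffer-enable rule described above can be sketched outside of Mesa as follows. This is only a minimal model: the function name xfb_buffer_enable_mask, the MAX_XFB_BUFFERS constant, and the plain arrays standing in for xfb_obj->Buffers[] and BufferStride[] are assumptions made for illustration, not driver API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant; the patch hardcodes 4 TF buffers. */
#define MAX_XFB_BUFFERS 4

/* Return a bitmask with bit i set only when TF buffer i is both bound
 * and has a nonzero stride, i.e. some varying will be written to it.
 * A bound buffer with stride 0 is left disabled.
 */
static uint32_t
xfb_buffer_enable_mask(const bool bound[MAX_XFB_BUFFERS],
                       const unsigned stride[MAX_XFB_BUFFERS])
{
   uint32_t mask = 0;

   for (int i = 0; i < MAX_XFB_BUFFERS; i++) {
      if (bound[i] && stride[i] > 0)
         mask |= 1u << i;
   }
   return mask;
}
```

In the driver, each set bit would correspond to OR-ing SO_BUFFER_ENABLE(i) into dw1.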
> >
> > Then the ugly part: in gen7_upload_3dstate_so_decl_list(), if we detect
> > a stream with no varyings bound to it (so num decls is 0) *and* there
> > are disabled TF buffers, we silently set num decls for that stream to 1
> > and set its buffer_mask to write to one of the disabled buffers (it
> > won't actually write because they are disabled).
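
As a standalone illustration of that step, here is a small model that compiles without Mesa. The names patch_empty_streams, NUM_STREAMS and NUM_XFB_BUFFERS are hypothetical stand-ins (the driver uses MAX_VERTEX_STREAMS and a literal 4), and the bool return value is an addition to make the "no disabled buffer available" case visible to the caller.

```c
#include <stdbool.h>

#define NUM_STREAMS     4   /* stands in for MAX_VERTEX_STREAMS */
#define NUM_XFB_BUFFERS 4   /* the patch hardcodes 4 */

/* For every stream with zero declarations, pretend it has one and
 * point it at the first disabled TF buffer (stride == 0).
 * Returns false, leaving the arrays untouched, when every buffer is
 * in use and the workaround therefore cannot be applied.
 */
static bool
patch_empty_streams(int decls[NUM_STREAMS],
                    unsigned buffer_mask[NUM_STREAMS],
                    const unsigned stride[NUM_XFB_BUFFERS])
{
   int disabled_buffer = -1;

   for (int i = 0; i < NUM_XFB_BUFFERS; i++) {
      if (stride[i] == 0) {
         disabled_buffer = i;
         break;
      }
   }
   if (disabled_buffer < 0)
      return false;

   for (int s = 0; s < NUM_STREAMS; s++) {
      if (decls[s] == 0) {
         decls[s] = 1;
         buffer_mask[s] = 1u << disabled_buffer;
      }
   }
   return true;
}
```

A caller that gets false back is in the case where all TF buffers record data, so the scenario remains unfixed.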
> >
> > I have a patch for this [1] and it seems to fix the problem (although it
> > only works as long as we have disabled TF buffers available).
> >
> > Opinions? Is there any other alternative to work around this issue?
> >
> > The problem is particularly annoying because I think it hits a very
> > likely scenario: an application using stream 0 for rendering only (no
> > TF) and using other streams to capture TF.
> >
> > Iago
> >
> > [1] Patch:
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > index d2c3ae3..1450dde 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > @@ -189,6 +189,27 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
> >           max_decls = decls[stream_id];
> >     }
> >
> > +   /* We need to inspect if we have streams for which we don't have any
> > +    * varyings to record. The hardware does not handle this scenario well
> > +    * and for TF to work in this case we need to configure such streams to
> > +    * have at least one decl and write to some disabled buffer.
> > +    */
> > +   int disabled_buffer = -1;
> > +   for (int i = 0; i < 4; i++) {
> > +      if (linked_xfb_info->BufferStride[i] == 0) {
> > +         disabled_buffer = i;
> > +         break;
> > +      }
> > +   }
> > +   if (disabled_buffer >= 0) {
> > +      for (int i = 0; i < MAX_VERTEX_STREAMS; i++) {
> > +         if (decls[i] == 0) {
> > +            decls[i] = 1;
> > +            buffer_mask[i] = 1 << disabled_buffer;
> > +         }
> > +      }
> > +   }
> > +
> >     BEGIN_BATCH(max_decls * 2 + 3);
> >     OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (max_decls * 2 + 1));
> >
> > @@ -250,9 +271,10 @@ upload_3dstate_streamout(struct brw_context *brw, bool active,
> >     dw1 |= SO_REORDER_TRAILING;
> >
> >     for (i = 0; i < 4; i++) {
> > -      if (xfb_obj->Buffers[i]) {
> > -         dw1 |= SO_BUFFER_ENABLE(i);
> > -      }
> > +      if (xfb_obj->Buffers[i] &&
> > +          xfb_obj->shader_program->LinkedTransformFeedback.BufferStride[i] > 0) {
> > +         dw1 |= SO_BUFFER_ENABLE(i);
> > +      }
> >     }
> >
> >