[Mesa-dev] [PATCH 2/2] i965: Disable SOL buffers and decls when not doing transform feedback.

Iago Toral itoral at igalia.com
Wed Jul 2 01:09:49 PDT 2014


On Wed, 2014-07-02 at 08:49 +0200, Iago Toral wrote:
> Hello Kenneth, Steven:
> 
> On Tue, 2014-07-01 at 17:25 -0700, Kenneth Graunke wrote:
> > Previously, we only emitted 3DSTATE_SO_BUFFER and 3DSTATE_SO_DECL_LIST
> > when transform feedback was active.  When it was inactive, we disabled
> > the SOL stage in 3DSTATE_SOL so the other state would be ignored.
> > 
> > In commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5, Iago enabled the SOL
> > stage universally, so we could implement the GL_PRIMITIVES_GENERATED
> > statistics counter.  This caused every Piglit test to trigger assertions
> > in the simulator, and apparently caused GPU hangs for some users.
> > 
> > Apparently, we're supposed to program 3DSTATE_SO_DECL_LIST to zero when
> > output streams are inactive, but we hadn't been doing so.  Now that SOL
> > is on, we need to do that properly.
> 
> If that is the case then I guess the workaround I was trying to use for
> that hardware issue I mentioned in another e-mail may not be any good,
> since it required to configure a non-zero 3DSTATE_SO_DECL_LIST while
> outputting to a disabled stream... :(

My workaround patch works here in IvyBridge and keeps working with your
patch since it only activates when TF is active, but since it relies on
this stuff I think it would be best if someone tests it on other
hardware when it gets reviewed.

> I never get any GPU hangs though...
> 
> > Experimentally, it looks like we also need to program 3DSTATE_SO_BUFFER
> > to zero as well, or else I get many GPU hangs on Haswell.
> > 
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > Cc: Iago Toral Quiroga <itoral at igalia.com>
> > Cc: Chris Forbes <chrisf at ijw.co.nz>
> > Cc: Steven Newbury <steve at snewbury.org.uk>
> > ---
> >  src/mesa/drivers/dri/i965/brw_state.h      |  3 ++-
> >  src/mesa/drivers/dri/i965/gen7_sol_state.c | 34 ++++++++++++++++++++++--------
> >  src/mesa/drivers/dri/i965/gen8_sol_state.c |  2 +-
> >  3 files changed, 28 insertions(+), 11 deletions(-)
> > 
> > Iago,
> > 
> > I noticed that pretty much every Piglit test broke in our simulation
> > environment after your patch to turn on SOL by default.  Something about
> > SO_DECLs not being right.  This patch seems to fix the issue.  Does it
> > look reasonable to you?
> 
> The patch is producing GPU hangs for me in the case where I have a
> geometry shader and no TF, so it seems that it introduces other
> problems. I'll see if I can find the reason for this.

It looks like we have to send at least one decl per stream even if we
are configuring the number of declarations per stream to 0, otherwise we
get this GPU hang on IvyBridge at least.

This seems to contradict clearly the documentation for IvyBridge. From
8.5 3DSTATE_SO_DECL_LIST Command, dw2, bits 7:0, Num Entries[0]:

"It is legal to specify Num Entries = 0 for all four streams
simultaneously. In this case there will be no SO_DECLs included in the
command (only DW 0-2). Note that all Stream to Buffer Selects bits must
be zero in this case (as no streams produce output).

Which is exactly what Kenneth's patch does, but that produces a GPU hang
here.

At the same time, there is this other statement at the beginning of that
same chapter:

"Errata: All 128 decls for all four streams must be included whenever
this command is issued. The “Num Entries [n]” fields still contain the
actual numbers of valid decls."

Not sure if that sentence is the fix to a previous errata or the errata
itself... but in any case, we are never sending all 128 decls but it
looks like at least on IvyBridge we have to send one or we get a GPU
hang.

Kenneth, following the above, this change to your patch fixes the GPU
hang I was running into:

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 2013406..6d1b93c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -108,8 +108,10 @@ gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
        *  all zero (no buffers written to) and the corresponding Num Entries
        *  field to zero (no valid SO_DECLs)."
        */
-      BEGIN_BATCH(3);
-      OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (3 - 2));
+      BEGIN_BATCH(5);
+      OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (5 - 2));
+      OUT_BATCH(0);
+      OUT_BATCH(0);
       OUT_BATCH(0);
       OUT_BATCH(0);
       ADVANCE_BATCH();

Steven, can you try Kenneth's patch with this change too?

Iago

> > Steven,
> > 
> > Does this fix your GPU hangs on Ivybridge?
> 
> Steven, can you provide more info on the use case that is triggering
> your GPU hang (specifically the setup for transform feedback and whether
> you are using a geometry shader or just a vertex shader, etc)?
> 
> I have IvyBridge so I can try to reproduce your setup and see if I get
> the GPU hang here too.
> 
> Iago
> 
> > Thanks!
> >  --Ken
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
> > index c52a977..2c1fc87 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state.h
> > +++ b/src/mesa/drivers/dri/i965/brw_state.h
> > @@ -237,7 +237,8 @@ void gen7_init_vtable_surface_functions(struct brw_context *brw);
> >  
> >  /* gen7_sol_state.c */
> >  void gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
> > -                                      const struct brw_vue_map *vue_map);
> > +                                      const struct brw_vue_map *vue_map,
> > +                                      bool active);
> >  
> >  /* gen8_surface_state.c */
> >  void gen8_init_vtable_surface_functions(struct brw_context *brw);
> > diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > index eccd5a5..e2eb334 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> > @@ -36,7 +36,7 @@
> >  #include "main/transformfeedback.h"
> >  
> >  static void
> > -upload_3dstate_so_buffers(struct brw_context *brw)
> > +upload_3dstate_so_buffers(struct brw_context *brw, bool active)
> >  {
> >     struct gl_context *ctx = &brw->ctx;
> >     /* BRW_NEW_TRANSFORM_FEEDBACK */
> > @@ -56,7 +56,7 @@ upload_3dstate_so_buffers(struct brw_context *brw)
> >        uint32_t start, end;
> >        uint32_t stride;
> >  
> > -      if (!xfb_obj->Buffers[i]) {
> > +      if (!xfb_obj->Buffers[i] || !active) {
> >  	 /* The pitch of 0 in this command indicates that the buffer is
> >  	  * unbound and won't be written to.
> >  	  */
> > @@ -96,10 +96,28 @@ upload_3dstate_so_buffers(struct brw_context *brw)
> >   */
> >  void
> >  gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
> > -                                 const struct brw_vue_map *vue_map)
> > +                                 const struct brw_vue_map *vue_map,
> > +                                 bool active)
> >  {
> > -   struct gl_context *ctx = &brw->ctx;
> > +   if (!active) {
> > +      /* If inactive, disable all SO outputs.
> > +       *
> > +       * From the Ivybridge PRM, Volume 2, Part 1, page 202
> > +       * (3DSTATE_SO_DECL_LIST packet definition), DWord 1, bits 3:0:
> > +       * "Note: For 'inactive' streams, software must program this field to
> > +       *  all zero (no buffers written to) and the corresponding Num Entries
> > +       *  field to zero (no valid SO_DECLs)."
> > +       */
> > +      BEGIN_BATCH(3);
> > +      OUT_BATCH(_3DSTATE_SO_DECL_LIST << 16 | (3 - 2));
> > +      OUT_BATCH(0);
> > +      OUT_BATCH(0);
> > +      ADVANCE_BATCH();
> > +      return;
> > +   }
> > +
> >     /* BRW_NEW_TRANSFORM_FEEDBACK */
> > +   struct gl_context *ctx = &brw->ctx;
> >     struct gl_transform_feedback_object *xfb_obj =
> >        ctx->TransformFeedback.CurrentObject;
> >     const struct gl_transform_feedback_info *linked_xfb_info =
> > @@ -289,11 +307,9 @@ upload_sol_state(struct brw_context *brw)
> >     /* BRW_NEW_TRANSFORM_FEEDBACK */
> >     bool active = _mesa_is_xfb_active_and_unpaused(ctx);
> >  
> > -   if (active) {
> > -      upload_3dstate_so_buffers(brw);
> > -      /* BRW_NEW_VUE_MAP_GEOM_OUT */
> > -      gen7_upload_3dstate_so_decl_list(brw, &brw->vue_map_geom_out);
> > -   }
> > +   upload_3dstate_so_buffers(brw, active);
> > +   /* BRW_NEW_VUE_MAP_GEOM_OUT */
> > +   gen7_upload_3dstate_so_decl_list(brw, &brw->vue_map_geom_out, active);
> >  
> >     /* Finally, set up the SOL stage.  This command must always follow updates to
> >      * the nonpipelined SOL state (3DSTATE_SO_BUFFER, 3DSTATE_SO_DECL_LIST) or
> > diff --git a/src/mesa/drivers/dri/i965/gen8_sol_state.c b/src/mesa/drivers/dri/i965/gen8_sol_state.c
> > index ebcdaf8..ff85bf2 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_sol_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_sol_state.c
> > @@ -157,7 +157,7 @@ upload_sol_state(struct brw_context *brw)
> >     if (active) {
> >        gen8_upload_3dstate_so_buffers(brw);
> >        /* BRW_NEW_VUE_MAP_GEOM_OUT */
> > -      gen7_upload_3dstate_so_decl_list(brw, &brw->vue_map_geom_out);
> > +      gen7_upload_3dstate_so_decl_list(brw, &brw->vue_map_geom_out, active);
> >     }
> >  
> >     gen8_upload_3dstate_streamout(brw, active, &brw->vue_map_geom_out);
> 




More information about the mesa-dev mailing list