<div dir="ltr">On 28 August 2013 17:38, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">On 08/26/2013 03:12 PM, Paul Berry wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
The hardware requires that after constant buffers for a stage are<br>
allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{<u></u>VS,HS,DS,GS,PS}<br>
command, and prior to execution of a 3DPRIMITIVE, the corresponding<br>
stage's constant buffers must be reprogrammed using a<br>
3DSTATE_CONSTANT_{VS,HS,DS,GS,<u></u>PS} command.<br>
<br>
Previously we didn't need to worry about this, because we only<br>
programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{<u></u>VS,HS,DS,GS,PS} once on<br>
startup.  But now that we reallocate the constant buffers whenever<br>
geometry shaders are switched on and off, we need to make sure the<br>
constant buffers are reprogrammed.<br>
</blockquote>
<br></div>
Not exactly.  The change to do PUSH_CONSTANT_ALLOC once at startup is very recent - I only committed it on June 10th (fc800f0c60a2) Previously, we had a state atom which did PUSH_CONSTANT_ALLOC whenever BRW_NEW_CONTEXT was flagged.<br>

<br>
That's still vaguely once at startup, but keep in mind that before hardware contexts were mandatory, BRW_NEW_CONTEXT got flagged on every batch.<br></blockquote><div><br></div><div>Ok, I've added a parenthetical comment to the commit message to clarify this.  It now says:<br>
<br>Previously we didn't need to worry about this, because we only<br>programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on<br>startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged).<br></div><div>
 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The atoms list looked like this:<br>
<br>
   &gen7_push_constant_alloc,<br>
   ...<br>
   &gen7_vs_state,<br>
   ...<br>
   &gen7_ps_state,<br>
<br>
Both VS and PS state listen to BRW_NEW_BATCH, so on every batch, we'd end up doing:<br>
<br>
3DSTATE_PUSH_CONSTANT_ALLOC_VS (if hw_ctx == NULL)<br>
3DSTATE_PUSH_CONSTANT_ALLOC_PS (if hw_ctx == NULL)<br>
3DSTATE_CONSTANT_VS<br>
3DSTATE_CONSTANT_PS<br>
<br>
which meant that we always obeyed this rule, even when we didn't do the allocation once at startup and never again.<br>
<br>
But this only worked because we always allocated push constant space at the start of a batch.  Your previous patch cause reallocations to happen mid-batch whenever the geometry program changes.  This makes the old tricks quit working, and we do need a new flag.<br>

<br>
So, I was pretty skeptical of this patch, but on further review, it does appear to be necessary and looks fine as is.<div class=""><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION, to<br>
brw->state.dirty.brw.<br>
---<br>
  src/mesa/drivers/dri/i965/brw_<u></u>context.h   |  2 ++<br>
  src/mesa/drivers/dri/i965/<u></u>gen6_gs_state.c |  2 +-<br>
  src/mesa/drivers/dri/i965/<u></u>gen6_vs_state.c |  3 ++-<br>
  src/mesa/drivers/dri/i965/<u></u>gen6_wm_state.c |  3 ++-<br>
  src/mesa/drivers/dri/i965/<u></u>gen7_urb.c      | 13 +++++++++++++<br>
  src/mesa/drivers/dri/i965/<u></u>gen7_vs_state.c |  3 ++-<br>
  src/mesa/drivers/dri/i965/<u></u>gen7_wm_state.c |  3 ++-<br>
  7 files changed, 24 insertions(+), 5 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>brw_context.h b/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
index 95f9bb2..35193a6 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
@@ -158,6 +158,7 @@ enum brw_state_id {<br>
     BRW_STATE_UNIFORM_BUFFER,<br>
     BRW_STATE_META_IN_PROGRESS,<br>
     BRW_STATE_INTERPOLATION_MAP,<br>
+   BRW_STATE_PUSH_CONSTANT_<u></u>ALLOCATION,<br>
     BRW_NUM_STATE_BITS<br>
  };<br>
<br>
@@ -194,6 +195,7 @@ enum brw_state_id {<br>
  #define BRW_NEW_UNIFORM_BUFFER          (1 << BRW_STATE_UNIFORM_BUFFER)<br>
  #define BRW_NEW_META_IN_PROGRESS        (1 << BRW_STATE_META_IN_PROGRESS)<br>
  #define BRW_NEW_INTERPOLATION_MAP       (1 << BRW_STATE_INTERPOLATION_MAP)<br>
+#define BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_<u></u>ALLOCATION)<br>
<br>
  struct brw_state_flags {<br>
     /** State update flags signalled by mesa internals */<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen6_gs_state.c b/src/mesa/drivers/dri/i965/<u></u>gen6_gs_state.c<br>
index ac78286..9648fb7 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen6_gs_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen6_gs_state.c<br>
@@ -81,7 +81,7 @@ upload_gs_state(struct brw_context *brw)<br>
  const struct brw_tracked_state gen6_gs_state = {<br>
     .dirty = {<br>
        .mesa  = _NEW_TRANSFORM,<br>
-      .brw   = BRW_NEW_CONTEXT,<br>
+      .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION,<br>
        .cache = CACHE_NEW_FF_GS_PROG<br>
     },<br>
     .emit = upload_gs_state,<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen6_vs_state.c b/src/mesa/drivers/dri/i965/<u></u>gen6_vs_state.c<br>
index c099342..9f99db8 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen6_vs_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen6_vs_state.c<br>
@@ -206,7 +206,8 @@ const struct brw_tracked_state gen6_vs_state = {<br>
        .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,<br>
        .brw   = (BRW_NEW_CONTEXT |<br>
                BRW_NEW_VERTEX_PROGRAM |<br>
-               BRW_NEW_BATCH),<br>
+               BRW_NEW_BATCH |<br>
+                BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION),<br>
        .cache = CACHE_NEW_VS_PROG | CACHE_NEW_SAMPLER<br>
     },<br>
     .emit = upload_vs_state,<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen6_wm_state.c b/src/mesa/drivers/dri/i965/<u></u>gen6_wm_state.c<br>
index e286785..6725805 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen6_wm_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen6_wm_state.c<br>
@@ -229,7 +229,8 @@ const struct brw_tracked_state gen6_wm_state = {<br>
                _NEW_POLYGON |<br>
                  _NEW_MULTISAMPLE),<br>
        .brw   = (BRW_NEW_FRAGMENT_PROGRAM |<br>
-               BRW_NEW_BATCH),<br>
+               BRW_NEW_BATCH |<br>
+                BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION),<br>
        .cache = (CACHE_NEW_SAMPLER |<br>
                CACHE_NEW_WM_PROG)<br>
     },<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c b/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
index 4dc8f6e..1bca9cd 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
@@ -81,6 +81,19 @@ gen7_allocate_push_constants(<u></u>struct brw_context *brw)<br>
<br>
     gen7_emit_push_constant_state(<u></u>brw, multiplier * vs_size,<br>
                                   multiplier * gs_size, multiplier * fs_size);<br>
+<br>
+   /* From p115 of the Ivy Bridge PRM (3.2.1.4 3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>VS):<br>
+    *<br>
+    *     Programming Restriction:<br>
+    *<br>
+    *     The 3DSTATE_CONSTANT_VS must be reprogrammed prior to the next<br>
+    *     3DPRIMITIVE command after programming the<br>
+    *     3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>VS.<br>
+    *<br>
+    * Similar text exists for the other 3DSTATE_PUSH_CONSTANT_ALLOC_*<br>
+    * commands.<br>
+    */<br>
+   brw->state.dirty.brw |= BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION;<br>
  }<br>
<br>
  void<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen7_vs_state.c b/src/mesa/drivers/dri/i965/<u></u>gen7_vs_state.c<br>
index 36ab229..36fccf7 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen7_vs_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen7_vs_state.c<br>
@@ -116,7 +116,8 @@ const struct brw_tracked_state gen7_vs_state = {<br>
        .brw   = (BRW_NEW_CONTEXT |<br>
                BRW_NEW_VERTEX_PROGRAM |<br>
                BRW_NEW_VS_BINDING_TABLE |<br>
-               BRW_NEW_BATCH),<br>
+               BRW_NEW_BATCH |<br>
+                BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION),<br>
        .cache = CACHE_NEW_VS_PROG | CACHE_NEW_SAMPLER<br>
     },<br>
     .emit = upload_vs_state,<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen7_wm_state.c b/src/mesa/drivers/dri/i965/<u></u>gen7_wm_state.c<br>
index e88db78..e5691fb 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen7_wm_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen7_wm_state.c<br>
@@ -227,7 +227,8 @@ const struct brw_tracked_state gen7_ps_state = {<br>
                _NEW_COLOR),<br>
        .brw   = (BRW_NEW_FRAGMENT_PROGRAM |<br>
                BRW_NEW_PS_BINDING_TABLE |<br>
-               BRW_NEW_BATCH),<br>
+               BRW_NEW_BATCH |<br>
+                BRW_NEW_PUSH_CONSTANT_<u></u>ALLOCATION),<br>
        .cache = (CACHE_NEW_SAMPLER |<br>
                CACHE_NEW_WM_PROG)<br>
     },<br>
<br>
</blockquote>
<br>
</div></div></blockquote></div><br></div></div>