<div dir="ltr">On 28 August 2013 17:58, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">On 08/26/2013 03:12 PM, Paul Berry wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Previously, we would always use the same push constant allocation<br>
regardless of what shader programs were being run: the available push<br>
constant space was split into 2 equal size partitions, one for the<br>
vertex shader, and one for the fragment shader.<br>
<br>
Now that we are adding geometry shader support, we need to do<br>
something smarter. This patch adjusts things so that when a geometry<br>
shader is in use, we split the available push constant space into 3<br>
nearly-equal size partitions instead of 2.<br>
<br>
Since the push constant allocation is now affected by GL state, it can<br>
no longer be set up by brw_upload_initial_gpu_state()<u></u>; instead it must<br>
be set up by a state atom.<br>
---<br>
src/mesa/drivers/dri/i965/brw_<u></u>context.h | 3 +-<br>
src/mesa/drivers/dri/i965/brw_<u></u>defines.h | 1 +<br>
src/mesa/drivers/dri/i965/brw_<u></u>state.h | 4 +-<br>
src/mesa/drivers/dri/i965/brw_<u></u>state_upload.c | 5 +-<br>
src/mesa/drivers/dri/i965/<u></u>gen7_blorp.cpp | 6 ++<br>
src/mesa/drivers/dri/i965/<u></u>gen7_urb.c | 101 +++++++++++++++++++++++----<br>
6 files changed, 98 insertions(+), 22 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>brw_context.h b/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
index 77f2a6b..95f9bb2 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>brw_context.h<br>
@@ -1508,7 +1508,8 @@ gen6_get_sample_position(<u></u>struct gl_context *ctx,<br>
<br>
/* gen7_urb.c */<br>
void<br>
-gen7_allocate_push_constants(<u></u>struct brw_context *brw);<br>
+gen7_emit_push_constant_<u></u>state(struct brw_context *brw, unsigned vs_size,<br>
+ unsigned gs_size, unsigned fs_size);<br>
<br>
void<br>
gen7_emit_urb_state(struct brw_context *brw,<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>brw_defines.h b/src/mesa/drivers/dri/i965/<u></u>brw_defines.h<br>
index 832ff55..8d9a824 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>brw_defines.h<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>brw_defines.h<br>
@@ -1284,6 +1284,7 @@ enum brw_message_target {<br>
# define GEN7_URB_STARTING_ADDRESS_<u></u>SHIFT 25<br>
<br>
#define _3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>VS 0x7912 /* GEN7+ */<br>
+#define _3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>GS 0x7915 /* GEN7+ */<br>
#define _3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>PS 0x7916 /* GEN7+ */<br>
# define GEN7_PUSH_CONSTANT_BUFFER_<u></u>OFFSET_SHIFT 16<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>brw_state.h b/src/mesa/drivers/dri/i965/<u></u>brw_state.h<br>
index 85f82fe..4814639 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>brw_state.h<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>brw_state.h<br>
@@ -112,6 +112,7 @@ extern const struct brw_tracked_state gen7_cc_viewport_state_<u></u>pointer;<br>
extern const struct brw_tracked_state gen7_clip_state;<br>
extern const struct brw_tracked_state gen7_disable_stages;<br>
extern const struct brw_tracked_state gen7_ps_state;<br>
+extern const struct brw_tracked_state gen7_push_constant_space;<br>
extern const struct brw_tracked_state gen7_sbe_state;<br>
extern const struct brw_tracked_state gen7_sf_clip_viewport;<br>
extern const struct brw_tracked_state gen7_sf_state;<br>
@@ -220,9 +221,6 @@ uint32_t<br>
get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,<br>
int fs_attr, bool two_side_color, uint32_t *max_source_attr);<br>
<br>
-/* gen7_urb.c */<br>
-void gen7_allocate_push_constants(<u></u>struct brw_context *brw);<br>
-<br>
#ifdef __cplusplus<br>
}<br>
#endif<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>brw_state_upload.c b/src/mesa/drivers/dri/i965/<u></u>brw_state_upload.c<br>
index b883002..9638c69 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>brw_state_upload.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>brw_state_upload.c<br>
@@ -188,6 +188,7 @@ static const struct brw_tracked_state *gen7_atoms[] =<br>
&gen7_cc_viewport_state_<u></u>pointer, /* must do after brw_cc_vp */<br>
&gen7_sf_clip_viewport,<br>
<br>
+ &gen7_push_constant_space,<br>
&gen7_urb,<br>
&gen6_blend_state, /* must do before cc unit */<br>
&gen6_color_calc_state, /* must do before cc unit */<br>
@@ -251,10 +252,6 @@ brw_upload_initial_gpu_state(<u></u>struct brw_context *brw)<br>
return;<br>
<br>
brw_upload_invariant_state(<u></u>brw);<br>
-<br>
- if (brw->gen >= 7) {<br>
- gen7_allocate_push_constants(<u></u>brw);<br>
- }<br>
}<br>
<br>
void brw_init_state( struct brw_context *brw )<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen7_blorp.cpp b/src/mesa/drivers/dri/i965/<u></u>gen7_blorp.cpp<br>
index 6c798b1..9df3d92 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen7_blorp.cpp<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen7_blorp.cpp<br>
@@ -51,6 +51,12 @@ static void<br>
gen7_blorp_emit_urb_config(<u></u>struct brw_context *brw,<br>
const brw_blorp_params *params)<br>
{<br>
+ unsigned urb_size = (brw->is_haswell && brw->gt == 3) ? 32 : 16;<br>
+ gen7_emit_push_constant_state(<u></u>brw,<br>
+ urb_size / 2 /* vs_size */,<br>
+ 0 /* gs_size */,<br>
+ urb_size / 2 /* fs_size */);<br>
+<br>
/* The minimum valid number of VS entries is 32. See 3DSTATE_URB_VS, Dword<br>
* 1.15:0 "VS Number of URB Entries".<br>
*/<br>
diff --git a/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c b/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
index 2d10cc12..4dc8f6e 100644<br>
--- a/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
+++ b/src/mesa/drivers/dri/i965/<u></u>gen7_urb.c<br>
@@ -30,12 +30,12 @@<br>
/**<br>
* The following diagram shows how we partition the URB:<br>
*<br>
- * 8kB 8kB Rest of the URB space<br>
- * ____-____ ____-____ _________________-____________<u></u>_____<br>
- * / \ / \ / \<br>
+ * 16kB or 32kB Rest of the URB space<br>
+ * __________-__________ _________________-____________<u></u>_____<br>
+ * / \ / \<br>
* +-----------------------------<u></u>------------------------------<u></u>--+<br>
- * | VS Push | FS Push | VS |<br>
- * | Constants | Constants | Handles |<br>
+ * | VS/FS/GS Push | VS/GS URB |<br>
+ * | Constants | Entries |<br>
* +-----------------------------<u></u>------------------------------<u></u>--+<br>
*<br>
* Notably, push constants must be stored at the beginning of the URB<br>
@@ -43,8 +43,12 @@<br>
* GT1/GT2 have a maximum constant buffer size of 16kB, while Haswell GT3<br>
* doubles this (32kB).<br>
*<br>
- * Currently we split the constant buffer space evenly between VS and FS.<br>
- * This is probably not ideal, but simple.<br>
+ * Ivybridge and Haswell GT1/GT2 allow push constants to be located (and<br>
+ * sized) in increments of 1kB. Haswell GT3 requires them to be located and<br>
+ * sized in increments of 2kB.<br>
+ *<br>
+ * Currently we split the constant buffer space evenly among whatever stages<br>
+ * are active. This is probably not ideal, but simple.<br>
*<br>
* Ivybridge GT1 and Haswell GT1 have 128kB of URB space.<br>
* Ivybridge GT2 and Haswell GT2 have 256kB of URB space.<br>
@@ -53,22 +57,91 @@<br>
* See "Volume 2a: 3D Pipeline," section 1.8, "Volume 1b: Configurations",<br>
* and the documentation for 3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>xS.<br>
*/<br>
-void<br>
+static void<br>
gen7_allocate_push_constants(<u></u>struct brw_context *brw)<br>
{<br>
- unsigned size = 8;<br>
- if (brw->is_haswell && brw->gt == 3)<br>
- size = 16;<br>
+ unsigned avail_size = 16;<br>
+ unsigned multiplier = (brw->is_haswell && brw->gt == 3) ? 2 : 1;<br>
+<br>
+ /* BRW_NEW_GEOMETRY_PROGRAM */<br>
+ bool gs_present = brw->geometry_program;<br>
+<br>
+ unsigned vs_size, gs_size;<br>
+ if (gs_present) {<br>
+ vs_size = avail_size / 3;<br>
+ avail_size -= vs_size;<br>
+ gs_size = avail_size / 2;<br>
+ avail_size -= gs_size;<br>
+ } else {<br>
+ vs_size = avail_size / 2;<br>
+ avail_size -= vs_size;<br>
+ gs_size = 0;<br>
+ }<br>
+ unsigned fs_size = avail_size;<br>
+<br>
+ gen7_emit_push_constant_state(<u></u>brw, multiplier * vs_size,<br>
+ multiplier * gs_size, multiplier * fs_size);<br>
+}<br>
+<br>
+void<br>
+gen7_emit_push_constant_<u></u>state(struct brw_context *brw, unsigned vs_size,<br>
+ unsigned gs_size, unsigned fs_size)<br>
+{<br>
+ unsigned offset = 0;<br>
<br>
- BEGIN_BATCH(4);<br>
+ BEGIN_BATCH(6);<br>
OUT_BATCH(_3DSTATE_PUSH_<u></u>CONSTANT_ALLOC_VS << 16 | (2 - 2));<br>
- OUT_BATCH(size);<br>
+ OUT_BATCH(vs_size | offset << GEN7_PUSH_CONSTANT_BUFFER_<u></u>OFFSET_SHIFT);<br>
+ offset += vs_size;<br>
+<br>
+ OUT_BATCH(_3DSTATE_PUSH_<u></u>CONSTANT_ALLOC_GS << 16 | (2 - 2));<br>
+ OUT_BATCH(gs_size | offset << GEN7_PUSH_CONSTANT_BUFFER_<u></u>OFFSET_SHIFT);<br>
+ offset += gs_size;<br>
<br>
OUT_BATCH(_3DSTATE_PUSH_<u></u>CONSTANT_ALLOC_PS << 16 | (2 - 2));<br>
- OUT_BATCH(size | size << GEN7_PUSH_CONSTANT_BUFFER_<u></u>OFFSET_SHIFT);<br>
+ OUT_BATCH(offset | fs_size << GEN7_PUSH_CONSTANT_BUFFER_<u></u>OFFSET_SHIFT);<br>
ADVANCE_BATCH();<br>
+<br>
+ /* From p292 of the Ivy Bridge PRM (11.2.4 3DSTATE_PUSH_CONSTANT_ALLOC_<u></u>PS):<br>
+ *<br>
+ * A PIPE_CONTOL command with the CS Stall bit set must be programmed<br>
+ * in the ring after this instruction.<br>
+ *<br>
+ * No such restriction exists for Haswell.<br>
+ */<br>
+ if (!brw->is_haswell) {<br>
+ BEGIN_BATCH(4);<br>
+ OUT_BATCH(_3DSTATE_PIPE_<u></u>CONTROL | (4 - 2));<br>
+ /* From p61 of the Ivy Bridge PRM (1.10.4 PIPE_CONTROL Command: DW1[20]<br>
+ * CS Stall):<br>
+ *<br>
+ * One of the following must also be set:<br>
+ * - Render Target Cache Flush Enable ([12] of DW1)<br>
+ * - Depth Cache Flush Enable ([0] of DW1)<br>
+ * - Stall at Pixel Scoreboard ([1] of DW1)<br>
+ * - Depth Stall ([13] of DW1)<br>
+ * - Post-Sync Operation ([13] of DW1)<br>
+ *<br>
+ * We choose to do a Post-Sync Operation (Write Immediate Data), since<br>
+ * it seems like it will incur the least additional performance penalty.<br>
+ */<br>
+ OUT_BATCH(PIPE_CONTROL_CS_<u></u>STALL | PIPE_CONTROL_WRITE_IMMEDIATE);<br>
+ OUT_RELOC(brw->batch.<u></u>workaround_bo,<br>
+ I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, 0);<br>
+ OUT_BATCH(0);<br>
+ ADVANCE_BATCH();<br>
+ }<br>
</blockquote>
<br></div></div>
This is really a bug fix, since it's implementing a hardware workaround on a packet we've always been emitting. I'd really like to see it in a separate patch, for easier bisectability.<br></blockquote><div><br>
</div><div>Sure, no problem. It turns out this was really easy to split out to its own patch.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Interestingly, this text doesn't appear in the latest documentation, and I don't see it in the workaround database. It is definitely there in the PRM, though. I'm not sure what to make of it.</blockquote><div>
<br></div><div>I see it in the current bspec, at the top of 3DSTATE_PUSH_CONSTANT_ALLOC_PS, in the "Description" section.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
}<br>
<br>
+const struct brw_tracked_state gen7_push_constant_space = {<br>
+ .dirty = {<br>
+ .mesa = 0,<br>
+ .brw = BRW_NEW_CONTEXT | BRW_NEW_GEOMETRY_PROGRAM,<br>
</blockquote>
<br></div>
It's worth noting that this will cause a full pipeline stall whenever the current geometry shader changes (since 3DSTATE_PUSH_CONSTANT_ALLOC_* are non-pipelined). We could avoid this when switching between two different geometry shaders. But I suspect it's premature to care about that. Not sure it'll matter much anyway.<div class="im">
<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ .cache = 0,<br>
+ },<br>
+ .emit = gen7_allocate_push_constants,<br>
+};<br>
+<br>
static void<br>
gen7_upload_urb(struct brw_context *brw)<br>
{<br>
<br>
</blockquote>
<br></div>
Other than pulling the PIPE_CONTROL into a separate patch, this looks fine to me.<br>
</blockquote></div><br></div></div>