<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen <span dir="ltr"><<a href="mailto:jordan.l.justen@intel.com" target="_blank">jordan.l.justen@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This thread ID uniform will be used to compute the<br>
gl_LocalInvocationIndex and gl_LocalInvocationID values.<br>
<br>
It is important for this uniform to be added in the last push constant<br>
register. fs_visitor::assign_constant_locations is updated to make<br>
sure this happens.<br>
<br>
The reason this is important is that the cross-thread push constant<br>
registers are loaded first, and the per-thread push constant registers<br>
are loaded after that. (Broadwell adds another push constant upload<br>
mechanism which reverses this order, but we are ignoring this for<br>
now.)<br>
<br>
v2:<br>
* Add variable in intrinsics lowering pass<br>
* Make sure the ID is pushed last in assign_constant_locations, and<br>
that we save a spot for the ID in the push constants<br>
<br>
v3:<br>
* Simplify code based with Jason's suggestions.<br>
<br>
Signed-off-by: Jordan Justen <<a href="mailto:jordan.l.justen@intel.com">jordan.l.justen@intel.com</a>><br>
---<br>
src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +++++++++++++++++++++++++-<br>
1 file changed, 25 insertions(+), 1 deletion(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
index e8a3aab..bb1bf7a 100644<br>
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
@@ -2097,6 +2097,10 @@ fs_visitor::assign_constant_locations()<br>
bool contiguous[uniforms];<br>
memset(contiguous, 0, sizeof(contiguous));<br>
<br>
+ int thread_local_id_index =<br>
+ (stage == MESA_SHADER_COMPUTE) ?<br>
+ ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index : -1;<br>
+<br>
/* First, we walk through the instructions and do two things:<br>
*<br>
* 1) Figure out which uniforms are live.<br>
@@ -2141,6 +2145,9 @@ fs_visitor::assign_constant_locations()<br>
}<br>
}<br>
<br>
+ if (thread_local_id_index >= 0 && !is_live[thread_local_id_index])<br>
+ thread_local_id_index = -1;<br>
+<br>
/* Only allow 16 registers (128 uniform components) as push constants.<br>
*<br>
* Just demote the end of the list. We could probably do better<br>
@@ -2149,7 +2156,9 @@ fs_visitor::assign_constant_locations()<br>
* If changing this value, note the limitation about total_regs in<br>
* brw_curbe.c.<br>
*/<br>
- const unsigned int max_push_components = 16 * 8;<br>
+ unsigned int max_push_components = 16 * 8;<br>
+ if (thread_local_id_index >= 0)<br>
+ max_push_components--; /* Save a slot for the thread ID */<br>
<br>
/* We push small arrays, but no bigger than 16 floats. This is big enough<br>
* for a vec4 but hopefully not large enough to push out other stuff. We<br>
@@ -2187,6 +2196,10 @@ fs_visitor::assign_constant_locations()<br>
if (!is_live[u] || is_live_64bit[u])<br>
continue;<br>
<br>
+ /* Skip thread_local_id_index to put it in the last push register. */<br>
+ if (thread_local_id_index == (int)u)<br>
+ continue;<br>
+<br>
set_push_pull_constant_loc(u, &chunk_start, contiguous[u],<br>
push_constant_loc, pull_constant_loc,<br>
&num_push_constants, &num_pull_constants,<br>
@@ -2194,6 +2207,10 @@ fs_visitor::assign_constant_locations()<br>
stage_prog_data);<br>
}<br>
<br>
+ /* Add the CS local thread ID uniform at the end of the push constants */<br>
+ if (thread_local_id_index >= 0)<br>
+ push_constant_loc[thread_local_id_index] = num_push_constants++;<br>
+<br>
/* As the uniforms are going to be reordered, take the data from a temporary<br>
* copy of the original param[].<br>
*/<br>
@@ -2212,6 +2229,7 @@ fs_visitor::assign_constant_locations()<br>
* push_constant_loc[i] <= i and we can do it in one smooth loop without<br>
* having to make a copy.<br>
*/<br>
+ int new_thread_local_id_index = -1;<br>
for (unsigned int i = 0; i < uniforms; i++) {<br>
const gl_constant_value *value = param[i];<br>
<br>
@@ -2219,9 +2237,15 @@ fs_visitor::assign_constant_locations()<br>
stage_prog_data->pull_param[pull_constant_loc[i]] = value;<br>
} else if (push_constant_loc[i] != -1) {<br>
stage_prog_data->param[push_constant_loc[i]] = value;<br>
+ if (thread_local_id_index == (int)i)<br>
+ new_thread_local_id_index = push_constant_loc[i];<br></blockquote><div><br></div><div>First off, I think the following is better done as a fix-up patch if we do it at all :-)<br></div><div><br></div><div>If we make this<br><br></div><div>if ((int)i == thread_local_id_index) {<br></div><div> assert(stage == MESA_SHADER_COMPUTE)<br></div><div> ((brw_cs_prog_data *)stage_prog_data)->thread_local_id_index = push_constant_loc[i];<br></div><div> continue;<br>}<br></div><br></div><div class="gmail_quote">at the top of the loop then may be able to avoid having a "param" entry for the local id. This would mean we could get rid of the extra code where we set up param and nr_param. Just a thought.<br></div><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
}<br>
}<br>
ralloc_free(param);<br>
+<br>
+ if (stage == MESA_SHADER_COMPUTE)<br>
+ ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index =<br>
+ new_thread_local_id_index;<br>
}<br>
<br>
/**<br>
<span class="HOEnZb"><font color="#888888">--<br>
2.8.1<br>
<br>
_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div></div>