[Mesa-dev] [PATCH v4 05/12] i965: Put CS local thread ID uniform in last push register
Jason Ekstrand
jason at jlekstrand.net
Wed Jun 1 22:30:26 UTC 2016
On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen <jordan.l.justen at intel.com>
wrote:
> This thread ID uniform will be used to compute the
> gl_LocalInvocationIndex and gl_LocalInvocationID values.
>
> It is important for this uniform to be added in the last push constant
> register. fs_visitor::assign_constant_locations is updated to make
> sure this happens.
>
> The reason this is important is that the cross-thread push constant
> registers are loaded first, and the per-thread push constant registers
> are loaded after that. (Broadwell adds another push constant upload
> mechanism which reverses this order, but we are ignoring this for
> now.)
>
> v2:
> * Add variable in intrinsics lowering pass
> * Make sure the ID is pushed last in assign_constant_locations, and
> that we save a spot for the ID in the push constants
>
> v3:
> * Simplify code based with Jason's suggestions.
>
> Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> ---
> src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index e8a3aab..bb1bf7a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2097,6 +2097,10 @@ fs_visitor::assign_constant_locations()
> bool contiguous[uniforms];
> memset(contiguous, 0, sizeof(contiguous));
>
> + int thread_local_id_index =
> + (stage == MESA_SHADER_COMPUTE) ?
> + ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index : -1;
> +
> /* First, we walk through the instructions and do two things:
> *
> * 1) Figure out which uniforms are live.
> @@ -2141,6 +2145,9 @@ fs_visitor::assign_constant_locations()
> }
> }
>
> + if (thread_local_id_index >= 0 && !is_live[thread_local_id_index])
> + thread_local_id_index = -1;
> +
> /* Only allow 16 registers (128 uniform components) as push constants.
> *
> * Just demote the end of the list. We could probably do better
> @@ -2149,7 +2156,9 @@ fs_visitor::assign_constant_locations()
> * If changing this value, note the limitation about total_regs in
> * brw_curbe.c.
> */
> - const unsigned int max_push_components = 16 * 8;
> + unsigned int max_push_components = 16 * 8;
> + if (thread_local_id_index >= 0)
> + max_push_components--; /* Save a slot for the thread ID */
>
> /* We push small arrays, but no bigger than 16 floats. This is big
> enough
> * for a vec4 but hopefully not large enough to push out other stuff.
> We
> @@ -2187,6 +2196,10 @@ fs_visitor::assign_constant_locations()
> if (!is_live[u] || is_live_64bit[u])
> continue;
>
> + /* Skip thread_local_id_index to put it in the last push register.
> */
> + if (thread_local_id_index == (int)u)
> + continue;
> +
> set_push_pull_constant_loc(u, &chunk_start, contiguous[u],
> push_constant_loc, pull_constant_loc,
> &num_push_constants, &num_pull_constants,
> @@ -2194,6 +2207,10 @@ fs_visitor::assign_constant_locations()
> stage_prog_data);
> }
>
> + /* Add the CS local thread ID uniform at the end of the push constants
> */
> + if (thread_local_id_index >= 0)
> + push_constant_loc[thread_local_id_index] = num_push_constants++;
> +
> /* As the uniforms are going to be reordered, take the data from a
> temporary
> * copy of the original param[].
> */
> @@ -2212,6 +2229,7 @@ fs_visitor::assign_constant_locations()
> * push_constant_loc[i] <= i and we can do it in one smooth loop
> without
> * having to make a copy.
> */
> + int new_thread_local_id_index = -1;
> for (unsigned int i = 0; i < uniforms; i++) {
> const gl_constant_value *value = param[i];
>
> @@ -2219,9 +2237,15 @@ fs_visitor::assign_constant_locations()
> stage_prog_data->pull_param[pull_constant_loc[i]] = value;
> } else if (push_constant_loc[i] != -1) {
> stage_prog_data->param[push_constant_loc[i]] = value;
> + if (thread_local_id_index == (int)i)
> + new_thread_local_id_index = push_constant_loc[i];
>
First off, I think the following is better done as a fix-up patch if we do
it at all :-)
If we make this
if ((int)i == thread_local_id_index) {
assert(stage == MESA_SHADER_COMPUTE)
((brw_cs_prog_data *)stage_prog_data)->thread_local_id_index =
push_constant_loc[i];
continue;
}
at the top of the loop then may be able to avoid having a "param" entry for
the local id. This would mean we could get rid of the extra code where we
set up param and nr_param. Just a thought.
> }
> }
> ralloc_free(param);
> +
> + if (stage == MESA_SHADER_COMPUTE)
> + ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index =
> + new_thread_local_id_index;
> }
>
> /**
> --
> 2.8.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160601/ab3e0c05/attachment.html>
More information about the mesa-dev
mailing list