<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Oct 30, 2017 at 12:33 AM, Iago Toral <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><span class=""><div>On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote:</div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote type="cite"><span>On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:<br>
> Previously, brw_nir_lower_intrinsics added the param and then emitted<br>
> a<br>
> load_uniform intrinsic to load it directly. This commit switches<br>
> things<br>
> over to use a specific NIR intrinsic for the thread id. The one<br>
> thing I<br>
> don't like about this approach is that we have to copy<br>
> thread_local_id<br>
> over to the new visitor in import_uniforms.<br>
<br>
</span>It is not clear to me why you are doing this... why do you like this<br>
better?<br><div class="m_-4817795264743874480HOEnZb"><div class="m_-4817795264743874480h5"></div></div><br></blockquote><div><br></div><div>For compute shaders, the SPIR-V subgroups stuff has a gl_subgroupId system value which subgroup in the dispatch you are. That information is basically the same as the thread_local_id only off by a factor of the SIMD size. It's fairly arbitrary, but I figured we might as well switch over to pushing the value that's defined in SPIR-V.<br></div></div></div></div></blockquote><div><br></div></span><div>Oh, my question was not about pushing the subgroup id instead of the thread local id (that is actually done in a later patch, not here) it is about using a system value and changing the place where we push that last uniform, which is what you change here. The implementation seems exactly equivalent to what we had prior to this patch, so I was wondering if there is any practical advantage in doing it like this.</div></div></blockquote><div><br></div><div>Not really. It just seemed like, if we have a nir_load_* system value intrinsic, we may as well treat it as a system value like everything else. Assuming it doesn't cause too much pain, I think I'd be ok with dropping this if you really want.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><span class="HOEnZb"><font color="#888888"><div>Iago</div></font></span><div><div class="h5"><div><br></div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote type="cite"><div class="m_-4817795264743874480HOEnZb"><div class="m_-4817795264743874480h5">
> ---<br>
> src/compiler/nir/nir_intrinsi<wbr>cs.h | 3 ++<br>
> src/intel/compiler/brw_fs.cpp<wbr> | 4 +-<br>
> src/intel/compiler/brw_fs.h <wbr> | 1 +<br>
> src/intel/compiler/brw_fs_nir<wbr>.cpp | 14 +++++++<br>
> src/intel/compiler/brw_nir.h <wbr> | 3 +-<br>
> src/intel/compiler/brw_nir_lo<wbr>wer_cs_intrinsics.c | 53 +++++---------<br>
> ----------<br>
> 6 files changed, 32 insertions(+), 46 deletions(-)<br>
><br>
> diff --git a/src/compiler/nir/nir_intrins<wbr>ics.h<br>
> b/src/compiler/nir/nir_intrins<wbr>ics.h<br>
> index cefd18b..47022dd 100644<br>
> --- a/src/compiler/nir/nir_intrins<wbr>ics.h<br>
> +++ b/src/compiler/nir/nir_intrins<wbr>ics.h<br>
> @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color<wbr>_a_float, 1, 0, xx,<br>
> xx, xx)<br>
> SYSTEM_VALUE(blend_const_colo<wbr>r_rgba8888_unorm, 1, 0, xx, xx, xx)<br>
> SYSTEM_VALUE(blend_const_colo<wbr>r_aaaa8888_unorm, 1, 0, xx, xx, xx)<br>
> <br>
> +/* Intel specific system values */<br>
> +SYSTEM_VALUE(intel_thread_loc<wbr>al_id, 1, 0, xx, xx, xx)<br>
> +<br>
> /**<br>
> * Barycentric coordinate intrinsics.<br>
> *<br>
> diff --git a/src/intel/compiler/brw_fs.cp<wbr>p<br>
> b/src/intel/compiler/brw_fs.cp<wbr>p<br>
> index 2acd838..c0d4c05 100644<br>
> --- a/src/intel/compiler/brw_fs.cp<wbr>p<br>
> +++ b/src/intel/compiler/brw_fs.cp<wbr>p<br>
> @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs<wbr>_visitor *v)<br>
> this->push_constant_loc = v->push_constant_loc;<br>
> this->pull_constant_loc = v->pull_constant_loc;<br>
> this->uniforms = v->uniforms;<br>
> + this->thread_local_id = v->thread_local_id;<br>
> }<br>
> <br>
> void<br>
> @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler<br>
> *compiler, void *log_data,<br>
> {<br>
> nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);<br>
> shader = brw_nir_apply_sampler_key(shad<wbr>er, compiler, &key->tex,<br>
> true);<br>
> -<br>
> - brw_nir_lower_cs_intrinsic<wbr>s(shader, prog_data);<br>
> + brw_nir_lower_cs_intrinsic<wbr>s(shader);<br>
> shader = brw_postprocess_nir(shader, compiler, true);<br>
> <br>
> prog_data->local_size[0] = shader->info.cs.local_size[0];<br>
> diff --git a/src/intel/compiler/brw_fs.h<br>
> b/src/intel/compiler/brw_fs.h<br>
> index da32593..f51a4d8 100644<br>
> --- a/src/intel/compiler/brw_fs.h<br>
> +++ b/src/intel/compiler/brw_fs.h<br>
> @@ -315,6 +315,7 @@ public:<br>
> */<br>
> int *push_constant_loc;<br>
> <br>
> + fs_reg thread_local_id;<br>
> fs_reg frag_depth;<br>
> fs_reg frag_stencil;<br>
> fs_reg sample_mask;<br>
> diff --git a/src/intel/compiler/brw_fs_ni<wbr>r.cpp<br>
> b/src/intel/compiler/brw_fs_ni<wbr>r.cpp<br>
> index 05efee3..fdc6fc6 100644<br>
> --- a/src/intel/compiler/brw_fs_ni<wbr>r.cpp<br>
> +++ b/src/intel/compiler/brw_fs_ni<wbr>r.cpp<br>
> @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms<wbr>()<br>
> }<br>
> <br>
> uniforms = nir->num_uniforms / 4;<br>
> +<br>
> + if (stage == MESA_SHADER_COMPUTE) {<br>
> + /* Add a uniform for the thread local id. It must be the last<br>
> uniform<br>
> + * on the list.<br>
> + */<br>
> + assert(uniforms == prog_data->nr_params);<br>
> + uint32_t *param = brw_stage_prog_data_add_params<wbr>(prog_data,<br>
> 1);<br>
> + *param = BRW_PARAM_BUILTIN_THREAD_LOCAL<wbr>_ID;<br>
> + thread_local_id = fs_reg(UNIFORM, uniforms++,<br>
> BRW_REGISTER_TYPE_UD);<br>
> + }<br>
> }<br>
> <br>
> static bool<br>
> @@ -3409,6 +3419,10 @@ fs_visitor::nir_emit_cs_intrin<wbr>sic(const<br>
> fs_builder &bld,<br>
> cs_prog_data->uses_barr<wbr>ier = true;<br>
> break;<br>
> <br>
> + case nir_intrinsic_load_intel_threa<wbr>d_local_id:<br>
> + bld.MOV(retype(dest, BRW_REGISTER_TYPE_UD), thread_local_id);<br>
> + break;<br>
> +<br>
> case nir_intrinsic_load_local_invoc<wbr>ation_id:<br>
> case nir_intrinsic_load_work_group_<wbr>id: {<br>
> gl_system_value sv = nir_system_value_from_intrinsi<wbr>c(instr-<br>
> >intrinsic);<br>
> diff --git a/src/intel/compiler/brw_nir.h<br>
> b/src/intel/compiler/brw_nir.h<br>
> index 1493b74..3e40712 100644<br>
> --- a/src/intel/compiler/brw_nir.h<br>
> +++ b/src/intel/compiler/brw_nir.h<br>
> @@ -95,8 +95,7 @@ void brw_nir_analyze_boolean_resolv<wbr>es(nir_shader<br>
> *nir);<br>
> nir_shader *brw_preprocess_nir(const struct brw_compiler *compiler,<br>
> <wbr> nir_shader *nir);<br>
> <br>
> -bool brw_nir_lower_cs_intrinsics(ni<wbr>r_shader *nir,<br>
> - <wbr> struct brw_cs_prog_data<br>
> *prog_data);<br>
> +bool brw_nir_lower_cs_intrinsics(ni<wbr>r_shader *nir);<br>
> void brw_nir_lower_vs_inputs(nir_sh<wbr>ader *nir,<br>
> <wbr>bool use_legacy_snorm_formula,<br>
> <wbr>const uint8_t *vs_attrib_wa_flags);<br>
> diff --git a/src/intel/compiler/brw_nir_l<wbr>ower_cs_intrinsics.c<br>
> b/src/intel/compiler/brw_nir_l<wbr>ower_cs_intrinsics.c<br>
> index d277276..07d2dcc 100644<br>
> --- a/src/intel/compiler/brw_nir_l<wbr>ower_cs_intrinsics.c<br>
> +++ b/src/intel/compiler/brw_nir_l<wbr>ower_cs_intrinsics.c<br>
> @@ -26,47 +26,12 @@<br>
> <br>
> struct lower_intrinsics_state {<br>
> nir_shader *nir;<br>
> - struct brw_cs_prog_data *prog_data;<br>
> nir_function_impl *impl;<br>
> bool progress;<br>
> nir_builder builder;<br>
> - int thread_local_id_index;<br>
> + unsigned local_workgroup_size;<br>
> };<br>
> <br>
> -static nir_ssa_def *<br>
> -read_thread_local_id(struct lower_intrinsics_state *state)<br>
> -{<br>
> - struct brw_cs_prog_data *prog_data = state->prog_data;<br>
> - nir_builder *b = &state->builder;<br>
> - nir_shader *nir = state->nir;<br>
> - const unsigned *sizes = nir->info.cs.local_size;<br>
> - const unsigned group_size = sizes[0] * sizes[1] * sizes[2];<br>
> -<br>
> - /* Some programs have local_size dimensions so small that the<br>
> thread local<br>
> - * ID will always be 0.<br>
> - */<br>
> - if (group_size <= 8)<br>
> - return nir_imm_int(b, 0);<br>
> -<br>
> - if (state->thread_local_id_index == -1) {<br>
> - state->thread_local_id_<wbr>index = prog_data->base.nr_params;<br>
> - uint32_t *param = brw_stage_prog_data_add_params<wbr>(&prog_data-<br>
> >base, 1);<br>
> - *param = BRW_PARAM_BUILTIN_THREAD_LOCAL<wbr>_ID;<br>
> - nir->num_uniforms += 4;<br>
> - }<br>
> - unsigned id_index = state->thread_local_id_index;<br>
> -<br>
> - nir_intrinsic_instr *load =<br>
> - nir_intrinsic_instr_cre<wbr>ate(nir, nir_intrinsic_load_uniform);<br>
> - load->num_components = 1;<br>
> - load->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));<br>
> - nir_ssa_dest_init(&load->i<wbr>nstr, &load->dest, 1, 32, NULL);<br>
> - nir_intrinsic_set_base(loa<wbr>d, id_index * sizeof(uint32_t));<br>
> - nir_intrinsic_set_range(lo<wbr>ad, sizeof(uint32_t));<br>
> - nir_builder_instr_insert(b<wbr>, &load->instr);<br>
> - return &load->dest.ssa;<br>
> -}<br>
> -<br>
> static bool<br>
> lower_cs_intrinsics_convert_b<wbr>lock(struct lower_intrinsics_state<br>
> *state,<br>
> <wbr> nir_block *block)<br>
> @@ -91,7 +56,12 @@ lower_cs_intrinsics_convert_bl<wbr>ock(struct<br>
> lower_intrinsics_state *state,<br>
> * gl_LocalInvoca<wbr>tionIndex =<br>
> * cs_thread_l<wbr>ocal_id + subgroup_invocation;<br>
> */<br>
> - nir_ssa_def *thread_local_id = read_thread_local_id(state);<br>
> + nir_ssa_def *thread_local_id;<br>
> + if (state->local_workgroup_size <= 8)<br>
> + thread_local_id = nir_imm_int(b, 0);<br>
> + else<br>
> + thread_local_id = nir_load_intel_thread_local_id<wbr>(b);<br>
> +<br>
> nir_ssa_def *channel = nir_load_subgroup_invocation(b<wbr>);<br>
> sysval = nir_iadd(b, channel, thread_local_id);<br>
> break;<br>
> @@ -157,8 +127,7 @@ lower_cs_intrinsics_convert_im<wbr>pl(struct<br>
> lower_intrinsics_state *state)<br>
> }<br>
> <br>
> bool<br>
> -brw_nir_lower_cs_intrinsics(n<wbr>ir_shader *nir,<br>
> - s<wbr>truct brw_cs_prog_data *prog_data)<br>
> +brw_nir_lower_cs_intrinsics(n<wbr>ir_shader *nir)<br>
> {<br>
> assert(nir->info.stage == MESA_SHADER_COMPUTE);<br>
> <br>
> @@ -166,9 +135,9 @@ brw_nir_lower_cs_intrinsics(ni<wbr>r_shader *nir,<br>
> struct lower_intrinsics_state state;<br>
> memset(&state, 0, sizeof(state));<br>
> state.nir = nir;<br>
> - state.prog_data = prog_data;<br>
> -<br>
> - state.thread_local_id_inde<wbr>x = -1;<br>
> + state.local_workgroup_size = nir->info.cs.local_size[0] *<br>
> + <wbr> nir->info.cs.local_size[1] *<br>
> + <wbr> nir->info.cs.local_size[2];<br>
> <br>
> do {<br>
> state.progress = false;<br>
</div></div><br></blockquote></div><br></div></div>
</blockquote></div></div></div></blockquote></div><br></div></div>