[Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled
Iago Toral
itoral at igalia.com
Tue Oct 31 07:03:27 UTC 2017
On Mon, 2017-10-30 at 11:34 -0700, Jason Ekstrand wrote:
> On Mon, Oct 30, 2017 at 12:33 AM, Iago Toral <itoral at igalia.com>
> wrote:
> > On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote:
> > > On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral <itoral at igalia.com>
> > > wrote:
> > > > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> > > >
> > > > > Previously, brw_nir_lower_intrinsics added the param and then
> > > > emitted
> > > >
> > > > > a
> > > >
> > > > > load_uniform intrinsic to load it directly. This commit
> > > > switches
> > > >
> > > > > things
> > > >
> > > > > over to use a specific NIR intrinsic for the thread id. The
> > > > one
> > > >
> > > > > thing I
> > > >
> > > > > don't like about this approach is that we have to copy
> > > >
> > > > > thread_local_id
> > > >
> > > > > over to the new visitor in import_uniforms.
> > > >
> > > >
> > > >
> > > > It is not clear to me why you are doing this... why do you like
> > > > this
> > > >
> > > > better?
> > > >
> > >
> > > For compute shaders, the SPIR-V subgroups stuff has a
> > > gl_subgroupId system value which subgroup in the dispatch you
> > > are. That information is basically the same as the
> > > thread_local_id only off by a factor of the SIMD size. It's
> > > fairly arbitrary, but I figured we might as well switch over to
> > > pushing the value that's defined in SPIR-V.
> >
> > Oh, my question was not about pushing the subgroup id instead of
> > the thread local id (that is actually done in a later patch, not
> > here) it is about using a system value and changing the place where
> > we push that last uniform, which is what you change here. The
> > implementation seems exactly equivalent to what we had prior to
> > this patch, so I was wondering if there is any practical advantage
> > in doing it like this.
>
> Not really. It just seemed like, if we have a nir_load_* system
> value intrinsic, we may as well treat it as a system value like
> everything else. Assuming it doesn't cause too much pain, I think
> I'd be ok with dropping this if you really want.
Not at all, I was just curious if there was another reason for this
that I was missing. I am fine with keeping this.
Iago
> > Iago
> > > > > ---
> > > >
> > > > > src/compiler/nir/nir_intrinsics.h | 3 ++
> > > >
> > > > > src/intel/compiler/brw_fs.cpp | 4 +-
> > > >
> > > > > src/intel/compiler/brw_fs.h | 1 +
> > > >
> > > > > src/intel/compiler/brw_fs_nir.cpp | 14
> > > > +++++++
> > > >
> > > > > src/intel/compiler/brw_nir.h | 3 +-
> > > >
> > > > > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +++++-
> > > > --------
> > > >
> > > > > ----------
> > > >
> > > > > 6 files changed, 32 insertions(+), 46 deletions(-)
> > > >
> > > > >
> > > >
> > > > > diff --git a/src/compiler/nir/nir_intrinsics.h
> > > >
> > > > > b/src/compiler/nir/nir_intrinsics.h
> > > >
> > > > > index cefd18b..47022dd 100644
> > > >
> > > > > --- a/src/compiler/nir/nir_intrinsics.h
> > > >
> > > > > +++ b/src/compiler/nir/nir_intrinsics.h
> > > >
> > > > > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float,
> > > > 1, 0, xx,
> > > >
> > > > > xx, xx)
> > > >
> > > > > SYSTEM_VALUE(blend_const_color_rgba8888_unorm, 1, 0, xx, xx,
> > > > xx)
> > > >
> > > > > SYSTEM_VALUE(blend_const_color_aaaa8888_unorm, 1, 0, xx, xx,
> > > > xx)
> > > >
> > > > >
> > > >
> > > > > +/* Intel specific system values */
> > > >
> > > > > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> > > >
> > > > > +
> > > >
> > > > > /**
> > > >
> > > > > * Barycentric coordinate intrinsics.
> > > >
> > > > > *
> > > >
> > > > > diff --git a/src/intel/compiler/brw_fs.cpp
> > > >
> > > > > b/src/intel/compiler/brw_fs.cpp
> > > >
> > > > > index 2acd838..c0d4c05 100644
> > > >
> > > > > --- a/src/intel/compiler/brw_fs.cpp
> > > >
> > > > > +++ b/src/intel/compiler/brw_fs.cpp
> > > >
> > > > > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor
> > > > *v)
> > > >
> > > > > this->push_constant_loc = v->push_constant_loc;
> > > >
> > > > > this->pull_constant_loc = v->pull_constant_loc;
> > > >
> > > > > this->uniforms = v->uniforms;
> > > >
> > > > > + this->thread_local_id = v->thread_local_id;
> > > >
> > > > > }
> > > >
> > > > >
> > > >
> > > > > void
> > > >
> > > > > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct
> > > > brw_compiler
> > > >
> > > > > *compiler, void *log_data,
> > > >
> > > > > {
> > > >
> > > > > nir_shader *shader = nir_shader_clone(mem_ctx,
> > > > src_shader);
> > > >
> > > > > shader = brw_nir_apply_sampler_key(shader, compiler,
> > > > &key->tex,
> > > >
> > > > > true);
> > > >
> > > > > -
> > > >
> > > > > - brw_nir_lower_cs_intrinsics(shader, prog_data);
> > > >
> > > > > + brw_nir_lower_cs_intrinsics(shader);
> > > >
> > > > > shader = brw_postprocess_nir(shader, compiler, true);
> > > >
> > > > >
> > > >
> > > > > prog_data->local_size[0] = shader->info.cs.local_size[0];
> > > >
> > > > > diff --git a/src/intel/compiler/brw_fs.h
> > > >
> > > > > b/src/intel/compiler/brw_fs.h
> > > >
> > > > > index da32593..f51a4d8 100644
> > > >
> > > > > --- a/src/intel/compiler/brw_fs.h
> > > >
> > > > > +++ b/src/intel/compiler/brw_fs.h
> > > >
> > > > > @@ -315,6 +315,7 @@ public:
> > > >
> > > > > */
> > > >
> > > > > int *push_constant_loc;
> > > >
> > > > >
> > > >
> > > > > + fs_reg thread_local_id;
> > > >
> > > > > fs_reg frag_depth;
> > > >
> > > > > fs_reg frag_stencil;
> > > >
> > > > > fs_reg sample_mask;
> > > >
> > > > > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > > >
> > > > > b/src/intel/compiler/brw_fs_nir.cpp
> > > >
> > > > > index 05efee3..fdc6fc6 100644
> > > >
> > > > > --- a/src/intel/compiler/brw_fs_nir.cpp
> > > >
> > > > > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > > >
> > > > > @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms()
> > > >
> > > > > }
> > > >
> > > > >
> > > >
> > > > > uniforms = nir->num_uniforms / 4;
> > > >
> > > > > +
> > > >
> > > > > + if (stage == MESA_SHADER_COMPUTE) {
> > > >
> > > > > + /* Add a uniform for the thread local id. It must be
> > > > the last
> > > >
> > > > > uniform
> > > >
> > > > > + * on the list.
> > > >
> > > > > + */
> > > >
> > > > > + assert(uniforms == prog_data->nr_params);
> > > >
> > > > > + uint32_t *param =
> > > > brw_stage_prog_data_add_params(prog_data,
> > > >
> > > > > 1);
> > > >
> > > > > + *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID;
> > > >
> > > > > + thread_local_id = fs_reg(UNIFORM, uniforms++,
> > > >
> > > > > BRW_REGISTER_TYPE_UD);
> > > >
> > > > > + }
> > > >
> > > > > }
> > > >
> > > > >
> > > >
> > > > > static bool
> > > >
> > > > > @@ -3409,6 +3419,10 @@
> > > > fs_visitor::nir_emit_cs_intrinsic(const
> > > >
> > > > > fs_builder &bld,
> > > >
> > > > > cs_prog_data->uses_barrier = true;
> > > >
> > > > > break;
> > > >
> > > > >
> > > >
> > > > > + case nir_intrinsic_load_intel_thread_local_id:
> > > >
> > > > > + bld.MOV(retype(dest, BRW_REGISTER_TYPE_UD),
> > > > thread_local_id);
> > > >
> > > > > + break;
> > > >
> > > > > +
> > > >
> > > > > case nir_intrinsic_load_local_invocation_id:
> > > >
> > > > > case nir_intrinsic_load_work_group_id: {
> > > >
> > > > > gl_system_value sv =
> > > > nir_system_value_from_intrinsic(instr-
> > > >
> > > > > >intrinsic);
> > > >
> > > > > diff --git a/src/intel/compiler/brw_nir.h
> > > >
> > > > > b/src/intel/compiler/brw_nir.h
> > > >
> > > > > index 1493b74..3e40712 100644
> > > >
> > > > > --- a/src/intel/compiler/brw_nir.h
> > > >
> > > > > +++ b/src/intel/compiler/brw_nir.h
> > > >
> > > > > @@ -95,8 +95,7 @@ void
> > > > brw_nir_analyze_boolean_resolves(nir_shader
> > > >
> > > > > *nir);
> > > >
> > > > > nir_shader *brw_preprocess_nir(const struct brw_compiler
> > > > *compiler,
> > > >
> > > > > nir_shader *nir);
> > > >
> > > > >
> > > >
> > > > > -bool brw_nir_lower_cs_intrinsics(nir_shader *nir,
> > > >
> > > > > - struct brw_cs_prog_data
> > > >
> > > > > *prog_data);
> > > >
> > > > > +bool brw_nir_lower_cs_intrinsics(nir_shader *nir);
> > > >
> > > > > void brw_nir_lower_vs_inputs(nir_shader *nir,
> > > >
> > > > > bool use_legacy_snorm_formula,
> > > >
> > > > > const uint8_t
> > > > *vs_attrib_wa_flags);
> > > >
> > > > > diff --git a/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> > > >
> > > > > b/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> > > >
> > > > > index d277276..07d2dcc 100644
> > > >
> > > > > --- a/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> > > >
> > > > > +++ b/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> > > >
> > > > > @@ -26,47 +26,12 @@
> > > >
> > > > >
> > > >
> > > > > struct lower_intrinsics_state {
> > > >
> > > > > nir_shader *nir;
> > > >
> > > > > - struct brw_cs_prog_data *prog_data;
> > > >
> > > > > nir_function_impl *impl;
> > > >
> > > > > bool progress;
> > > >
> > > > > nir_builder builder;
> > > >
> > > > > - int thread_local_id_index;
> > > >
> > > > > + unsigned local_workgroup_size;
> > > >
> > > > > };
> > > >
> > > > >
> > > >
> > > > > -static nir_ssa_def *
> > > >
> > > > > -read_thread_local_id(struct lower_intrinsics_state *state)
> > > >
> > > > > -{
> > > >
> > > > > - struct brw_cs_prog_data *prog_data = state->prog_data;
> > > >
> > > > > - nir_builder *b = &state->builder;
> > > >
> > > > > - nir_shader *nir = state->nir;
> > > >
> > > > > - const unsigned *sizes = nir->info.cs.local_size;
> > > >
> > > > > - const unsigned group_size = sizes[0] * sizes[1] *
> > > > sizes[2];
> > > >
> > > > > -
> > > >
> > > > > - /* Some programs have local_size dimensions so small that
> > > > the
> > > >
> > > > > thread local
> > > >
> > > > > - * ID will always be 0.
> > > >
> > > > > - */
> > > >
> > > > > - if (group_size <= 8)
> > > >
> > > > > - return nir_imm_int(b, 0);
> > > >
> > > > > -
> > > >
> > > > > - if (state->thread_local_id_index == -1) {
> > > >
> > > > > - state->thread_local_id_index = prog_data-
> > > > >base.nr_params;
> > > >
> > > > > - uint32_t *param =
> > > > brw_stage_prog_data_add_params(&prog_data-
> > > >
> > > > > >base, 1);
> > > >
> > > > > - *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID;
> > > >
> > > > > - nir->num_uniforms += 4;
> > > >
> > > > > - }
> > > >
> > > > > - unsigned id_index = state->thread_local_id_index;
> > > >
> > > > > -
> > > >
> > > > > - nir_intrinsic_instr *load =
> > > >
> > > > > - nir_intrinsic_instr_create(nir,
> > > > nir_intrinsic_load_uniform);
> > > >
> > > > > - load->num_components = 1;
> > > >
> > > > > - load->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
> > > >
> > > > > - nir_ssa_dest_init(&load->instr, &load->dest, 1, 32,
> > > > NULL);
> > > >
> > > > > - nir_intrinsic_set_base(load, id_index *
> > > > sizeof(uint32_t));
> > > >
> > > > > - nir_intrinsic_set_range(load, sizeof(uint32_t));
> > > >
> > > > > - nir_builder_instr_insert(b, &load->instr);
> > > >
> > > > > - return &load->dest.ssa;
> > > >
> > > > > -}
> > > >
> > > > > -
> > > >
> > > > > static bool
> > > >
> > > > > lower_cs_intrinsics_convert_block(struct
> > > > lower_intrinsics_state
> > > >
> > > > > *state,
> > > >
> > > > > nir_block *block)
> > > >
> > > > > @@ -91,7 +56,12 @@ lower_cs_intrinsics_convert_block(struct
> > > >
> > > > > lower_intrinsics_state *state,
> > > >
> > > > > * gl_LocalInvocationIndex =
> > > >
> > > > > * cs_thread_local_id + subgroup_invocation;
> > > >
> > > > > */
> > > >
> > > > > - nir_ssa_def *thread_local_id =
> > > > read_thread_local_id(state);
> > > >
> > > > > + nir_ssa_def *thread_local_id;
> > > >
> > > > > + if (state->local_workgroup_size <= 8)
> > > >
> > > > > + thread_local_id = nir_imm_int(b, 0);
> > > >
> > > > > + else
> > > >
> > > > > + thread_local_id =
> > > > nir_load_intel_thread_local_id(b);
> > > >
> > > > > +
> > > >
> > > > > nir_ssa_def *channel =
> > > > nir_load_subgroup_invocation(b);
> > > >
> > > > > sysval = nir_iadd(b, channel, thread_local_id);
> > > >
> > > > > break;
> > > >
> > > > > @@ -157,8 +127,7 @@ lower_cs_intrinsics_convert_impl(struct
> > > >
> > > > > lower_intrinsics_state *state)
> > > >
> > > > > }
> > > >
> > > > >
> > > >
> > > > > bool
> > > >
> > > > > -brw_nir_lower_cs_intrinsics(nir_shader *nir,
> > > >
> > > > > - struct brw_cs_prog_data
> > > > *prog_data)
> > > >
> > > > > +brw_nir_lower_cs_intrinsics(nir_shader *nir)
> > > >
> > > > > {
> > > >
> > > > > assert(nir->info.stage == MESA_SHADER_COMPUTE);
> > > >
> > > > >
> > > >
> > > > > @@ -166,9 +135,9 @@ brw_nir_lower_cs_intrinsics(nir_shader
> > > > *nir,
> > > >
> > > > > struct lower_intrinsics_state state;
> > > >
> > > > > memset(&state, 0, sizeof(state));
> > > >
> > > > > state.nir = nir;
> > > >
> > > > > - state.prog_data = prog_data;
> > > >
> > > > > -
> > > >
> > > > > - state.thread_local_id_index = -1;
> > > >
> > > > > + state.local_workgroup_size = nir->info.cs.local_size[0] *
> > > >
> > > > > + nir->info.cs.local_size[1] *
> > > >
> > > > > + nir->info.cs.local_size[2];
> > > >
> > > > >
> > > >
> > > > > do {
> > > >
> > > > > state.progress = false;
> > > >
> > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171031/1ba197d1/attachment-0001.html>
More information about the mesa-dev
mailing list