<p dir="ltr"><br>
On Dec 3, 2015 3:01 PM, "Eric Anholt" <eric@anholt.net> wrote:
>
> Jason Ekstrand <jason@jlekstrand.net> writes:
>
> > There is some special-casing needed in a competent back-end. However, they
> > can do their special-casing easily enough based on whether or not the
> > offset is a constant. In the meantime, having the *_indirect variants
> > adds special cases in a number of places where they don't need to be and, in
> > general, only complicates things. To complicate matters, NIR had no way to
> > convert an indirect load/store to a direct one in the case that the
> > indirect was a constant, so we would still not really get what the back-ends
> > wanted. The best solution seems to be to get rid of the *_indirect
> > variants entirely.
>
> I've been putting off debugging this series because NIR intrinsic
> documentation has been bad at const_index[] versus src[] explanations
> and I only ever get my code working through cargo culting. It looks
> like you simplify things a lot, but I'm going to ask for some
> clarification still.
>
> > ---
> >  src/glsl/nir/nir_intrinsics.h           | 64 ++++++++++++++++-----------------
> >  src/glsl/nir/nir_lower_phis_to_scalar.c |  4 ---
> >  src/glsl/nir/nir_print.c                | 19 +++++-----
> >  3 files changed, 38 insertions(+), 49 deletions(-)
> >
> > diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> > index b2565c5..0fa5a27 100644
> > --- a/src/glsl/nir/nir_intrinsics.h
> > +++ b/src/glsl/nir/nir_intrinsics.h
> > @@ -228,54 +228,50 @@ SYSTEM_VALUE(num_work_groups, 3, 0)
> >  SYSTEM_VALUE(helper_invocation, 1, 0)
> >
> >  /*
> > - * The format of the indices depends on the type of the load. For uniforms,
> > - * the first index is the base address and the second index is an offset that
> > - * should be added to the base address. (This way you can determine in the
> > - * back-end which variable is being accessed even in an array.) For inputs,
> > - * the one and only index corresponds to the attribute slot. UBO loads also
> > - * have a single index which is the base address to load from.
> > + * All load operations have a source specifying an offset which may or may
> > + * not be constant. If the shader is still in SSA or partial SSA form, then
> > + * determining whether or not the offset is constant is trivial. This is
> > + * always the last source in the intrinsic.
> >   *
> > - * UBO loads have a (possibly constant) source which is the UBO buffer index.
> > - * For each type of load, the _indirect variant has one additional source
> > - * (the second in the case of UBO's) that is the is an indirect to be added to
> > - * the constant address or base offset to compute the final offset.
> > + * Uniforms have a constant index that provides a secondary base offset that
> > + * should be added to the offset from the source. This allows back-ends to
> > + * determine which uniform variable is being accessed.
>
> For clarity, since we have things that are indices/offsets and might be
> constants but appear in src[]: s/constant index/const_index/.
<p dir="ltr">Will do.</p>
<p dir="ltr">> > - * For vector backends, the address is in terms of one vec4, and so each array<br>
> > - * element is +4 scalar components from the previous array element. For scalar<br>
> > - * backends, the address is in terms of a single 4-byte float/int and arrays<br>
> > - * elements begin immediately after the previous array element.<br>
> > + * UBO and SSBO loads have a (possibly constant) source which is the UBO<br>
> > + * buffer index. The pervertex_input intrinsic has a source which specifies<br>
> > + * the (possibly constant) vertex id to load from.<br>
> > + *<br>
> > + * The exact address type depends on the lowering pass that generates the<br>
> > + * load/store intrinsics. Typically, this is vec4 units for things such as<br>
> > + * varying slots and float units for fragment shader inputs. UBO and SSBO<br>
> > + * offsets are always in bytes.<br>
> > */<br>
> >
> >  #define LOAD(name, extra_srcs, indices, flags) \
> > -   INTRINSIC(load_##name, extra_srcs, ARR(1), true, 0, 0, indices, flags) \
> > -   INTRINSIC(load_##name##_indirect, extra_srcs + 1, ARR(1, 1), \
> > -             true, 0, 0, indices, flags)
> > +   INTRINSIC(load_##name, extra_srcs + 1, ARR(1, 1), true, 0, 0, indices, flags)
> >
> > -LOAD(uniform, 0, 2, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
>
> Uniforms had two const_index entries before? I've only ever used one.
<p dir="ltr">Yeah, they did for a while. It never affected any tgsi users though so I didn't bother updating those drivers.</p>
<p dir="ltr">> > -LOAD(ubo, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > -LOAD(input, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > -LOAD(per_vertex_input, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > -LOAD(ssbo, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)<br>
> > -LOAD(output, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE)<br>
> > -LOAD(per_vertex_output, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE)<br>
> > +LOAD(uniform, 0, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > +LOAD(ubo, 1, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > +LOAD(input, 0, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > +LOAD(per_vertex_input, 1, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)<br>
> > +LOAD(ssbo, 1, 0, NIR_INTRINSIC_CAN_ELIMINATE)<br>
> > +LOAD(output, 0, 0, NIR_INTRINSIC_CAN_ELIMINATE)<br>
> > +LOAD(per_vertex_output, 1, 0, NIR_INTRINSIC_CAN_ELIMINATE)<br>
><br>
> I think now that we don't have multiple variants being generated by<br>
> LOAD(), baking the extra_srcs + 1 into the LOAD() calls would clarify<br>
> things: uniforms have 1 src and 1 index, ssbos have 2 srcs and 0<br>
> indices, and it's right there in the call.</p>
<p dir="ltr">Right. Good point.</p>
<p dir="ltr">> > /*<br>
> > - * Stores work the same way as loads, except now the first register input is<br>
> > - * the value or array to store and the optional second input is the indirect<br>
> > - * offset. SSBO stores are similar, but they accept an extra source for the<br>
> > - * block index and an extra index with the writemask to use.<br>
> > + * Stores work the same way as loads, except now the first source is the value<br>
> > + * to store and the second input is the indirect offset. SSBO stores are<br>
> > + * similar, but they accept an extra source for the block index and an extra<br>
> > + * index with the writemask to use.<br>
> > */<br>
><br>
> Apparently the per vertex index or SSBO block is slipped between the<br>
> "first" and "second" inputs? That's weird. Suggestion for clarity<br>
> below.<br>
><br>
> >  #define STORE(name, extra_srcs, extra_srcs_size, extra_indices, flags) \
> > -   INTRINSIC(store_##name, 1 + extra_srcs, \
> > -             ARR(0, extra_srcs_size, extra_srcs_size, extra_srcs_size), \
> > -             false, 0, 0, 1 + extra_indices, flags) \
> > -   INTRINSIC(store_##name##_indirect, 2 + extra_srcs, \
> > +   INTRINSIC(store_##name, 2 + extra_srcs, \
> >               ARR(0, 1, extra_srcs_size, extra_srcs_size), \
> > -             false, 0, 0, 1 + extra_indices, flags)
> > +             false, 0, 0, extra_indices, flags)
> >
> >  STORE(output, 0, 0, 0, 0)
> >  STORE(per_vertex_output, 1, 1, 0, 0)
> >  STORE(ssbo, 1, 1, 1, 0)
>
> I think instead of the long form comment about the sources, I'd rather
> just have:
>
> /* src[] = { value, offset }. No const_index */
> STORE(output, 0, 0, 0, 0)
> /* src[] = { value, vertex_index, offset }. No const_index */
> STORE(per_vertex_output, 1, 1, 0, 0)
> /* src[] = { value, block, offset }. const_index[] = { write_mask } */
> STORE(ssbo, 1, 1, 1, 0)
<p dir="ltr">I like that. We should do the same for outputs.</p>