[Mesa-dev] [PATCH v2 22/25] nir/i965: use two slots from inputs_read for dvec3/dvec4 vertex input attributes

Thu Jan 5 14:56:14 UTC 2017

On Thu, 2017-01-05 at 06:41 -0800, Jason Ekstrand wrote:
> On Jan 5, 2017 3:11 AM, "Juan A. Suarez Romero" <jasuarez at igalia.com>
> wrote:
> On Wed, 2017-01-04 at 07:06 -0800, Jason Ekstrand wrote:
> > On Jan 4, 2017 5:46 AM, "Juan A. Suarez Romero" <jasuarez at igalia.co
> > m> wrote:
> > On Tue, 2017-01-03 at 14:41 -0800, Jason Ekstrand wrote:
> > > I made a few pretty trivial comments.  With those addressed,
> > > 
> > > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > > 
> > > On Dec 16, 2016 8:55 AM, "Juan A. Suarez Romero" <jasuarez at igalia
> > > .com> wrote:
> > > So far, input_reads was a bitmap tracking which vertex input
> > > locations
> > > 
> > > were being used.
> > > 
> > > 
> > > 
> > > In OpenGL, an attribute bigger than a vec4 (like a dvec3 or
> > > dvec4)
> > > 
> > > consumes just one location, any other small attribute. So we mark
> > > the
> > > 
> > > proper bit in inputs_read, and also the same bit in
> > > double_inputs_read
> > > 
> > > if the attribute is a dvec3/dvec4.
> > > 
> > > 
> > > 
> > > But in Vulkan, this is slightly different: a dvec3/dvec4
> > > attribute
> > > 
> > > consumes two locations, not just one. And hence two bits would be
> > > marked
> > > 
> > > in inputs_read for the same vertex input attribute.
> > > 
> > > 
> > > 
> > > To avoid handling two different situations in NIR, we just choose
> > > the
> > > 
> > > latest one: in OpenGL, when creating NIR from GLSL/IR, any
> > > dvec3/dvec4
> > > 
> > > vertex input attribute is marked with two bits in the inputs_read
> > > bitmap
> > > 
> > > (and also in the double_inputs_read), and following attributes
> > > are
> > > 
> > > adjusted accordingly.
> > > 
> > > 
> > > 
> > > As example, if in our GLSL/IR shader we have three attributes:
> > > 
> > > 
> > > 
> > > layout(location = 0) vec3  attr0;
> > > 
> > > layout(location = 1) dvec4 attr1;
> > > 
> > > layout(location = 2) dvec3 attr2;
> > > 
> > > 
> > > 
> > > then in our NIR shader we put attr0 in location 0, attr1 in
> > > locations 1
> > > 
> > > and 2, and attr2 in location 3.
> > > 
> > > 
> > > attr2 goes in locations 3 *and* 4, correct?
> > > 
> > > 
> > > Checking carefully, basically we are using slots rather than
> > > locations
> > > 
> > > in NIR.
> > > 
> > > 
> > > 
> > > When emitting the vertices, we do a inverse map to know the
> > > 
> > > corresponding location for each slot.
> > > 
> > > 
> > > 
> > > v2 (Jason):
> > > 
> > > - use two slots from inputs_read for dvec3/dvec4 NIR from
> > > GLSL/IR.
> > > 
> > > ---
> > > 
> > >  src/compiler/glsl/glsl_to_nir.cpp            | 28 +++++++++++++
> > > 
> > >  src/compiler/nir/nir_gather_info.c           | 48 ++++++++++--
> > > ---------
> > > 
> > >  src/intel/vulkan/genX_pipeline.c             | 63
> > > +++++++++++++++++-----------
> > > 
> > >  src/mesa/drivers/dri/i965/brw_draw_upload.c  | 11 +++--
> > > 
> > >  src/mesa/drivers/dri/i965/brw_fs.cpp         | 13 ------
> > > 
> > >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  3 +-
> > > 
> > >  src/mesa/drivers/dri/i965/brw_nir.c          |  6 +--
> > > 
> > >  src/mesa/drivers/dri/i965/brw_nir.h          |  1 -
> > > 
> > >  src/mesa/drivers/dri/i965/brw_vec4.cpp       | 11 +++--
> > > 
> > >  9 files changed, 106 insertions(+), 78 deletions(-)
> > > 
> > > 
> > > 
> > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> > > b/src/compiler/glsl/glsl_to_nir.cpp
> > > 
> > > index 4debc37..0814dad 100644
> > > 
> > > --- a/src/compiler/glsl/glsl_to_nir.cpp
> > > 
> > > +++ b/src/compiler/glsl/glsl_to_nir.cpp
> > > 
> > > @@ -129,6 +129,19 @@ private:
> > > 
> > > 
> > > 
> > >  } /* end of anonymous namespace */
> > > 
> > > 
> > > 
> > > +static void
> > > 
> > > +nir_remap_attributes(nir_shader *shader)
> > > 
> > > +{
> > > 
> > > +   nir_foreach_variable(var, &shader->inputs) {
> > > 
> > > +      var->data.location += _mesa_bitcount_64(shader->info-
> > > >double_inputs_read &
> > > 
> > > +                                             
> > > BITFIELD64_MASK(var->data.location));
> > > 
> > > +   }
> > > 
> > > +
> > > 
> > > +   /* Once the remap is done, reset double_inputs_read, so later
> > > it will have
> > > 
> > > +    * which location/slots are doubles */
> > > 
> > > +   shader->info->double_inputs_read = 0;
> > > 
> > > +}
> > > 
> > > +
> > > 
> > >  nir_shader *
> > > 
> > >  glsl_to_nir(const struct gl_shader_program *shader_prog,
> > > 
> > >              gl_shader_stage stage,
> > > 
> > > @@ -146,6 +159,13 @@ glsl_to_nir(const struct gl_shader_program
> > > *shader_prog,
> > > 
> > > 
> > > 
> > >     nir_lower_constant_initializers(shader,
> > > (nir_variable_mode)~0);
> > > 
> > > 
> > > 
> > > +   /* Remap the locations to slots so those requiring two slots
> > > will occupy
> > > 
> > > +    * two locations. For instance, if we have in the IR code a
> > > dvec3 attr0 in
> > > 
> > > +    * location 0 and vec4 attr1 in location 1, in NIR attr0 will
> > > use
> > > 
> > > +    * locations/slots 0 and 1, and attr1 will use location/slot
> > > 2 */
> > > 
> > > +   if (shader->stage == MESA_SHADER_VERTEX)
> > > 
> > > +      nir_remap_attributes(shader);
> > > 
> > > +
> > > 
> > >     shader->info->name = ralloc_asprintf(shader, "GLSL%d",
> > > shader_prog->Name);
> > > 
> > >     if (shader_prog->Label)
> > > 
> > >        shader->info->label = ralloc_strdup(shader, shader_prog-
> > > >Label);
> > > 
> > > @@ -315,6 +335,14 @@ nir_visitor::visit(ir_variable *ir)
> > > 
> > >        } else {
> > > 
> > >           var->data.mode = nir_var_shader_in;
> > > 
> > >        }
> > > 
> > > +
> > > 
> > > +      /* Mark all the locations that require two slots */
> > > 
> > > +      if (glsl_type_is_dual_slot(glsl_without_array(var->type))) 
> > > {
> > > 
> > > +         for (uint i = 0; i < glsl_count_attribute_slots(var-
> > > >type, true); i++) {
> > > 
> > > +            uint64_t bitfield = BITFIELD64_BIT(var-
> > > >data.location + i);
> > > 
> > > +            shader->info->double_inputs_read |= bitfield;
> > > 
> > > +         }
> > > 
> > > +      }
> > > 
> > >        break;
> > > 
> > > 
> > > 
> > >     case ir_var_shader_out:
> > > 
> > > diff --git a/src/compiler/nir/nir_gather_info.c
> > > b/src/compiler/nir/nir_gather_info.c
> > > 
> > > index 07c9949..35a1ce4 100644
> > > 
> > > --- a/src/compiler/nir/nir_gather_info.c
> > > 
> > > +++ b/src/compiler/nir/nir_gather_info.c
> > > 
> > > @@ -53,11 +53,6 @@ set_io_mask(nir_shader *shader, nir_variable
> > > *var, int offset, int len)
> > > 
> > >           else
> > > 
> > >              shader->info->inputs_read |= bitfield;
> > > 
> > > 
> > > 
> > > -         /* double inputs read is only for vertex inputs */
> > > 
> > > -         if (shader->stage == MESA_SHADER_VERTEX &&
> > > 
> > > -             glsl_type_is_dual_slot(glsl_without_array(var-
> > > >type)))
> > > 
> > > -            shader->info->double_inputs_read |= bitfield;
> > > 
> > > -
> > > 
> > >           if (shader->stage == MESA_SHADER_FRAGMENT) {
> > > 
> > >              shader->info->fs.uses_sample_qualifier |= var-
> > > >data.sample;
> > > 
> > >           }
> > > 
> > > @@ -83,26 +78,21 @@ static void
> > > 
> > >  mark_whole_variable(nir_shader *shader, nir_variable *var)
> > > 
> > >  {
> > > 
> > >     const struct glsl_type *type = var->type;
> > > 
> > > -   bool is_vertex_input = false;
> > > 
> > > 
> > > 
> > >     if (nir_is_per_vertex_io(var, shader->stage)) {
> > > 
> > >        assert(glsl_type_is_array(type));
> > > 
> > >        type = glsl_get_array_element(type);
> > > 
> > >     }
> > > 
> > > 
> > > 
> > > -   if (shader->stage == MESA_SHADER_VERTEX &&
> > > 
> > > -       var->data.mode == nir_var_shader_in)
> > > 
> > > -      is_vertex_input = true;
> > > 
> > > -
> > > 
> > >     const unsigned slots =
> > > 
> > >        var->data.compact ? DIV_ROUND_UP(glsl_get_length(type), 4)
> > > 
> > > -                        : glsl_count_attribute_slots(type,
> > > is_vertex_input);
> > > 
> > > +                        : glsl_count_attribute_slots(type,
> > > false);
> > > 
> > > 
> > > 
> > >     set_io_mask(shader, var, 0, slots);
> > > 
> > >  }
> > > 
> > > 
> > > 
> > >  static unsigned
> > > 
> > > -get_io_offset(nir_deref_var *deref, bool is_vertex_input)
> > > 
> > > +get_io_offset(nir_deref_var *deref)
> > > 
> > >  {
> > > 
> > >     unsigned offset = 0;
> > > 
> > > 
> > > 
> > > @@ -117,7 +107,7 @@ get_io_offset(nir_deref_var *deref, bool
> > > is_vertex_input)
> > > 
> > >              return -1;
> > > 
> > >           }
> > > 
> > > 
> > > 
> > > -         offset += glsl_count_attribute_slots(tail->type,
> > > is_vertex_input) *
> > > 
> > > +         offset += glsl_count_attribute_slots(tail->type, false)
> > > *
> > > 
> > >              deref_array->base_offset;
> > > 
> > >        }
> > > 
> > >        /* TODO: we can get the offset for structs here see
> > > nir_lower_io() */
> > > 
> > > @@ -163,12 +153,7 @@ try_mask_partial_io(nir_shader *shader,
> > > nir_deref_var *deref)
> > > 
> > >        return false;
> > > 
> > >     }
> > > 
> > > 
> > > 
> > > -   bool is_vertex_input = false;
> > > 
> > > -   if (shader->stage == MESA_SHADER_VERTEX &&
> > > 
> > > -       var->data.mode == nir_var_shader_in)
> > > 
> > > -      is_vertex_input = true;
> > > 
> > > -
> > > 
> > > -   unsigned offset = get_io_offset(deref, is_vertex_input);
> > > 
> > > +   unsigned offset = get_io_offset(deref);
> > > 
> > >     if (offset == -1)
> > > 
> > >        return false;
> > > 
> > > 
> > > 
> > > @@ -184,8 +169,7 @@ try_mask_partial_io(nir_shader *shader,
> > > nir_deref_var *deref)
> > > 
> > >     }
> > > 
> > > 
> > > 
> > >     /* double element width for double types that takes two slots
> > > */
> > > 
> > > -   if (!is_vertex_input &&
> > > 
> > > -       glsl_type_is_dual_slot(glsl_without_array(type))) {
> > > 
> > > +   if (glsl_type_is_dual_slot(glsl_without_array(type))) {
> > > 
> > >        elem_width *= 2;
> > > 
> > >     }
> > > 
> > > 
> > > 
> > > @@ -220,13 +204,27 @@ gather_intrinsic_info(nir_intrinsic_instr
> > > *instr, nir_shader *shader)
> > > 
> > >     case nir_intrinsic_interp_var_at_sample:
> > > 
> > >     case nir_intrinsic_interp_var_at_offset:
> > > 
> > >     case nir_intrinsic_load_var:
> > > 
> > > -   case nir_intrinsic_store_var:
> > > 
> > > -      if (instr->variables[0]->var->data.mode ==
> > > nir_var_shader_in ||
> > > 
> > > -          instr->variables[0]->var->data.mode ==
> > > nir_var_shader_out) {
> > > 
> > > +   case nir_intrinsic_store_var: {
> > > 
> > > +      nir_variable *var = instr->variables[0]->var;
> > > 
> > > +
> > > 
> > > +      if (var->data.mode == nir_var_shader_in ||
> > > 
> > > +          var->data.mode == nir_var_shader_out) {
> > > 
> > >           if (!try_mask_partial_io(shader, instr->variables[0]))
> > > 
> > > -            mark_whole_variable(shader, instr->variables[0]-
> > > >var);
> > > 
> > > +            mark_whole_variable(shader, var);
> > > 
> > > +
> > > 
> > > +         /* We need to track which input_reads bits correspond
> > > to a
> > > 
> > > +          * dvec3/dvec4 input attribute */
> > > 
> > > +         if (shader->stage == MESA_SHADER_VERTEX &&
> > > 
> > > +             var->data.mode == nir_var_shader_in &&
> > > 
> > > +             glsl_type_is_dual_slot(glsl_without_array(var-
> > > >type))) {
> > > 
> > > +            for (uint i = 0; i < glsl_count_attribute_slots(var-
> > > >type, false); i++) {
> > > 
> > > +               int idx = var->data.location + i;
> > > 
> > > +               shader->info->double_inputs_read |=
> > > BITFIELD64_BIT(idx);
> > > 
> > > +            }
> > > 
> > > +         }
> > > 
> > >        }
> > > 
> > >        break;
> > > 
> > > +   }
> > > 
> > > 
> > > 
> > >     case nir_intrinsic_load_draw_id:
> > > 
> > >     case nir_intrinsic_load_front_face:
> > > 
> > > diff --git a/src/intel/vulkan/genX_pipeline.c
> > > b/src/intel/vulkan/genX_pipeline.c
> > > 
> > > index 845d020..7b94959 100644
> > > 
> > > --- a/src/intel/vulkan/genX_pipeline.c
> > > 
> > > +++ b/src/intel/vulkan/genX_pipeline.c
> > > 
> > > @@ -33,26 +33,33 @@
> > > 
> > >  static uint32_t
> > > 
> > >  vertex_element_comp_control(enum isl_format format, unsigned
> > > comp)
> > > 
> > >  {
> > > 
> > > -   uint8_t bits;
> > > 
> > >     switch (comp) {
> > > 
> > > -   case 0: bits = isl_format_layouts[format].channels.r.bits;
> > > break;
> > > 
> > > -   case 1: bits = isl_format_layouts[format].channels.g.bits;
> > > break;
> > > 
> > > -   case 2: bits = isl_format_layouts[format].channels.b.bits;
> > > break;
> > > 
> > > -   case 3: bits = isl_format_layouts[format].channels.a.bits;
> > > break;
> > > 
> > > -   default: unreachable("Invalid component");
> > > 
> > > -   }
> > > 
> > > -
> > > 
> > > -   if (bits) {
> > > 
> > > -      return VFCOMP_STORE_SRC;
> > > 
> > > -   } else if (comp < 3) {
> > > 
> > > -      return VFCOMP_STORE_0;
> > > 
> > > -   } else if (isl_format_layouts[format].channels.r.type ==
> > > ISL_UINT ||
> > > 
> > > -            isl_format_layouts[format].channels.r.type ==
> > > ISL_SINT) {
> > > 
> > > -      assert(comp == 3);
> > > 
> > > -      return VFCOMP_STORE_1_INT;
> > > 
> > > -   } else {
> > > 
> > > -      assert(comp == 3);
> > > 
> > > -      return VFCOMP_STORE_1_FP;
> > > 
> > > +   case 0:
> > > 
> > > +      return isl_format_layouts[format].channels.r.bits ?
> > > 
> > > +         VFCOMP_STORE_SRC : VFCOMP_STORE_0;
> > > 
> > > +   case 1:
> > > 
> > > +      return isl_format_layouts[format].channels.g.bits ?
> > > 
> > > +         VFCOMP_STORE_SRC : VFCOMP_STORE_0;
> > > 
> > > +   case 2:
> > > 
> > > +      return isl_format_layouts[format].channels.b.bits ?
> > > 
> > > +         VFCOMP_STORE_SRC :
> > > ((isl_format_layouts[format].channels.r.type == ISL_RAW) ?
> > > 
> > > +                             VFCOMP_NOSTORE : VFCOMP_STORE_0);
> > > 
> > > 
> > > Given all the line wrapping, I think it would be clearer to just
> > > have an if ladder here
> > > 
> > > 
> > > +   case 3:
> > > 
> > > +      if (isl_format_layouts[format].channels.a.bits)
> > > 
> > > +         return VFCOMP_STORE_SRC;
> > > 
> > > +      else
> > > 
> > > 
> > > Please use braces when one side of the if/else is multiple lines.
> > > 
> > > 
> > > +         switch (isl_format_layouts[format].channels.r.type) {
> > > 
> > > +         case ISL_RAW:
> > > 
> > > +            return isl_format_layouts[format].channels.b.bits ?
> > > 
> > > +               VFCOMP_STORE_0 : VFCOMP_NOSTORE;
> > > 
> > > 
> > > This seems a bit odd.  Mind explaining what's going on here?
> > > 
> > 
> > Yes. When emitting 64-bit components, we either write 128 or 256
> > bits, using VFCOMP_STORE_0 to pad output properly. We write 256
> > bits if we need to emit Blue and/or Alpha components.
> > 
> > If we only need to write 128bits then we use VFCOMP_NOSTORE for the
> > 3rd and 4th components.
> > 
> > In above code, we already know we are not emitting Alpha (the first
> > condition in "case 3" is false). If we neither need to emit Blue
> > component, then we only need to write 128 bits, so we return
> > NOSTORE.
> > 
> > But if Blue is required, then we need to write 256bits, so we
> > return VFCOMP_STORE_0 to pad the output.
> > 
> > Maybe this would be clearer:  leave it structured the way it was
> > with the "bits" temporary and add a new case to the if ladder right
> > after the first one that is
> > 
> > } else if (comp >= 2 && !isl_format_layouts[format].channels.b.bits
> > &&
> >           isl_format_layouts[format].channels.r.type == ISL_RAW) {
> >    /* comment about writing 128-bit chunks */
> >    return VFCOMP_NOSTORE;
> > 
> 
> Sounds good for me. This also requires to change the next branch to
> 
> } else if(comp < 3 || (comp == 3 &&
>                                     isl_format_layouts[format].channe
> ls.r.type == ISL_RAW)) {
>     return VFCOMP_STORE_0;
> 
> 
> Otherwise, it would return the wrong value for 4th component when
> format is R64G64B64.
> 
> Why are we defaulting the 4th component to zero for 64-bit inputs?  I
> guess the format wouldn't register as integer or float so neither of
> the other two makes sense.  What does the spec day the 4th component
> should be defaulted to?  I'm a bit concerned we may find ourselves
> needing to inspect the shader to get this 100% right. :-(

Because in the example above, we need do not have a value for the 4th
component, but we wrote 3*64 = 192 bits. And as we need to write 256,
we pad the remaining 64 bits with 0, as required by spec.

>From Broadwell spec, command reference structures, page 586:

"When SourceElementFormat is set to one of the *64*_PASSTHRU formats,
64-bit components are stored
in the URB without any conversion. In this case, vertex elements must
be written as 128 or 256 bits, with
VFCOMP_STORE_0 being used to pad the output as required. E.g., if
R64_PASSTHRU is used to copy a
64-bit Red component into the URB, Component 1 must be specified as
VFCOMP_STORE_0 (with
Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
vertex element, or Components
1-3 must be specified as VFCOMP_STORE_0 in order to output a 256-bit
vertex element. Likewise, use of
R64G64B64_PASSTHRU requires Component 3 to be specified as
VFCOMP_STORE_0 in order to output a
256-bit vertex element."

Note this is not specific for the 4th component. For instance, if the
input element is a 64-bit single-component value (R64), we would be
writing 128bits: the first component is the entire Red value
(VFCOMP_STORE_SRC), which would be
using the first 64bits; and for the second component we would use 0
(VCOMP_STORE_0), to pad the remaining 64bits. In this case, as we do
not have neither 3rd nor 4th components, we would
use VFCOMP_NOSTORE, so only 128bits are written.

I also tried to test what happens when using other value rathen than 0,
and it doesn't work.

	J.A.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170105/83a2ab2d/attachment-0001.html>