[Mesa-dev] [PATCH 16/36] glsl ubo/ssbo: Add lower_buffer_access class

Mon Nov 16 17:52:39 PST 2015

On 2015-11-16 04:27:55, Iago Toral wrote:
> On Sat, 2015-11-14 at 13:43 -0800, Jordan Justen wrote:
> > This class has code that will be shared by lower_ubo_reference and
> > lower_shared_reference. (lower_shared_reference will be used to
> > support compute shader shared variables.)
> > 
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > Cc: Samuel Iglesias Gonsalvez <siglesias at igalia.com>
> > Cc: Iago Toral Quiroga <itoral at igalia.com>
> > ---
> >  src/glsl/Makefile.sources        |   1 +
> >  src/glsl/lower_buffer_access.cpp | 307 +++++++++++++++++++++++++++++++++++++++
> >  src/glsl/lower_buffer_access.h   |  56 +++++++
> >  src/glsl/lower_ubo_reference.cpp | 180 +----------------------
> >  4 files changed, 367 insertions(+), 177 deletions(-)
> >  create mode 100644 src/glsl/lower_buffer_access.cpp
> >  create mode 100644 src/glsl/lower_buffer_access.h
> > 
> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> > index d4b02c1..f2c95c0 100644
> > --- a/src/glsl/Makefile.sources
> > +++ b/src/glsl/Makefile.sources
> > @@ -155,6 +155,7 @@ LIBGLSL_FILES = \
> >       loop_analysis.h \
> >       loop_controls.cpp \
> >       loop_unroll.cpp \
> > +     lower_buffer_access.cpp \
> >       lower_clip_distance.cpp \
> >       lower_const_arrays_to_uniforms.cpp \
> >       lower_discard.cpp \
> > diff --git a/src/glsl/lower_buffer_access.cpp b/src/glsl/lower_buffer_access.cpp
> > new file mode 100644
> > index 0000000..e0b5a2f
> > --- /dev/null
> > +++ b/src/glsl/lower_buffer_access.cpp
> > @@ -0,0 +1,307 @@
> > +/*
> > + * Copyright (c) 2015 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +/**
> > + * \file lower_buffer_access.cpp
> > + *
> > + * Helper for IR lowering pass to replace dereferences of buffer object based
> > + * shader variables with intrinsic function calls.
> > + *
> > + * This helper is used by lowering passes for UBOs, SSBOs and compute shader
> > + * shared variables.
> > + */
> > +
> > +#include "ir.h"
> > +#include "ir_builder.h"
> > +#include "ir_rvalue_visitor.h"
> > +#include "main/macros.h"
> > +#include "util/list.h"
> > +#include "glsl_parser_extras.h"
> > +#include "lower_buffer_access.h"
> > +
> > +using namespace ir_builder;
> > +
> > +namespace lower_buffer_access {
> > +
> > +static inline int
> > +writemask_for_size(unsigned n)
> > +{
> > +   return ((1 << n) - 1);
> > +}
> > +
> > +/**
> > + * Takes LHS and emits a series of assignments into its components
> > + * from the shared variable storage.
> 
> I find this part of the comment a bit confusing. This function breaks a
> dereference access into one or multiple accesses to the underlying
> buffer storage. Such dereference could be in a RHS expression, and in
> fact, that will always be the case for UBO and SSBO loads.

Hmm. I may have copied this comment from lower_ubo_reference some time
back. Anyway, I intended to use the current comment from
lower_ubo_reference:

/**
 * Takes a deref and recursively calls itself to break the deref down to the
 * point that the reads or writes generated are contiguous scalars or vectors.
 */
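
As a rough illustration of what "contiguous scalars or vectors" means
for a column-major matrix, here is a standalone sketch, not code from
the patch. The names column_stride and column_offsets are made up for
this example; the stride values mirror the size_mul choice in the
non-row-major matrix branch of emit_access quoted below, where "rows"
corresponds to the matrix type's vector_elements.

```cpp
#include <cassert>
#include <vector>

// Byte distance between consecutive columns of a column-major matrix.
// Mirrors the size_mul selection in the patch: std430 keeps a float
// vec2 column at 8 bytes, double columns with more than 2 rows use 32,
// everything else is rounded to a vec4 (16 bytes).
static unsigned
column_stride(bool std430, unsigned rows, bool is_double)
{
   if (std430 && rows == 2 && !is_double)
      return 8;
   return (is_double && rows > 2) ? 32 : 16;
}

// One offset per column: each column becomes a single contiguous
// vector access at deref_offset + i * stride.
static std::vector<unsigned>
column_offsets(unsigned deref_offset, unsigned columns, unsigned rows,
               bool std430, bool is_double)
{
   std::vector<unsigned> offsets;
   const unsigned stride = column_stride(std430, rows, is_double);
   for (unsigned i = 0; i < columns; i++)
      offsets.push_back(deref_offset + i * stride);
   return offsets;
}
```

For a std140 mat4 at offset 0 this gives one vec4 access at each of
0, 16, 32 and 48.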

> > + * Recursively calls itself to break the deref down to the point that
> > + * the intrinsic calls are generated.
> > + */
> > +void
> > +lower_buffer_access::emit_access(bool is_write,
> > +                                 ir_dereference *deref,
> > +                                 ir_variable *base_offset,
> > +                                 unsigned int deref_offset,
> > +                                 bool row_major,
> > +                                 int matrix_columns,
> > +                                 unsigned int packing,
> > +                                 unsigned int write_mask)
> > +{
> 
> Why not pass mem_ctx as parameter instead of having it be a class
> member? I find it a bit odd that this class defines mem_ctx but never
> really takes care of initializing it, expecting that subclasses do that
> for it, so in that case why not just make them actually take care of
> passing the mem_ctx to use instead?
> 
> If you rather keep mem_ctx defined here I'd at least suggest to add an
> assert to the functions that use it to check that it has indeed been
> initialized by the subclass.

I think your comment applies to the current code in
lower_ubo_reference as well. It resets mem_ctx at various points. I
will try to get rid of mem_ctx as a member variable in all the related
classes and add it as a parameter instead.
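
Something along these lines is what I have in mind. This is only a
toy sketch of the shape of the change, not the real classes: toy_ctx
stands in for a ralloc context, and buffer_access_base,
recording_visitor and the simplified emit_access signature are
hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a ralloc memory context.
struct toy_ctx {
   std::vector<unsigned> allocations;
};

class buffer_access_base {
public:
   virtual ~buffer_access_base() {}

   // The context is passed in by the caller instead of being a member
   // that a subclass has to remember to initialize.
   virtual void insert_buffer_access(toy_ctx *mem_ctx, unsigned offset,
                                     unsigned mask) = 0;

   void emit_access(toy_ctx *mem_ctx, unsigned base_offset,
                    unsigned count, unsigned stride)
   {
      assert(mem_ctx != NULL);
      for (unsigned i = 0; i < count; i++)
         insert_buffer_access(mem_ctx, base_offset + i * stride, 0xf);
   }
};

class recording_visitor : public buffer_access_base {
public:
   std::vector<unsigned> offsets;

   void insert_buffer_access(toy_ctx *mem_ctx, unsigned offset,
                             unsigned mask) override
   {
      (void) mask;
      // Pretend to allocate an IR node out of the caller's context.
      mem_ctx->allocations.push_back(offset);
      offsets.push_back(offset);
   }
};
```

With this shape the base class no longer owns any state that a
subclass could forget to set up, and the assert in emit_access covers
the case where a caller passes a null context.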

Thanks,

-Jordan

> > +   if (deref->type->is_record()) {
> > +      unsigned int field_offset = 0;
> > +
> > +      for (unsigned i = 0; i < deref->type->length; i++) {
> > +         const struct glsl_struct_field *field =
> > +            &deref->type->fields.structure[i];
> > +         ir_dereference *field_deref =
> > +            new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, NULL),
> > +                                               field->name);
> > +
> > +         field_offset =
> > +            glsl_align(field_offset,
> > +                       field->type->std140_base_alignment(row_major));
> > +
> > +         emit_access(is_write, field_deref, base_offset,
> > +                     deref_offset + field_offset,
> > +                     row_major, 1, packing,
> > +                     writemask_for_size(field_deref->type->vector_elements));
> > +
> > +         field_offset += field->type->std140_size(row_major);
> > +      }
> > +      return;
> > +   }
> > +
> > +   if (deref->type->is_array()) {
> > +      unsigned array_stride = packing == GLSL_INTERFACE_PACKING_STD430 ?
> > +         deref->type->fields.array->std430_array_stride(row_major) :
> > +         glsl_align(deref->type->fields.array->std140_size(row_major), 16);
> > +
> > +      for (unsigned i = 0; i < deref->type->length; i++) {
> > +         ir_constant *element = new(mem_ctx) ir_constant(i);
> > +         ir_dereference *element_deref =
> > +            new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL),
> > +                                              element);
> > +         emit_access(is_write, element_deref, base_offset,
> > +                     deref_offset + i * array_stride,
> > +                     row_major, 1, packing,
> > +                     writemask_for_size(element_deref->type->vector_elements));
> > +      }
> > +      return;
> > +   }
> > +
> > +   if (deref->type->is_matrix()) {
> > +      for (unsigned i = 0; i < deref->type->matrix_columns; i++) {
> > +         ir_constant *col = new(mem_ctx) ir_constant(i);
> > +         ir_dereference *col_deref =
> > +            new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL), col);
> > +
> > +         if (row_major) {
> > +            /* For a row-major matrix, the next column starts at the next
> > +             * element.
> > +             */
> > +            int size_mul = deref->type->is_double() ? 8 : 4;
> > +            emit_access(is_write, col_deref, base_offset,
> > +                        deref_offset + i * size_mul,
> > +                        row_major, deref->type->matrix_columns, packing,
> > +                        writemask_for_size(col_deref->type->vector_elements));
> > +         } else {
> > +            int size_mul;
> > +
> > +            /* std430 doesn't round up vec2 size to a vec4 size */
> > +            if (packing == GLSL_INTERFACE_PACKING_STD430 &&
> > +                deref->type->vector_elements == 2 &&
> > +                !deref->type->is_double()) {
> > +               size_mul = 8;
> > +            } else {
> > +               /* std140 always rounds the stride of arrays (and matrices) to a
> > +                * vec4, so matrices are always 16 between columns/rows. With
> > +                * doubles, they will be 32 apart when there are more than 2 rows.
> > +                *
> > +                * For both std140 and std430, if the member is a
> > +                * three-component vector with components consuming N basic
> > +                * machine units, the base alignment is 4N. For vec4, base
> > +                * alignment is 4N.
> > +                */
> > +               size_mul = (deref->type->is_double() &&
> > +                           deref->type->vector_elements > 2) ? 32 : 16;
> > +            }
> > +
> > +            emit_access(is_write, col_deref, base_offset,
> > +                        deref_offset + i * size_mul,
> > +                        row_major, deref->type->matrix_columns, packing,
> > +                        writemask_for_size(col_deref->type->vector_elements));
> > +         }
> > +      }
> > +      return;
> > +   }
> > +
> > +   assert(deref->type->is_scalar() || deref->type->is_vector());
> > +
> > +   if (!row_major) {
> > +      ir_rvalue *offset =
> > +         add(base_offset, new(mem_ctx) ir_constant(deref_offset));
> > +      unsigned mask =
> > +         is_write ? write_mask : (1 << deref->type->vector_elements) - 1;
> > +      insert_buffer_access(deref, deref->type, offset, mask, -1);
> > +   } else {
> > +      unsigned N = deref->type->is_double() ? 8 : 4;
> > +
> > +      /* We're dereffing a column out of a row-major matrix, so we
> > +       * gather the vector from each stored row.
> > +       */
> > +      assert(deref->type->base_type == GLSL_TYPE_FLOAT ||
> > +             deref->type->base_type == GLSL_TYPE_DOUBLE);
> > +      /* Matrices, row_major or not, are stored as if they were
> > +       * arrays of vectors of the appropriate size in std140.
> > +       * Arrays have their strides rounded up to a vec4, so the
> > +       * matrix stride is always 16. However a double matrix may either be 16
> > +       * or 32 depending on the number of columns.
> > +       */
> > +      assert(matrix_columns <= 4);
> > +      unsigned matrix_stride = 0;
> > +      /* Matrix stride for std430 mat2xY matrices are not rounded up to
> > +       * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform
> > +       * Block Layout":
> > +       *
> > +       * "2. If the member is a two- or four-component vector with components
> > +       * consuming N basic machine units, the base alignment is 2N or 4N,
> > +       * respectively." [...]
> > +       * "4. If the member is an array of scalars or vectors, the base alignment
> > +       * and array stride are set to match the base alignment of a single array
> > +       * element, according to rules (1), (2), and (3), and rounded up to the
> > +       * base alignment of a vec4." [...]
> > +       * "7. If the member is a row-major matrix with C columns and R rows, the
> > +       * matrix is stored identically to an array of R row vectors with C
> > +       * components each, according to rule (4)." [...]
> > +       * "When using the std430 storage layout, shader storage blocks will be
> > +       * laid out in buffer storage identically to uniform and shader storage
> > +       * blocks using the std140 layout, except that the base alignment and
> > +       * stride of arrays of scalars and vectors in rule 4 and of structures in
> > +       * rule 9 are not rounded up a multiple of the base alignment of a vec4."
> > +       */
> > +      if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == 2)
> > +         matrix_stride = 2 * N;
> > +      else
> > +         matrix_stride = glsl_align(matrix_columns * N, 16);
> > +
> > +      const glsl_type *deref_type = deref->type->base_type == GLSL_TYPE_FLOAT ?
> > +         glsl_type::float_type : glsl_type::double_type;
> > +
> > +      for (unsigned i = 0; i < deref->type->vector_elements; i++) {
> > +         ir_rvalue *chan_offset =
> > +            add(base_offset,
> > +                new(mem_ctx) ir_constant(deref_offset + i * matrix_stride));
> > +         if (!is_write || ((1U << i) & write_mask))
> > +            insert_buffer_access(deref, deref_type, chan_offset, (1U << i), i);
> > +      }
> > +   }
> > +}
> > +
> > +/**
> > + * Determine if a thing being dereferenced is row-major
> > + *
> > + * There is some trickery here.
> > + *
> > + * If the thing being dereferenced is a member of uniform block \b without an
> > + * instance name, then the name of the \c ir_variable is the field name of an
> > + * interface type.  If this field is row-major, then the thing referenced is
> > + * row-major.
> > + *
> > + * If the thing being dereferenced is a member of uniform block \b with an
> > + * instance name, then the last dereference in the tree will be an
> > + * \c ir_dereference_record.  If that record field is row-major, then the
> > + * thing referenced is row-major.
> > + */
> > +static bool
> > +is_dereferenced_thing_row_major(const ir_dereference *deref)
> > +{
> > +   bool matrix = false;
> > +   const ir_rvalue *ir = deref;
> > +
> > +   while (true) {
> > +      matrix = matrix || ir->type->without_array()->is_matrix();
> > +
> > +      switch (ir->ir_type) {
> > +      case ir_type_dereference_array: {
> > +         const ir_dereference_array *const array_deref =
> > +            (const ir_dereference_array *) ir;
> > +
> > +         ir = array_deref->array;
> > +         break;
> > +      }
> > +
> > +      case ir_type_dereference_record: {
> > +         const ir_dereference_record *const record_deref =
> > +            (const ir_dereference_record *) ir;
> > +
> > +         ir = record_deref->record;
> > +
> > +         const int idx = ir->type->field_index(record_deref->field);
> > +         assert(idx >= 0);
> > +
> > +         const enum glsl_matrix_layout matrix_layout =
> > +            glsl_matrix_layout(ir->type->fields.structure[idx].matrix_layout);
> > +
> > +         switch (matrix_layout) {
> > +         case GLSL_MATRIX_LAYOUT_INHERITED:
> > +            break;
> > +         case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
> > +            return false;
> > +         case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
> > +            return matrix || deref->type->without_array()->is_record();
> > +         }
> > +
> > +         break;
> > +      }
> > +
> > +      case ir_type_dereference_variable: {
> > +         const ir_dereference_variable *const var_deref =
> > +            (const ir_dereference_variable *) ir;
> > +
> > +         const enum glsl_matrix_layout matrix_layout =
> > +            glsl_matrix_layout(var_deref->var->data.matrix_layout);
> > +
> > +         switch (matrix_layout) {
> > +         case GLSL_MATRIX_LAYOUT_INHERITED:
> > +         case GLSL_MATRIX_LAYOUT_COLUMN_MAJOR:
> > +            return false;
> > +         case GLSL_MATRIX_LAYOUT_ROW_MAJOR:
> > +            return matrix || deref->type->without_array()->is_record();
> > +         }
> > +
> > +         unreachable("invalid matrix layout");
> > +         break;
> > +      }
> > +
> > +      default:
> > +         return false;
> > +      }
> > +   }
> > +
> > +   /* The tree must have ended with a dereference that wasn't an
> > +    * ir_dereference_variable.  That is invalid, and it should be impossible.
> > +    */
> > +   unreachable("invalid dereference tree");
> > +   return false;
> > +}
> > +
> > +} /* namespace lower_buffer_access */
> > diff --git a/src/glsl/lower_buffer_access.h b/src/glsl/lower_buffer_access.h
> > new file mode 100644
> > index 0000000..3138963
> > --- /dev/null
> > +++ b/src/glsl/lower_buffer_access.h
> > @@ -0,0 +1,56 @@
> > +/*
> > + * Copyright (c) 2015 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +/**
> > + * \file lower_buffer_access.h
> > + *
> > + * Helper for IR lowering pass to replace dereferences of buffer object based
> > + * shader variables with intrinsic function calls.
> > + *
> > + * This helper is used by lowering passes for UBOs, SSBOs and compute shader
> > + * shared variables.
> > + */
> > +
> > +#pragma once
> > +#ifndef LOWER_BUFFER_ACCESS_H
> > +#define LOWER_BUFFER_ACCESS_H
> > +
> > +namespace lower_buffer_access {
> > +
> > +class lower_buffer_access : public ir_rvalue_enter_visitor {
> > +public:
> > +   virtual void
> > +   insert_buffer_access(ir_dereference *deref, const glsl_type *type,
> > +                        ir_rvalue *offset, unsigned mask, int channel) = 0;
> > +
> > +   void emit_access(bool is_write, ir_dereference *deref,
> > +                    ir_variable *base_offset, unsigned int deref_offset,
> > +                    bool row_major, int matrix_columns,
> > +                    unsigned int packing, unsigned int write_mask);
> > +
> > +   void *mem_ctx;
> > +};
> > +
> > +} /* namespace lower_buffer_access */
> > +
> > +#endif /* LOWER_BUFFER_ACCESS_H */
> > diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
> > index b8fcc8e..8de4f5e 100644
> > --- a/src/glsl/lower_ubo_reference.cpp
> > +++ b/src/glsl/lower_ubo_reference.cpp
> > @@ -38,6 +38,7 @@
> >  #include "ir_rvalue_visitor.h"
> >  #include "main/macros.h"
> >  #include "glsl_parser_extras.h"
> > +#include "lower_buffer_access.h"
> >  
> >  using namespace ir_builder;
> >  
> > @@ -132,7 +133,8 @@ is_dereferenced_thing_row_major(const ir_rvalue *deref)
> >  }
> >  
> >  namespace {
> > -class lower_ubo_reference_visitor : public ir_rvalue_enter_visitor {
> > +class lower_ubo_reference_visitor :
> > +      public lower_buffer_access::lower_buffer_access {
> >  public:
> >     lower_ubo_reference_visitor(struct gl_shader *shader)
> >     : shader(shader)
> > @@ -173,11 +175,6 @@ public:
> >     void insert_buffer_access(ir_dereference *deref, const glsl_type *type,
> >                               ir_rvalue *offset, unsigned mask, int channel);
> >  
> > -   void emit_access(bool is_write, ir_dereference *deref,
> > -                    ir_variable *base_offset, unsigned int deref_offset,
> > -                    bool row_major, int matrix_columns,
> > -                    unsigned packing, unsigned write_mask);
> > -
> >     ir_visitor_status visit_enter(class ir_expression *);
> >     ir_expression *calculate_ssbo_unsized_array_length(ir_expression *expr);
> >     void check_ssbo_unsized_array_length_expression(class ir_expression *);
> > @@ -195,7 +192,6 @@ public:
> >     ir_call *check_for_ssbo_atomic_intrinsic(ir_call *ir);
> >     ir_visitor_status visit_enter(ir_call *ir);
> >  
> > -   void *mem_ctx;
> >     struct gl_shader *shader;
> >     struct gl_uniform_buffer_variable *ubo_var;
> >     ir_rvalue *uniform_block;
> > @@ -727,176 +723,6 @@ lower_ubo_reference_visitor::insert_buffer_access(ir_dereference *deref,
> >     }
> >  }
> >  
> > -static inline int
> > -writemask_for_size(unsigned n)
> > -{
> > -   return ((1 << n) - 1);
> > -}
> > -
> > -/**
> > - * Takes a deref and recursively calls itself to break the deref down to the
> > - * point that the reads or writes generated are contiguous scalars or vectors.
> > - */
> > -void
> > -lower_ubo_reference_visitor::emit_access(bool is_write,
> > -                                         ir_dereference *deref,
> > -                                         ir_variable *base_offset,
> > -                                         unsigned int deref_offset,
> > -                                         bool row_major,
> > -                                         int matrix_columns,
> > -                                         unsigned packing,
> > -                                         unsigned write_mask)
> > -{
> > -   if (deref->type->is_record()) {
> > -      unsigned int field_offset = 0;
> > -
> > -      for (unsigned i = 0; i < deref->type->length; i++) {
> > -         const struct glsl_struct_field *field =
> > -            &deref->type->fields.structure[i];
> > -         ir_dereference *field_deref =
> > -            new(mem_ctx) ir_dereference_record(deref->clone(mem_ctx, NULL),
> > -                                               field->name);
> > -
> > -         field_offset =
> > -            glsl_align(field_offset,
> > -                       field->type->std140_base_alignment(row_major));
> > -
> > -         emit_access(is_write, field_deref, base_offset,
> > -                     deref_offset + field_offset,
> > -                     row_major, 1, packing,
> > -                     writemask_for_size(field_deref->type->vector_elements));
> > -
> > -         field_offset += field->type->std140_size(row_major);
> > -      }
> > -      return;
> > -   }
> > -
> > -   if (deref->type->is_array()) {
> > -      unsigned array_stride = packing == GLSL_INTERFACE_PACKING_STD430 ?
> > -         deref->type->fields.array->std430_array_stride(row_major) :
> > -         glsl_align(deref->type->fields.array->std140_size(row_major), 16);
> > -
> > -      for (unsigned i = 0; i < deref->type->length; i++) {
> > -         ir_constant *element = new(mem_ctx) ir_constant(i);
> > -         ir_dereference *element_deref =
> > -            new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL),
> > -                                              element);
> > -         emit_access(is_write, element_deref, base_offset,
> > -                     deref_offset + i * array_stride,
> > -                     row_major, 1, packing,
> > -                     writemask_for_size(element_deref->type->vector_elements));
> > -      }
> > -      return;
> > -   }
> > -
> > -   if (deref->type->is_matrix()) {
> > -      for (unsigned i = 0; i < deref->type->matrix_columns; i++) {
> > -         ir_constant *col = new(mem_ctx) ir_constant(i);
> > -         ir_dereference *col_deref =
> > -            new(mem_ctx) ir_dereference_array(deref->clone(mem_ctx, NULL), col);
> > -
> > -         if (row_major) {
> > -            /* For a row-major matrix, the next column starts at the next
> > -             * element.
> > -             */
> > -            int size_mul = deref->type->is_double() ? 8 : 4;
> > -            emit_access(is_write, col_deref, base_offset,
> > -                        deref_offset + i * size_mul,
> > -                        row_major, deref->type->matrix_columns, packing,
> > -                        writemask_for_size(col_deref->type->vector_elements));
> > -         } else {
> > -            int size_mul;
> > -
> > -            /* std430 doesn't round up vec2 size to a vec4 size */
> > -            if (packing == GLSL_INTERFACE_PACKING_STD430 &&
> > -                deref->type->vector_elements == 2 &&
> > -                !deref->type->is_double()) {
> > -               size_mul = 8;
> > -            } else {
> > -               /* std140 always rounds the stride of arrays (and matrices) to a
> > -                * vec4, so matrices are always 16 between columns/rows. With
> > -                * doubles, they will be 32 apart when there are more than 2 rows.
> > -                *
> > -                * For both std140 and std430, if the member is a
> > -                * three-'component vector with components consuming N basic
> > -                * machine units, the base alignment is 4N. For vec4, base
> > -                * alignment is 4N.
> > -                */
> > -               size_mul = (deref->type->is_double() &&
> > -                           deref->type->vector_elements > 2) ? 32 : 16;
> > -            }
> > -
> > -            emit_access(is_write, col_deref, base_offset,
> > -                        deref_offset + i * size_mul,
> > -                        row_major, deref->type->matrix_columns, packing,
> > -                        writemask_for_size(col_deref->type->vector_elements));
> > -         }
> > -      }
> > -      return;
> > -   }
> > -
> > -   assert(deref->type->is_scalar() || deref->type->is_vector());
> > -
> > -   if (!row_major) {
> > -      ir_rvalue *offset =
> > -         add(base_offset, new(mem_ctx) ir_constant(deref_offset));
> > -      unsigned mask =
> > -         is_write ? write_mask : (1 << deref->type->vector_elements) - 1;
> > -      insert_buffer_access(deref, deref->type, offset, mask, -1);
> > -   } else {
> > -      unsigned N = deref->type->is_double() ? 8 : 4;
> > -
> > -      /* We're dereffing a column out of a row-major matrix, so we
> > -       * gather the vector from each stored row.
> > -      */
> > -      assert(deref->type->base_type == GLSL_TYPE_FLOAT ||
> > -             deref->type->base_type == GLSL_TYPE_DOUBLE);
> > -      /* Matrices, row_major or not, are stored as if they were
> > -       * arrays of vectors of the appropriate size in std140.
> > -       * Arrays have their strides rounded up to a vec4, so the
> > -       * matrix stride is always 16. However a double matrix may either be 16
> > -       * or 32 depending on the number of columns.
> > -       */
> > -      assert(matrix_columns <= 4);
> > -      unsigned matrix_stride = 0;
> > -      /* Matrix stride for std430 mat2xY matrices are not rounded up to
> > -       * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform
> > -       * Block Layout":
> > -       *
> > -       * "2. If the member is a two- or four-component vector with components
> > -       * consuming N basic machine units, the base alignment is 2N or 4N,
> > -       * respectively." [...]
> > -       * "4. If the member is an array of scalars or vectors, the base alignment
> > -       * and array stride are set to match the base alignment of a single array
> > -       * element, according to rules (1), (2), and (3), and rounded up to the
> > -       * base alignment of a vec4." [...]
> > -       * "7. If the member is a row-major matrix with C columns and R rows, the
> > -       * matrix is stored identically to an array of R row vectors with C
> > -       * components each, according to rule (4)." [...]
> > -       * "When using the std430 storage layout, shader storage blocks will be
> > -       * laid out in buffer storage identically to uniform and shader storage
> > -       * blocks using the std140 layout, except that the base alignment and
> > -       * stride of arrays of scalars and vectors in rule 4 and of structures in
> > -       * rule 9 are not rounded up a multiple of the base alignment of a vec4."
> > -       */
> > -      if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == 2)
> > -         matrix_stride = 2 * N;
> > -      else
> > -         matrix_stride = glsl_align(matrix_columns * N, 16);
> > -
> > -      const glsl_type *deref_type = deref->type->base_type == GLSL_TYPE_FLOAT ?
> > -         glsl_type::float_type : glsl_type::double_type;
> > -
> > -      for (unsigned i = 0; i < deref->type->vector_elements; i++) {
> > -         ir_rvalue *chan_offset =
> > -            add(base_offset,
> > -                new(mem_ctx) ir_constant(deref_offset + i * matrix_stride));
> > -         if (!is_write || ((1U << i) & write_mask))
> > -            insert_buffer_access(deref, deref_type, chan_offset, (1U << i), i);
> > -      }
> > -   }
> > -}
> > -
> >  void
> >  lower_ubo_reference_visitor::write_to_memory(ir_dereference *deref,
> >                                               ir_variable *var,
> 
>