[Mesa-dev] [PATCH 1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits

Mon Feb 26 15:57:50 UTC 2018

On Mon, Feb 26, 2018 at 6:10 AM, Chema Casanova <jmcasanova at igalia.com>
wrote:

> El 23/02/18 a las 22:31, Jason Ekstrand escribió:
> > On Fri, Feb 23, 2018 at 12:28 PM, Chema Casanova <jmcasanova at igalia.com
> > <mailto:jmcasanova at igalia.com>> wrote:
> >
> >
> >
> >     El 23/02/18 a las 17:26, Jason Ekstrand escribió:
> >     > On Fri, Feb 23, 2018 at 1:26 AM, Jose Maria Casanova Crespo
> >     > <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>
> >     <mailto:jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>>>
> wrote:
> >     >
> >     >     The surfaces that backup the GPU buffers have a boundary check
> >     that
> >     >     considers that access to partial dwords are considered
> >     out-of-bounds.
> >     >     For example is basic 16-bit cases of buffers with size 2 or 6
> >     where the
> >     >     last two bytes will always be read as 0 or its writting
> ignored.
> >     >
> >     >     The introduction of 16-bit types implies that we need to align
> >     the size
> >     >     to 4-bytes multiples so that partial dwords could be
> read/written.
> >     >     Adding an inconditional +2 size to buffers not being multiple
> of 2
> >     >     solves this issue for the general cases of UBO or SSBO.
> >     >
> >     >     But, when unsized_arrays of 16-bit elements are used it is not
> >     possible
> >     >     to know if the size was padded or not. To solve this issue the
> >     >     implementation of SSBO calculates the needed size of the
> surface,
> >     >     as suggested by Jason:
> >     >
> >     >     surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
> >     >
> >     >     So when we calculate backwards the SpvOpArrayLenght with a nir
> >     expresion
> >     >     when the array stride is not multiple of 4.
> >     >
> >     >     array_size = (surface_size & ~3) - (surface_size & 3)
> >     >
> >     >     It is also exposed this buffer requirements when robust buffer
> >     access
> >     >     is enabled so these buffer sizes recommend being multiple of 4.
> >     >     ---
> >     >
> >     >     I have some doubts if vtn_variables.c is the best place to
> include
> >     >     this specific to calculate the real buffer size as this is new
> >     >     calculus seems to be quite HW dependent and maybe other drivers
> >     >     different
> >     >     to ANV don't need this kind of solution.
> >     >
> >     >      src/compiler/spirv/vtn_variables.c    | 14 ++++++++++++++
> >     >      src/intel/vulkan/anv_descriptor_set.c | 16 ++++++++++++++++
> >     >      src/intel/vulkan/anv_device.c         | 11 +++++++++++
> >     >      3 files changed, 41 insertions(+)
> >     >
> >     >     diff --git a/src/compiler/spirv/vtn_variables.c
> >     >     b/src/compiler/spirv/vtn_variables.c
> >     >     index 9eb85c24e9..78adab3ed2 100644
> >     >     --- a/src/compiler/spirv/vtn_variables.c
> >     >     +++ b/src/compiler/spirv/vtn_variables.c
> >     >     @@ -2113,6 +2113,20 @@ vtn_handle_variables(struct vtn_builder
> *b,
> >     >     SpvOp opcode,
> >     >            nir_builder_instr_insert(&b->nb, &instr->instr);
> >     >            nir_ssa_def *buf_size = &instr->dest.ssa;
> >     >
> >     >     +      /* Calculate real length if padding was done to align
> >     the buffer
> >     >     +       * to 32-bits. This only could happen is stride is not
> >     multiple
> >     >     +       * of 4. Introduced to support 16-bit type unsized
> >     arrays in anv.
> >     >     +       */
> >     >     +      if (stride % 4) {
> >     >     +         buf_size = nir_isub(&b->nb,
> >     >     +                             nir_iand(&b->nb,
> >     >     +                                      buf_size,
> >     >     +                                      nir_imm_int(&b->nb,
> ~3)),
> >     >     +                             nir_iand (&b->nb,
> >     >     +                                       buf_size,
> >     >     +                                       nir_imm_int(&b->nb,
> 3)));
> >     >
> >     >
> >     > We can't do this in spirv_to_nir as it's also used by radv and
> >     they may
> >     > not have the same issue.  Instead, we need to handle it either in
> >     > anv_nir_apply_pipeline_layout or in the back-end compiler.  Doing
> it
> >     > here has the advantage that we can only do it in the "stride % 4
> != 0"
> >     > case but I don't think the three instructions are all that big of
> >     a deal
> >     > given that we just did a send and are about to do an integer
> >     divide.  My
> >     > preference would be to put most of it in ISL and the back-end
> compiler
> >     > if we can.
> >
> >     I've already had my doubts in my commit comment. So I'll implement it
> >     properly in the backend implementation of
> nir_intrinsic_get_buffer_size.
> >     I should have a look to that code before.
>

Vulkan does not allow unsized arrays in UBOs.  However, if we have to pad
it, I'd rather us just do the same thing for both.

> >     >
> >     >     +      }
> >     >     +
> >     >            /* array_length = max(buffer_size - offset, 0) / stride
> */
> >     >            nir_ssa_def *array_length =
> >     >               nir_idiv(&b->nb,
> >     >     diff --git a/src/intel/vulkan/anv_descriptor_set.c
> >     >     b/src/intel/vulkan/anv_descriptor_set.c
> >     >     index edb829601e..a97f2f37dc 100644
> >     >     --- a/src/intel/vulkan/anv_descriptor_set.c
> >     >     +++ b/src/intel/vulkan/anv_descriptor_set.c
> >     >     @@ -704,6 +704,22 @@ anv_descriptor_set_write_buffer(struct
> >     >     anv_descriptor_set *set,
> >     >            bview->offset = buffer->offset + offset;
> >     >            bview->range = anv_buffer_get_range(buffer, offset,
> range);
> >     >
> >     >     +      /* Uniform and Storage buffers need to have surface size
> >     >     +       * not less that the aligned 32-bit size of the buffer.
> >     >     +       * To calculate the array lenght on unsized arrays
> >     >     +       * in StorageBuffer the last 2 bits store the padding
> size
> >     >     +       * added to the surface, so we can calculate latter the
> >     original
> >     >     +       * buffer size to know the number of elements.
> >     >     +       *
> >     >     +       *  surface_size = 2 * aling_u64(buffer_size, 4)  -
> >     buffer_size
> >     >     +       *
> >     >     +       *  array_size = (surface_size & ~3) - (surface_size &
> 3)
> >     >     +       */
> >     >     +      if (type == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER)
> >     >     +         bview->range = 2 * align_u64(bview->range, 4) -
> >     bview->range;
> >     >     +      else if (type == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER)
> >     >     +         bview->range = align_u64(bview->range, 4);
> >
> >     > We chatted on another e-mail about doing this in ISL instead of
> >     here but
> >     > you mentioned issues with failing GL tests.  Did you ever get to
> the
> >     > bottom of those?
> >
> >     Yes I've already implemented that way but it was originally creating
> 10
> >     regressions in Jenkins on DEQP in tests like
> >     dEQP-GLES31.functional.texture.texture_buffer.modify.buffer
> data.buffer_size_131071
> >
> >
> >     Although it was not possible to reproduce it locally (it was a kind
> of
> >     poltergeist)
> >
> >
> > I hate it when that happens.
> >
> >
> >     it was consistent as I was messing with the texture size of
> >     the buffer.
> >
> >     So the solution was add extra restrictions to the size alignment
> padding
> >     to detect UniformBuffers only when next 3 conditions matched:
> >     - isl_format was ISL_FORMAT_R32G32B32A32_FLOAT
> >     - stride = 1
> >     - size % 4 != 0
> >
> >
> > I think that mechanism is more-or-less fine.  You could also use a
> > condition like this:
> >
> > if (format == ISL_FORMAT_RAW ||
> >     stride < isl_format_get_layout(format)->bpb / 8) {
> >    assert(stride == 1);
> >    /* Do the math */
> > }
>
> > That would enable it only for UBOs SSBOs and storage texel buffers where
> > we don't have a matching typed format.  Also, it would prevent us from
> > changing num_elements in any case other than stride == 1.
>
>
> I'm sending an v2 now but I had another doubt with your code proposal. I
> initially differentiated the UBO and SSBO case so that for SSBO we do
> the math to solve the issue with unsized arrays, but for UBO i was just
> aligning the the size to 32-bits, just to avoid getting to the next dword.
>
> Would it be better to have do generally the SSBO math?
>
> That way the buffer_size intrinsic will be always return the right value.
>
>
> >     But when working testing this solution I realized that for Storage
> >     Buffers there was a another entry point where I couldn't
> differentiate
> >     from an Storage Buffer at anv_image.c  for buffer usage of
> >     VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT. at is uses stride = 1 and
> >     ISL_RAW format.
> >
> >     So in order to avoid messing with the sizes in that case I though it
> was
> >     better to move the padding decision to the place I could
> differentiate
> >     it correctly. Maybe it was an unfounded doubt and it wouldn't mind to
> >     modify the size in that VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT
> case.
> >
> >     If so, what was the conclusion?
> >
> >     So after re-thinking aloud, and if my doubts with
> >     VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT is unfounded I think I'll
> >     recover ISL approach patch, as this would be needed any case for
> mediump
> >     support.
> >
> >
> > I wouldn't worry about those.  The only time we will use SURFTYPE_RAW
> > for storage texel buffers is when we don't have a matching typed format
> > which only happens for formats which are at least 32bit and so the size
> > will be a multiple of 4 anyway.
>
> Something less to worry about :)
>
> Chema
>
> >
> >
> >     >     +
> >     >            /* If we're writing descriptors through a push command,
> we
> >     >     need to
> >     >             * allocate the surface state from the command buffer.
> >     >     Otherwise it will
> >     >             * be allocated by the descriptor pool when calling
> >     >     diff --git a/src/intel/vulkan/anv_device.c
> >     >     b/src/intel/vulkan/anv_device.c
> >     >     index a83b7a39f6..cedeed5621 100644
> >     >     --- a/src/intel/vulkan/anv_device.c
> >     >     +++ b/src/intel/vulkan/anv_device.c
> >     >     @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
> >     >
> >     >         pMemoryRequirements->size = buffer->size;
> >     >         pMemoryRequirements->alignment = alignment;
> >     >     +
> >     >     +   /* Storage and Uniform buffers should have their size
> >     aligned to
> >     >     +    * 32-bits to avoid boundary checks when last DWord is not
> >     complete.
> >     >     +    * This would ensure that not internal padding would be
> >     needed for
> >     >     +    * 16-bit types.
> >     >     +    */
> >     >     +   if (device->robust_buffer_access &&
> >     >     +       (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
> >     >     +        buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
> >     >     +      pMemoryRequirements->size = align_u64(buffer->size, 4);
> >     >     +
> >     >         pMemoryRequirements->memoryTypeBits = memory_types;
> >     >      }
> >     >
> >     >     --
> >     >     2.14.3
> >     >
> >     >     _______________________________________________
> >     >     mesa-dev mailing list
> >     >     mesa-dev at lists.freedesktop.org
> >     <mailto:mesa-dev at lists.freedesktop.org>
> >     <mailto:mesa-dev at lists.freedesktop.org
> >     <mailto:mesa-dev at lists.freedesktop.org>>
> >     >     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> >     >     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
> >     >
> >     >
> >
> >
> >
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180226/0eba9632/attachment-0001.html>