[Mesa-dev] [PATCH 1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits

Fri Feb 23 20:28:50 UTC 2018

El 23/02/18 a las 17:26, Jason Ekstrand escribió:
> On Fri, Feb 23, 2018 at 1:26 AM, Jose Maria Casanova Crespo
> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
> 
>     The surfaces that backup the GPU buffers have a boundary check that
>     considers that access to partial dwords are considered out-of-bounds.
>     For example is basic 16-bit cases of buffers with size 2 or 6 where the
>     last two bytes will always be read as 0 or its writting ignored.
> 
>     The introduction of 16-bit types implies that we need to align the size
>     to 4-bytes multiples so that partial dwords could be read/written.
>     Adding an inconditional +2 size to buffers not being multiple of 2
>     solves this issue for the general cases of UBO or SSBO.
> 
>     But, when unsized_arrays of 16-bit elements are used it is not possible
>     to know if the size was padded or not. To solve this issue the
>     implementation of SSBO calculates the needed size of the surface,
>     as suggested by Jason:
> 
>     surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
> 
>     So when we calculate backwards the SpvOpArrayLenght with a nir expresion
>     when the array stride is not multiple of 4.
> 
>     array_size = (surface_size & ~3) - (surface_size & 3)
> 
>     It is also exposed this buffer requirements when robust buffer access
>     is enabled so these buffer sizes recommend being multiple of 4.
>     ---
> 
>     I have some doubts if vtn_variables.c is the best place to include
>     this specific to calculate the real buffer size as this is new
>     calculus seems to be quite HW dependent and maybe other drivers
>     different
>     to ANV don't need this kind of solution.
> 
>      src/compiler/spirv/vtn_variables.c    | 14 ++++++++++++++
>      src/intel/vulkan/anv_descriptor_set.c | 16 ++++++++++++++++
>      src/intel/vulkan/anv_device.c         | 11 +++++++++++
>      3 files changed, 41 insertions(+)
> 
>     diff --git a/src/compiler/spirv/vtn_variables.c
>     b/src/compiler/spirv/vtn_variables.c
>     index 9eb85c24e9..78adab3ed2 100644
>     --- a/src/compiler/spirv/vtn_variables.c
>     +++ b/src/compiler/spirv/vtn_variables.c
>     @@ -2113,6 +2113,20 @@ vtn_handle_variables(struct vtn_builder *b,
>     SpvOp opcode,
>            nir_builder_instr_insert(&b->nb, &instr->instr);
>            nir_ssa_def *buf_size = &instr->dest.ssa;
> 
>     +      /* Calculate real length if padding was done to align the buffer
>     +       * to 32-bits. This only could happen is stride is not multiple
>     +       * of 4. Introduced to support 16-bit type unsized arrays in anv.
>     +       */
>     +      if (stride % 4) {
>     +         buf_size = nir_isub(&b->nb,
>     +                             nir_iand(&b->nb,
>     +                                      buf_size,
>     +                                      nir_imm_int(&b->nb, ~3)),
>     +                             nir_iand (&b->nb,
>     +                                       buf_size,
>     +                                       nir_imm_int(&b->nb, 3)));
> 
> 
> We can't do this in spirv_to_nir as it's also used by radv and they may
> not have the same issue.  Instead, we need to handle it either in
> anv_nir_apply_pipeline_layout or in the back-end compiler.  Doing it
> here has the advantage that we can only do it in the "stride % 4 != 0"
> case but I don't think the three instructions are all that big of a deal
> given that we just did a send and are about to do an integer divide.  My
> preference would be to put most of it in ISL and the back-end compiler
> if we can.

I've already had my doubts in my commit comment. So I'll implement it
properly in the backend implementation of nir_intrinsic_get_buffer_size.
I should have a look to that code before.

> 
>     +      }
>     +
>            /* array_length = max(buffer_size - offset, 0) / stride */
>            nir_ssa_def *array_length =
>               nir_idiv(&b->nb,
>     diff --git a/src/intel/vulkan/anv_descriptor_set.c
>     b/src/intel/vulkan/anv_descriptor_set.c
>     index edb829601e..a97f2f37dc 100644
>     --- a/src/intel/vulkan/anv_descriptor_set.c
>     +++ b/src/intel/vulkan/anv_descriptor_set.c
>     @@ -704,6 +704,22 @@ anv_descriptor_set_write_buffer(struct
>     anv_descriptor_set *set,
>            bview->offset = buffer->offset + offset;
>            bview->range = anv_buffer_get_range(buffer, offset, range);
> 
>     +      /* Uniform and Storage buffers need to have surface size
>     +       * not less that the aligned 32-bit size of the buffer.
>     +       * To calculate the array lenght on unsized arrays
>     +       * in StorageBuffer the last 2 bits store the padding size
>     +       * added to the surface, so we can calculate latter the original
>     +       * buffer size to know the number of elements.
>     +       *
>     +       *  surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>     +       *
>     +       *  array_size = (surface_size & ~3) - (surface_size & 3)
>     +       */
>     +      if (type == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER)
>     +         bview->range = 2 * align_u64(bview->range, 4) - bview->range;
>     +      else if (type == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER)
>     +         bview->range = align_u64(bview->range, 4);

> We chatted on another e-mail about doing this in ISL instead of here but
> you mentioned issues with failing GL tests.  Did you ever get to the
> bottom of those?  

Yes I've already implemented that way but it was originally creating 10
regressions in Jenkins on DEQP in tests like
dEQP-GLES31.functional.texture.texture_buffer.modify.bufferdata.buffer_size_131071

Although it was not possible to reproduce it locally (it was a kind of
poltergeist) it was consistent as I was messing with the texture size of
the buffer.

So the solution was add extra restrictions to the size alignment padding
to detect UniformBuffers only when next 3 conditions matched:
- isl_format was ISL_FORMAT_R32G32B32A32_FLOAT
- stride = 1
- size % 4 != 0

But when working testing this solution I realized that for Storage
Buffers there was a another entry point where I couldn't differentiate
from an Storage Buffer at anv_image.c  for buffer usage of
VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT. at is uses stride = 1 and
ISL_RAW format.

So in order to avoid messing with the sizes in that case I though it was
better to move the padding decision to the place I could differentiate
it correctly. Maybe it was an unfounded doubt and it wouldn't mind to
modify the size in that VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT case.

If so, what was the conclusion?

So after re-thinking aloud, and if my doubts with
VK_BUFFER_USAGE_STORAGE_TEXEL_BUFFER_BIT is unfounded I think I'll
recover ISL approach patch, as this would be needed any case for mediump
support.

>     +
>            /* If we're writing descriptors through a push command, we
>     need to
>             * allocate the surface state from the command buffer.
>     Otherwise it will
>             * be allocated by the descriptor pool when calling
>     diff --git a/src/intel/vulkan/anv_device.c
>     b/src/intel/vulkan/anv_device.c
>     index a83b7a39f6..cedeed5621 100644
>     --- a/src/intel/vulkan/anv_device.c
>     +++ b/src/intel/vulkan/anv_device.c
>     @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
> 
>         pMemoryRequirements->size = buffer->size;
>         pMemoryRequirements->alignment = alignment;
>     +
>     +   /* Storage and Uniform buffers should have their size aligned to
>     +    * 32-bits to avoid boundary checks when last DWord is not complete.
>     +    * This would ensure that not internal padding would be needed for
>     +    * 16-bit types.
>     +    */
>     +   if (device->robust_buffer_access &&
>     +       (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
>     +        buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
>     +      pMemoryRequirements->size = align_u64(buffer->size, 4);
>     +
>         pMemoryRequirements->memoryTypeBits = memory_types;
>      }
> 
>     --
>     2.14.3
> 
>     _______________________________________________
>     mesa-dev mailing list
>     mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> 
>