[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

Mon Feb 26 15:54:14 UTC 2018

On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo <
jmcasanova at igalia.com> wrote:

> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
>
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytes multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
>
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
>
> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
>
> buffer_size = (surface_size & ~3) - (surface_size & 3)
>
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
>
> v2: (Jason Ekstrand)
>     Move padding logic fron anv to isl_surface_state
>     Move calculus of original size from spirv to driver backend
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++-
>  src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++-
>  src/intel/vulkan/anv_device.c     | 11 +++++++++++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..d017af040b4 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
> &bld, nir_intrinsic_instr *instr
>        inst->mlen = 1;
>        inst->size_written = 4 * REG_SIZE;
>
> -      bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
> +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> Faulting:
> +       *
> +       * "Out-of-bounds checking is always performed at a DWord
> granularity. If
> +       * any part of the DWord is out-of-bounds then the whole DWord is
> +       * considered out-of-bounds."
> +       *
> +       * This implies that types with size smaller than 4-bytes (16-bits)
> need
>

Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".

> +       * to be padded if they don't complete the last dword of the
> buffer. But
> +       * as we need to maintain the original size we need to reverse the
> padding
> +       * calculation to return the correct size to know the  number of
> elements
> +       * of an unsized array. As we stored in the last two bits of the
> size
> +       * of the buffer the needed padding we calculate here:
> +       *
> +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> +       */
> +
> +      fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>

I'd call this aligned4 because it's in units of bytes.

> +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +      ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
> +      ubld.AND(size_aligned32, component(ret_payload, 0), brw_imm_ud(~3));
>

You don't really need the component() here.

> +      ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
> +
> +      bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
> +
>        brw_mark_surface_used(prog_data, index);
>        break;
>     }
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index bfb27fa4a44..ddc9eb53c96 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -673,7 +673,26 @@ void
>  isl_genX(buffer_fill_state_s)(void *state,
>                                const struct isl_buffer_fill_state_info
> *restrict info)
>  {
> -   uint32_t num_elements = info->size / info->stride;
> +   uint64_t buffer_size = info->size;
> +
> +   /* Uniform and Storage buffers need to have surface size not less that
> the
> +    * aligned 32-bit size of the buffer. To calculate the array lenght on
> +    * unsized arrays in StorageBuffer the last 2 bits store the padding
> size
> +    * added to the surface, so we can calculate latter the original buffer
> +    * size to know the number of elements.
> +    *
> +    *  surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>

isl_align

> +    *
> +    *  array_size = (surface_size & ~3) - (surface_size & 3)
> +    */
> +   if (buffer_size % 4 &&
>

You don't need this bit as the below calculation will be a no-op in that
case.

> +       (info->format == ISL_FORMAT_RAW  ||
> +        info->stride < isl_format_get_layout(info->format)->bpb / 8)) {
> +      assert(info->stride == 1);
> +      buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
>

I think this would better match the FS calculation if it were:

uint64_t aligned_size = isl_align(buffer_size, 4);
buffer_size = aligned_size + (aligned_size - buffer_size);

The current calculation is probably fine though.

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

> +   }
> +
> +   uint32_t num_elements = buffer_size / info->stride;
>
>     if (GEN_GEN >= 7) {
>        /* From the IVB PRM, SURFACE_STATE::Height,
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index a83b7a39f6a..cedeed56219 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
>
>     pMemoryRequirements->size = buffer->size;
>     pMemoryRequirements->alignment = alignment;
> +
> +   /* Storage and Uniform buffers should have their size aligned to
> +    * 32-bits to avoid boundary checks when last DWord is not complete.
> +    * This would ensure that not internal padding would be needed for
> +    * 16-bit types.
> +    */
> +   if (device->robust_buffer_access &&
> +       (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
> +        buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
> +      pMemoryRequirements->size = align_u64(buffer->size, 4);
> +
>     pMemoryRequirements->memoryTypeBits = memory_types;
>  }
>
> --
> 2.14.3
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180226/b80572c3/attachment-0001.html>