[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)
Jason Ekstrand
jason at jlekstrand.net
Mon Feb 26 19:22:21 UTC 2018
On Mon, Feb 26, 2018 at 10:09 AM, Chema Casanova <jmcasanova at igalia.com>
wrote:
> On 26/02/18 18:20, Jason Ekstrand wrote:
> > I've lost track of what's reviewed and what's not. Could you either
> > just send a status list or do a resend once all the current comments are
> > handled?
>
> Reviewed-by and all feedback addressed
> --------------------------------------
> [1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
> 32-bits.
> [3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
> [5/7] spirv: Calculate properly 16-bit vector sizes
> [6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit
> aligned
>
> Feedback received but pending to finish v2
> ------------------------------------------
> [2/7] i965/fs: Support 16-bit do_read_vector with
> VK_KHR_relaxed_block_layout
>
> To review when everything is ready
> ----------------------------------
> [4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO
> [7/7] anv: Enable VK_KHR_16bit_storage for PushConstant
>
> I'll resend this series when [2/7] is ready.
>
Sounds good. 4 and 7 will basically be auto-reviewed once everything else
is in shape.
--Jason
>
> Chema
>
> > --Jason
> >
> >
> > On February 26, 2018 09:08:01 Chema Casanova <jmcasanova at igalia.com>
> wrote:
> >
> >> On 26/02/18 16:54, Jason Ekstrand wrote:
> >>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
> >>> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
> >>>
> >>> The surfaces that backup the GPU buffers have a boundary check that
> >>> considers that access to partial dwords are considered
> >>> out-of-bounds.
> >>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and
> the
> >>> last two bytes would always be read as 0 or its writting ignored.
> >>>
> >>> The introduction of 16-bit types implies that we need to align
> >>> the size
> >>> to 4-bytes multiples so that partial dwords could be read/written.
> >>> Adding an inconditional +2 size to buffers not being multiple of 2
> >>> solves this issue for the general cases of UBO or SSBO.
> >>>
> >>> But, when unsized arrays of 16-bit elements are used it is not
> >>> possible
> >>> to know if the size was padded or not. To solve this issue the
> >>> implementation calculates the needed size of the buffer surfaces,
> >>> as suggested by Jason:
> >>>
> >>> surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
> >>
> >>
> >> Changed also the commit log according the the comments below.
> >>
> >>>
> >>> So when we calculate backwards the buffer_size in the backend we
> >>> update the resinfo return value with:
> >>>
> >>> buffer_size = (surface_size & ~3) - (surface_size & 3)
> >>>
> >>> It is also exposed this buffer requirements when robust buffer
> >>> access
> >>> is enabled so these buffer sizes recommend being multiple of 4.
> >>>
> >>> v2: (Jason Ekstrand)
> >>> Move padding logic fron anv to isl_surface_state
> >>> Move calculus of original size from spirv to driver backend
> >>> ---
> >>> src/intel/compiler/brw_fs_nir.cpp | 27
> ++++++++++++++++++++++++++-
> >>> src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++-
> >>> src/intel/vulkan/anv_device.c | 11 +++++++++++
> >>> 3 files changed, 57 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> >>> b/src/intel/compiler/brw_fs_nir.cpp
> >>> index 8efec34cc9d..d017af040b4 100644
> >>> --- a/src/intel/compiler/brw_fs_nir.cpp
> >>> +++ b/src/intel/compiler/brw_fs_nir.cpp
> >>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
> >>> fs_builder &bld, nir_intrinsic_instr *instr
> >>> inst->mlen = 1;
> >>> inst->size_written = 4 * REG_SIZE;
> >>>
> >>> - bld.MOV(retype(dest, ret_payload.type),
> >>> component(ret_payload, 0));
> >>> + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking
> and
> >>> Faulting:
> >>> + *
> >>> + * "Out-of-bounds checking is always performed at a DWord
> >>> granularity. If
> >>> + * any part of the DWord is out-of-bounds then the whole
> >>> DWord is
> >>> + * considered out-of-bounds."
> >>> + *
> >>> + * This implies that types with size smaller than 4-bytes
> >>> (16-bits) need
> >>>
> >>>
> >>> Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)".
> >>
> >> Completely agree.
> >>
> >>> + * to be padded if they don't complete the last dword of the
> >>> buffer. But
> >>> + * as we need to maintain the original size we need to
> >>> reverse the padding
> >>> + * calculation to return the correct size to know the
> number
> >>> of elements
> >>> + * of an unsized array. As we stored in the last two bits of
> >>> the size
> >>> + * of the buffer the needed padding we calculate here:
> >>> + *
> >>> + * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> >>> + */
> >>> +
> >>> + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> >>>
> >>>
> >>> I'd call this aligned4 because it's in units of bytes.
> >>
> >> Locally changed.
> >>
> >>> + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> >>> + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> >>> +
> >>> + ubld.AND(size_padding, component(ret_payload, 0),
> >>> brw_imm_ud(3));
> >>> + ubld.AND(size_aligned32, component(ret_payload, 0),
> >>> brw_imm_ud(~3));
> >>>
> >>>
> >>> You don't really need the component() here.
> >>
> >> Removed.
> >>
> >>> + ubld.ADD(buffer_size, size_aligned32, negate
> (size_padding));
> >>
> >> Removed space after negate
> >>
> >>> +
> >>> + bld.MOV(retype(dest, ret_payload.type),
> >>> component(buffer_size, 0));
> >>> +
> >>> brw_mark_surface_used(prog_data, index);
> >>> break;
> >>> }
> >>> diff --git a/src/intel/isl/isl_surface_state.c
> >>> b/src/intel/isl/isl_surface_state.c
> >>> index bfb27fa4a44..ddc9eb53c96 100644
> >>> --- a/src/intel/isl/isl_surface_state.c
> >>> +++ b/src/intel/isl/isl_surface_state.c
> >>> @@ -673,7 +673,26 @@ void
> >>> isl_genX(buffer_fill_state_s)(void *state,
> >>> const struct
> >>> isl_buffer_fill_state_info *restrict info)
> >>> {
> >>> - uint32_t num_elements = info->size / info->stride;
> >>> + uint64_t buffer_size = info->size;
> >>> +
> >>> + /* Uniform and Storage buffers need to have surface size not
> >>> less that the
> >>> + * aligned 32-bit size of the buffer. To calculate the array
> >>> lenght on
> >>> + * unsized arrays in StorageBuffer the last 2 bits store the
> >>> padding size
> >>> + * added to the surface, so we can calculate latter the
> original
> >>> buffer
> >>> + * size to know the number of elements.
> >>> + *
> >>> + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
> >>>
> >>>
> >>> isl_align
> >>
> >> Changed also the formula to match the final math:
> >>
> >> * surface_size = isl_align(buffer_size, 4) +
> >> * (isl_align(buffer_size, 4) - buffer_size)
> >>
> >>
> >>> + *
> >>> + * array_size = (surface_size & ~3) - (surface_size & 3)
> >>
> >> Renamed array_size for buffer_size.
> >>
> >>
> >>> + */
> >>> + if (buffer_size % 4 &&
> >>>
> >>>
> >>> You don't need this bit as the below calculation will be a no-op in
> that
> >>> case.
> >>
> >> Removed
> >>
> >>
> >>> + (info->format == ISL_FORMAT_RAW ||
> >>> + info->stride < isl_format_get_layout(info->format)->bpb
> >>> / 8)) {
> >>> + assert(info->stride == 1);
> >>> + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
> >>>
> >>>
> >>> I think this would better match the FS calculation if it were:
> >>>
> >>> uint64_t aligned_size = isl_align(buffer_size, 4);
> >>> buffer_size = aligned_size + (aligned_size - buffer_size);
> >>>
> >>> The current calculation is probably fine though.
> >>
> >>
> >> Changed, this one has also less problem with overflow of the uint64 :)
> >>
> >>
> >>> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> >>
> >> Thanks for the review.
> >>
> >> Chema
> >>
> >>
> >>>
> >>>
> >>> + }
> >>> +
> >>> + uint32_t num_elements = buffer_size / info->stride;
> >>>
> >>> if (GEN_GEN >= 7) {
> >>> /* From the IVB PRM, SURFACE_STATE::Height,
> >>> diff --git a/src/intel/vulkan/anv_device.c
> >>> b/src/intel/vulkan/anv_device.c
> >>> index a83b7a39f6a..cedeed56219 100644
> >>> --- a/src/intel/vulkan/anv_device.c
> >>> +++ b/src/intel/vulkan/anv_device.c
> >>> @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
> >>>
> >>> pMemoryRequirements->size = buffer->size;
> >>> pMemoryRequirements->alignment = alignment;
> >>> +
> >>> + /* Storage and Uniform buffers should have their size aligned
> to
> >>> + * 32-bits to avoid boundary checks when last DWord is not
> >>> complete.
> >>> + * This would ensure that not internal padding would be
> >>> needed for
> >>> + * 16-bit types.
> >>> + */
> >>> + if (device->robust_buffer_access &&
> >>> + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
> >>> + buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
> >>> + pMemoryRequirements->size = align_u64(buffer->size, 4);
> >>> +
> >>> pMemoryRequirements->memoryTypeBits = memory_types;
> >>> }
> >>>
> >>> --
> >>> 2.14.3
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> mesa-dev mailing list
> >>> mesa-dev at lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>>
> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180226/08fa5e60/attachment-0001.html>
More information about the mesa-dev
mailing list