[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)
Jason Ekstrand
jason at jlekstrand.net
Mon Feb 26 17:20:05 UTC 2018
I've lost track of what's reviewed and what's not. Could you either just
send a status list or do a resend once all the current comments are handled?
--Jason
On February 26, 2018 09:08:01 Chema Casanova <jmcasanova at igalia.com> wrote:
> On 26/02/18 16:54, Jason Ekstrand wrote:
>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
>> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
>>
>> The surfaces that backup the GPU buffers have a boundary check that
>> considers that access to partial dwords are considered out-of-bounds.
>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
>> last two bytes would always be read as 0 or its writting ignored.
>>
>> The introduction of 16-bit types implies that we need to align the size
>> to 4-bytes multiples so that partial dwords could be read/written.
>> Adding an inconditional +2 size to buffers not being multiple of 2
>> solves this issue for the general cases of UBO or SSBO.
>>
>> But, when unsized arrays of 16-bit elements are used it is not possible
>> to know if the size was padded or not. To solve this issue the
>> implementation calculates the needed size of the buffer surfaces,
>> as suggested by Jason:
>>
>> surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
>
>
> Changed also the commit log according the the comments below.
>
>>
>> So when we calculate backwards the buffer_size in the backend we
>> update the resinfo return value with:
>>
>> buffer_size = (surface_size & ~3) - (surface_size & 3)
>>
>> It is also exposed this buffer requirements when robust buffer access
>> is enabled so these buffer sizes recommend being multiple of 4.
>>
>> v2: (Jason Ekstrand)
>> Move padding logic fron anv to isl_surface_state
>> Move calculus of original size from spirv to driver backend
>> ---
>> src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++-
>> src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++-
>> src/intel/vulkan/anv_device.c | 11 +++++++++++
>> 3 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
>> b/src/intel/compiler/brw_fs_nir.cpp
>> index 8efec34cc9d..d017af040b4 100644
>> --- a/src/intel/compiler/brw_fs_nir.cpp
>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
>> fs_builder &bld, nir_intrinsic_instr *instr
>> inst->mlen = 1;
>> inst->size_written = 4 * REG_SIZE;
>>
>> - bld.MOV(retype(dest, ret_payload.type),
>> component(ret_payload, 0));
>> + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
>> Faulting:
>> + *
>> + * "Out-of-bounds checking is always performed at a DWord
>> granularity. If
>> + * any part of the DWord is out-of-bounds then the whole DWord is
>> + * considered out-of-bounds."
>> + *
>> + * This implies that types with size smaller than 4-bytes
>> (16-bits) need
>>
>>
>> Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)".
>
> Completely agree.
>
>> + * to be padded if they don't complete the last dword of the
>> buffer. But
>> + * as we need to maintain the original size we need to
>> reverse the padding
>> + * calculation to return the correct size to know the number
>> of elements
>> + * of an unsized array. As we stored in the last two bits of
>> the size
>> + * of the buffer the needed padding we calculate here:
>> + *
>> + * buffer_size = resinfo_size & ~3 - resinfo_size & 3
>> + */
>> +
>> + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>
>>
>> I'd call this aligned4 because it's in units of bytes.
>
> Locally changed.
>
>> + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>> + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>> +
>> + ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
>> + ubld.AND(size_aligned32, component(ret_payload, 0),
>> brw_imm_ud(~3));
>>
>>
>> You don't really need the component() here.
>
> Removed.
>
>> + ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
>
> Removed space after negate
>
>> +
>> + bld.MOV(retype(dest, ret_payload.type),
>> component(buffer_size, 0));
>> +
>> brw_mark_surface_used(prog_data, index);
>> break;
>> }
>> diff --git a/src/intel/isl/isl_surface_state.c
>> b/src/intel/isl/isl_surface_state.c
>> index bfb27fa4a44..ddc9eb53c96 100644
>> --- a/src/intel/isl/isl_surface_state.c
>> +++ b/src/intel/isl/isl_surface_state.c
>> @@ -673,7 +673,26 @@ void
>> isl_genX(buffer_fill_state_s)(void *state,
>> const struct
>> isl_buffer_fill_state_info *restrict info)
>> {
>> - uint32_t num_elements = info->size / info->stride;
>> + uint64_t buffer_size = info->size;
>> +
>> + /* Uniform and Storage buffers need to have surface size not
>> less that the
>> + * aligned 32-bit size of the buffer. To calculate the array
>> lenght on
>> + * unsized arrays in StorageBuffer the last 2 bits store the
>> padding size
>> + * added to the surface, so we can calculate latter the original
>> buffer
>> + * size to know the number of elements.
>> + *
>> + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
>>
>>
>> isl_align
>
> Changed also the formula to match the final math:
>
> * surface_size = isl_align(buffer_size, 4) +
> * (isl_align(buffer_size, 4) - buffer_size)
>
>
>> + *
>> + * array_size = (surface_size & ~3) - (surface_size & 3)
>
> Renamed array_size for buffer_size.
>
>
>> + */
>> + if (buffer_size % 4 &&
>>
>>
>> You don't need this bit as the below calculation will be a no-op in that
>> case.
>
> Removed
>
>
>> + (info->format == ISL_FORMAT_RAW ||
>> + info->stride < isl_format_get_layout(info->format)->bpb / 8)) {
>> + assert(info->stride == 1);
>> + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
>>
>>
>> I think this would better match the FS calculation if it were:
>>
>> uint64_t aligned_size = isl_align(buffer_size, 4);
>> buffer_size = aligned_size + (aligned_size - buffer_size);
>>
>> The current calculation is probably fine though.
>
>
> Changed, this one has also less problem with overflow of the uint64 :)
>
>
>> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
>
> Thanks for the review.
>
> Chema
>
>
>>
>>
>> + }
>> +
>> + uint32_t num_elements = buffer_size / info->stride;
>>
>> if (GEN_GEN >= 7) {
>> /* From the IVB PRM, SURFACE_STATE::Height,
>> diff --git a/src/intel/vulkan/anv_device.c
>> b/src/intel/vulkan/anv_device.c
>> index a83b7a39f6a..cedeed56219 100644
>> --- a/src/intel/vulkan/anv_device.c
>> +++ b/src/intel/vulkan/anv_device.c
>> @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
>>
>> pMemoryRequirements->size = buffer->size;
>> pMemoryRequirements->alignment = alignment;
>> +
>> + /* Storage and Uniform buffers should have their size aligned to
>> + * 32-bits to avoid boundary checks when last DWord is not complete.
>> + * This would ensure that not internal padding would be needed for
>> + * 16-bit types.
>> + */
>> + if (device->robust_buffer_access &&
>> + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
>> + buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
>> + pMemoryRequirements->size = align_u64(buffer->size, 4);
>> +
>> pMemoryRequirements->memoryTypeBits = memory_types;
>> }
>>
>> --
>> 2.14.3
>>
>>
>>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
More information about the mesa-dev
mailing list