[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)
Chema Casanova
jmcasanova at igalia.com
Mon Feb 26 18:09:49 UTC 2018
On 26/02/18 18:20, Jason Ekstrand wrote:
> I've lost track of what's reviewed and what's not. Could you either
> just send a status list or do a resend once all the current comments are
> handled?
Reviewed-by and all feedback addressed
--------------------------------------
[1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
32-bits.
[3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
[5/7] spirv: Calculate properly 16-bit vector sizes
[6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit
aligned
Feedback received but pending to finish v2
------------------------------------------
[2/7] i965/fs: Support 16-bit do_read_vector with
VK_KHR_relaxed_block_layout
To review when everything is ready
----------------------------------
[4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO
[7/7] anv: Enable VK_KHR_16bit_storage for PushConstant
I'll resend this series when [2/7] is ready.
Chema
> --Jason
>
>
> On February 26, 2018 09:08:01 Chema Casanova <jmcasanova at igalia.com> wrote:
>
>> On 26/02/18 16:54, Jason Ekstrand wrote:
>>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
>>> <jmcasanova at igalia.com <mailto:jmcasanova at igalia.com>> wrote:
>>>
>>> The surfaces that backup the GPU buffers have a boundary check that
>>> considers that access to partial dwords are considered
>>> out-of-bounds.
>>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
>>> last two bytes would always be read as 0 or its writting ignored.
>>>
>>> The introduction of 16-bit types implies that we need to align
>>> the size
>>> to 4-bytes multiples so that partial dwords could be read/written.
>>> Adding an inconditional +2 size to buffers not being multiple of 2
>>> solves this issue for the general cases of UBO or SSBO.
>>>
>>> But, when unsized arrays of 16-bit elements are used it is not
>>> possible
>>> to know if the size was padded or not. To solve this issue the
>>> implementation calculates the needed size of the buffer surfaces,
>>> as suggested by Jason:
>>>
>>> surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
>>
>>
>> Changed also the commit log according the the comments below.
>>
>>>
>>> So when we calculate backwards the buffer_size in the backend we
>>> update the resinfo return value with:
>>>
>>> buffer_size = (surface_size & ~3) - (surface_size & 3)
>>>
>>> It is also exposed this buffer requirements when robust buffer
>>> access
>>> is enabled so these buffer sizes recommend being multiple of 4.
>>>
>>> v2: (Jason Ekstrand)
>>> Move padding logic fron anv to isl_surface_state
>>> Move calculus of original size from spirv to driver backend
>>> ---
>>> src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++-
>>> src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++-
>>> src/intel/vulkan/anv_device.c | 11 +++++++++++
>>> 3 files changed, 57 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
>>> b/src/intel/compiler/brw_fs_nir.cpp
>>> index 8efec34cc9d..d017af040b4 100644
>>> --- a/src/intel/compiler/brw_fs_nir.cpp
>>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
>>> fs_builder &bld, nir_intrinsic_instr *instr
>>> inst->mlen = 1;
>>> inst->size_written = 4 * REG_SIZE;
>>>
>>> - bld.MOV(retype(dest, ret_payload.type),
>>> component(ret_payload, 0));
>>> + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
>>> Faulting:
>>> + *
>>> + * "Out-of-bounds checking is always performed at a DWord
>>> granularity. If
>>> + * any part of the DWord is out-of-bounds then the whole
>>> DWord is
>>> + * considered out-of-bounds."
>>> + *
>>> + * This implies that types with size smaller than 4-bytes
>>> (16-bits) need
>>>
>>>
>>> Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)".
>>
>> Completely agree.
>>
>>> + * to be padded if they don't complete the last dword of the
>>> buffer. But
>>> + * as we need to maintain the original size we need to
>>> reverse the padding
>>> + * calculation to return the correct size to know the number
>>> of elements
>>> + * of an unsized array. As we stored in the last two bits of
>>> the size
>>> + * of the buffer the needed padding we calculate here:
>>> + *
>>> + * buffer_size = resinfo_size & ~3 - resinfo_size & 3
>>> + */
>>> +
>>> + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>>
>>>
>>> I'd call this aligned4 because it's in units of bytes.
>>
>> Locally changed.
>>
>>> + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>> + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>> +
>>> + ubld.AND(size_padding, component(ret_payload, 0),
>>> brw_imm_ud(3));
>>> + ubld.AND(size_aligned32, component(ret_payload, 0),
>>> brw_imm_ud(~3));
>>>
>>>
>>> You don't really need the component() here.
>>
>> Removed.
>>
>>> + ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
>>
>> Removed space after negate
>>
>>> +
>>> + bld.MOV(retype(dest, ret_payload.type),
>>> component(buffer_size, 0));
>>> +
>>> brw_mark_surface_used(prog_data, index);
>>> break;
>>> }
>>> diff --git a/src/intel/isl/isl_surface_state.c
>>> b/src/intel/isl/isl_surface_state.c
>>> index bfb27fa4a44..ddc9eb53c96 100644
>>> --- a/src/intel/isl/isl_surface_state.c
>>> +++ b/src/intel/isl/isl_surface_state.c
>>> @@ -673,7 +673,26 @@ void
>>> isl_genX(buffer_fill_state_s)(void *state,
>>> const struct
>>> isl_buffer_fill_state_info *restrict info)
>>> {
>>> - uint32_t num_elements = info->size / info->stride;
>>> + uint64_t buffer_size = info->size;
>>> +
>>> + /* Uniform and Storage buffers need to have surface size not
>>> less that the
>>> + * aligned 32-bit size of the buffer. To calculate the array
>>> lenght on
>>> + * unsized arrays in StorageBuffer the last 2 bits store the
>>> padding size
>>> + * added to the surface, so we can calculate latter the original
>>> buffer
>>> + * size to know the number of elements.
>>> + *
>>> + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
>>>
>>>
>>> isl_align
>>
>> Changed also the formula to match the final math:
>>
>> * surface_size = isl_align(buffer_size, 4) +
>> * (isl_align(buffer_size, 4) - buffer_size)
>>
>>
>>> + *
>>> + * array_size = (surface_size & ~3) - (surface_size & 3)
>>
>> Renamed array_size for buffer_size.
>>
>>
>>> + */
>>> + if (buffer_size % 4 &&
>>>
>>>
>>> You don't need this bit as the below calculation will be a no-op in that
>>> case.
>>
>> Removed
>>
>>
>>> + (info->format == ISL_FORMAT_RAW ||
>>> + info->stride < isl_format_get_layout(info->format)->bpb
>>> / 8)) {
>>> + assert(info->stride == 1);
>>> + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
>>>
>>>
>>> I think this would better match the FS calculation if it were:
>>>
>>> uint64_t aligned_size = isl_align(buffer_size, 4);
>>> buffer_size = aligned_size + (aligned_size - buffer_size);
>>>
>>> The current calculation is probably fine though.
>>
>>
>> Changed, this one has also less problem with overflow of the uint64 :)
>>
>>
>>> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
>>
>> Thanks for the review.
>>
>> Chema
>>
>>
>>>
>>>
>>> + }
>>> +
>>> + uint32_t num_elements = buffer_size / info->stride;
>>>
>>> if (GEN_GEN >= 7) {
>>> /* From the IVB PRM, SURFACE_STATE::Height,
>>> diff --git a/src/intel/vulkan/anv_device.c
>>> b/src/intel/vulkan/anv_device.c
>>> index a83b7a39f6a..cedeed56219 100644
>>> --- a/src/intel/vulkan/anv_device.c
>>> +++ b/src/intel/vulkan/anv_device.c
>>> @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
>>>
>>> pMemoryRequirements->size = buffer->size;
>>> pMemoryRequirements->alignment = alignment;
>>> +
>>> + /* Storage and Uniform buffers should have their size aligned to
>>> + * 32-bits to avoid boundary checks when last DWord is not
>>> complete.
>>> + * This would ensure that not internal padding would be
>>> needed for
>>> + * 16-bit types.
>>> + */
>>> + if (device->robust_buffer_access &&
>>> + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
>>> + buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
>>> + pMemoryRequirements->size = align_u64(buffer->size, 4);
>>> +
>>> pMemoryRequirements->memoryTypeBits = memory_types;
>>> }
>>>
>>> --
>>> 2.14.3
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>
>
>
More information about the mesa-dev
mailing list