<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo <span dir="ltr"><<a href="mailto:jmcasanova@igalia.com" target="_blank">jmcasanova@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The surfaces that backup the GPU buffers have a boundary check that<br> considers that access to partial dwords are considered out-of-bounds.<br> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the<br> last two bytes would always be read as 0 or its writting ignored.<br> <br> The introduction of 16-bit types implies that we need to align the size<br> to 4-bytes multiples so that partial dwords could be read/written.<br> Adding an inconditional +2 size to buffers not being multiple of 2<br> solves this issue for the general cases of UBO or SSBO.<br> <br> But, when unsized arrays of 16-bit elements are used it is not possible<br> to know if the size was padded or not. To solve this issue the<br> implementation calculates the needed size of the buffer surfaces,<br> as suggested by Jason:<br> <br> surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size<br> <br> So when we calculate backwards the buffer_size in the backend we<br> update the resinfo return value with:<br> <br> buffer_size = (surface_size & ~3) - (surface_size & 3)<br> <br> It is also exposed this buffer requirements when robust buffer access<br> is enabled so these buffer sizes recommend being multiple of 4.<br> <br> v2: (Jason Ekstrand)<br> Move padding logic fron anv to isl_surface_state<br> Move calculus of original size from spirv to driver backend<br> ---<br> src/intel/compiler/brw_fs_nir.<wbr>cpp | 27 ++++++++++++++++++++++++++-<br> src/intel/isl/isl_surface_<wbr>state.c | 21 ++++++++++++++++++++-<br> src/intel/vulkan/anv_device.c | 11 +++++++++++<br> 3 files changed, 57 insertions(+), 2 deletions(-)<br> <br> diff --git a/src/intel/compiler/brw_fs_<wbr>nir.cpp b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br> index 8efec34cc9d..d017af040b4 100644<br> --- a/src/intel/compiler/brw_fs_<wbr>nir.cpp<br> +++ b/src/intel/compiler/brw_fs_<wbr>nir.cpp<br> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_<wbr>intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr<br> inst->mlen = 1;<br> inst->size_written = 4 * REG_SIZE;<br> <br> - bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));<br> + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:<br> + *<br> + * "Out-of-bounds checking is always performed at a DWord granularity. If<br> + * any part of the DWord is out-of-bounds then the whole DWord is<br> + * considered out-of-bounds."<br> + *<br> + * This implies that types with size smaller than 4-bytes (16-bits) need<br></blockquote><div><br></div><div>Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)".<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + * to be padded if they don't complete the last dword of the buffer. But<br> + * as we need to maintain the original size we need to reverse the padding<br> + * calculation to return the correct size to know the number of elements<br> + * of an unsized array. As we stored in the last two bits of the size<br> + * of the buffer the needed padding we calculate here:<br> + *<br> + * buffer_size = resinfo_size & ~3 - resinfo_size & 3<br> + */<br> +<br> + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_<wbr>UD);<br></blockquote><div><br></div><div>I'd call this aligned4 because it's in units of bytes.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_<wbr>UD);<br> + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_<wbr>UD);<br> +<br> + ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));<br> + ubld.AND(size_aligned32, component(ret_payload, 0), brw_imm_ud(~3));<br></blockquote><div><br></div><div>You don't really need the component() here.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + ubld.ADD(buffer_size, size_aligned32, negate (size_padding));<br> +<br> + bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));<br> +<br> brw_mark_surface_used(prog_<wbr>data, index);<br> break;<br> }<br> diff --git a/src/intel/isl/isl_surface_<wbr>state.c b/src/intel/isl/isl_surface_<wbr>state.c<br> index bfb27fa4a44..ddc9eb53c96 100644<br> --- a/src/intel/isl/isl_surface_<wbr>state.c<br> +++ b/src/intel/isl/isl_surface_<wbr>state.c<br> @@ -673,7 +673,26 @@ void<br> isl_genX(buffer_fill_state_s)(<wbr>void *state,<br> const struct isl_buffer_fill_state_info *restrict info)<br> {<br> - uint32_t num_elements = info->size / info->stride;<br> + uint64_t buffer_size = info->size;<br> +<br> + /* Uniform and Storage buffers need to have surface size not less that the<br> + * aligned 32-bit size of the buffer. To calculate the array lenght on<br> + * unsized arrays in StorageBuffer the last 2 bits store the padding size<br> + * added to the surface, so we can calculate latter the original buffer<br> + * size to know the number of elements.<br> + *<br> + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size<br></blockquote><div><br></div><div>isl_align<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + *<br> + * array_size = (surface_size & ~3) - (surface_size & 3)<br> + */<br> + if (buffer_size % 4 &&<br></blockquote><div><br></div><div>You don't need this bit as the below calculation will be a no-op in that case.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + (info->format == ISL_FORMAT_RAW ||<br> + info->stride < isl_format_get_layout(info-><wbr>format)->bpb / 8)) {<br> + assert(info->stride == 1);<br> + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;<br></blockquote><div><br></div><div>I think this would better match the FS calculation if it were:<br><br></div><div>uint64_t aligned_size = isl_align(buffer_size, 4);<br></div><div>buffer_size = aligned_size + (aligned_size - buffer_size);<br><br></div><div>The current calculation is probably fine though.<br><br></div><div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> + }<br> +<br> + uint32_t num_elements = buffer_size / info->stride;<br> <br> if (GEN_GEN >= 7) {<br> /* From the IVB PRM, SURFACE_STATE::Height,<br> diff --git a/src/intel/vulkan/anv_device.<wbr>c b/src/intel/vulkan/anv_device.<wbr>c<br> index a83b7a39f6a..cedeed56219 100644<br> --- a/src/intel/vulkan/anv_device.<wbr>c<br> +++ b/src/intel/vulkan/anv_device.<wbr>c<br> @@ -2103,6 +2103,17 @@ void anv_<wbr>GetBufferMemoryRequirements(<br> <br> pMemoryRequirements->size = buffer->size;<br> pMemoryRequirements->alignment = alignment;<br> +<br> + /* Storage and Uniform buffers should have their size aligned to<br> + * 32-bits to avoid boundary checks when last DWord is not complete.<br> + * This would ensure that not internal padding would be needed for<br> + * 16-bit types.<br> + */<br> + if (device->robust_buffer_access &&<br> + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_<wbr>BUFFER_BIT ||<br> + buffer->usage & VK_BUFFER_USAGE_STORAGE_<wbr>BUFFER_BIT))<br> + pMemoryRequirements->size = align_u64(buffer->size, 4);<br> +<br> pMemoryRequirements-><wbr>memoryTypeBits = memory_types;<br> }<br> <span class="HOEnZb"><font color="#888888"><br> --<br> 2.14.3<br> <br> </font></span></blockquote></div><br></div></div>