<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">After talking with Kristian, this seems like a reasonable direction to go so<br></div><div class="gmail_quote"><br>s/RFC/PATCH/<br><br></div><div class="gmail_quote">Reviews welcome.<br></div><div class="gmail_quote"><br>On Sat, Mar 4, 2017 at 12:19 PM, Kristian H. Kristensen <span dir="ltr"><<a href="mailto:krh@bitplanet.net" target="_blank">krh@bitplanet.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
<br>
> We have a performance problem with dynamic buffer descriptors. Because<br>
> we are currently implementing them by pushing an offset into the shader<br>
> and adding that offset onto the already existing offset for the UBO/SSBO<br>
> operation, all UBO/SSBO operations on dynamic descriptors are indirect.<br>
> The back-end compiler implements indirect pull constant loads using what<br>
> basically amounts to a texelFetch instruction. For pull constant loads<br>
> with constant offsets, however, we use an oword block read message which<br>
> goes through the constant cache and reads a whole cache line at a time.<br>
> Because of these two things, direct pull constant loads are much faster<br>
> than indirect pull constant loads. Because all loads from dynamically<br>
> bound buffers are indirect, the user takes a substantial performance<br>
> penalty when using this "performance" feature.<br>
><br>
> There are two potential solutions I have seen for this problem. The<br>
> alternate solution is to continue pushing offsets into the shader but<br>
> wire things up in the back-end compiler so that we use the oword block<br>
> read messages anyway. The only reason we can do this is that we know a<br>
> priori that the dynamic offsets are uniform and 16-byte aligned.<br>
> Unfortunately, thanks to the 16-byte alignment requirement of the oword<br>
> messages, we can't do some general "if the indirect offset is uniform,<br>
> use an oword message" sort of thing.<br>
><br>
> This solution, however, is recommended for a few reasons:<br>
><br>
> 1. Surface states are relatively cheap. We've been using on-the-fly<br>
> surface state setup for some time in GL and it works well. Also,<br>
> dynamic offsets with on-the-fly surface state should still be<br>
> cheaper than allocating new descriptor sets every time you want to<br>
> change a buffer offset, which is really the only requirement of the<br>
> dynamic offsets feature.<br>
><br>
> 2. This requires substantially less compiler plumbing. Not only can we<br>
> delete the entire apply_dynamic_offsets pass but we can also avoid<br>
> having to add architecture for passing dynamic offsets to the back-<br>
> end compiler in such a way that it can continue using oword messages.<br>
><br>
> 3. We get robust buffer access range-checking for free. Because the<br>
> offset and range are baked into the surface state, we no longer need<br>
> to pass ranges around and do bounds-checking in the shader.<br>
><br>
> 4. Once we finally get UBO pushing implemented, it will be much easier<br>
> to handle pushing chunks of dynamic descriptors if the compiler<br>
> remains blissfully unaware of dynamic descriptors.<br>
><br>
> This commit improves performance of The Talos Principle on ULTRA<br>
> settings by around 50% and brings it nicely into line with OpenGL<br>
> performance.<br>
<br>
</div></div>Do the uniform analysis pass and the oword read result in a similar<br>
improvement?</blockquote><div><br></div><div>Yes it does.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I think both approaches are fine, but you might want to<br>
keep the uniform pass around - there's a lot of URB reads and writes in<br>
GS/HS/DS that are dynamically uniform but end up using per-slot offsets<br>
unconditionally.<br>
<span class="HOEnZb"><font color="#888888"><br>
Kristian<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> Cc: Kristian Høgsberg <<a href="mailto:krh@bitplanet.net">krh@bitplanet.net</a>><br>
> ---<br>
> src/intel/vulkan/Makefile.<wbr>sources | 1 -<br>
> src/intel/vulkan/anv_cmd_<wbr>buffer.c | 47 +++----<br>
> src/intel/vulkan/anv_<wbr>descriptor_set.c | 62 ++++----<br>
> src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c | 172 -----------------------<br>
> src/intel/vulkan/anv_pipeline.<wbr>c | 6 -<br>
> src/intel/vulkan/anv_private.h | 13 +-<br>
> src/intel/vulkan/genX_cmd_<wbr>buffer.c | 30 +++-<br>
> 7 files changed, 86 insertions(+), 245 deletions(-)<br>
> delete mode 100644 src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
><br>
> diff --git a/src/intel/vulkan/Makefile.<wbr>sources b/src/intel/vulkan/Makefile.<wbr>sources<br>
> index fd149b2..24e2225 100644<br>
> --- a/src/intel/vulkan/Makefile.<wbr>sources<br>
> +++ b/src/intel/vulkan/Makefile.<wbr>sources<br>
> @@ -32,7 +32,6 @@ VULKAN_FILES := \<br>
> anv_image.c \<br>
> anv_intel.c \<br>
> anv_nir.h \<br>
> - anv_nir_apply_dynamic_offsets.<wbr>c \<br>
> anv_nir_apply_pipeline_layout.<wbr>c \<br>
> anv_nir_lower_input_<wbr>attachments.c \<br>
> anv_nir_lower_push_constants.c \<br>
> diff --git a/src/intel/vulkan/anv_cmd_<wbr>buffer.c b/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> index cab1dd7..a6ad48a 100644<br>
> --- a/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> @@ -507,42 +507,31 @@ void anv_CmdBindDescriptorSets(<br>
><br>
> assert(firstSet + descriptorSetCount < MAX_SETS);<br>
><br>
> + uint32_t dynamic_slot = 0;<br>
> for (uint32_t i = 0; i < descriptorSetCount; i++) {<br>
> ANV_FROM_HANDLE(anv_<wbr>descriptor_set, set, pDescriptorSets[i]);<br>
> set_layout = layout->set[firstSet + i].layout;<br>
><br>
> - if (cmd_buffer->state.<wbr>descriptors[firstSet + i] != set) {<br>
> - cmd_buffer->state.descriptors[<wbr>firstSet + i] = set;<br>
> - cmd_buffer->state.descriptors_<wbr>dirty |= set_layout->shader_stages;<br>
> - }<br>
> + cmd_buffer->state.descriptors[<wbr>firstSet + i] = set;<br>
><br>
> if (set_layout->dynamic_offset_<wbr>count > 0) {<br>
> - anv_foreach_stage(s, set_layout->shader_stages) {<br>
> - anv_cmd_buffer_ensure_push_<wbr>constant_field(cmd_buffer, s, dynamic);<br>
> -<br>
> - struct anv_push_constants *push =<br>
> - cmd_buffer->state.push_<wbr>constants[s];<br>
> -<br>
> - unsigned d = layout->set[firstSet + i].dynamic_offset_start;<br>
> - const uint32_t *offsets = pDynamicOffsets;<br>
> - struct anv_descriptor *desc = set->descriptors;<br>
> -<br>
> - for (unsigned b = 0; b < set_layout->binding_count; b++) {<br>
> - if (set_layout->binding[b].<wbr>dynamic_offset_index < 0)<br>
> - continue;<br>
> -<br>
> - unsigned array_size = set_layout->binding[b].array_<wbr>size;<br>
> - for (unsigned j = 0; j < array_size; j++) {<br>
> - push->dynamic[d].offset = *(offsets++);<br>
> - push->dynamic[d].range = (desc->buffer_view) ?<br>
> - desc->buffer_view->range : 0;<br>
> - desc++;<br>
> - d++;<br>
> - }<br>
> - }<br>
> - }<br>
> - cmd_buffer->state.push_<wbr>constants_dirty |= set_layout->shader_stages;<br>
> + uint32_t dynamic_offset_start =<br>
> + layout->set[firstSet + i].dynamic_offset_start;<br>
> +<br>
> + /* Assert that everything is in range */<br>
> + assert(dynamic_offset_start + set_layout->dynamic_offset_<wbr>count <=<br>
> + ARRAY_SIZE(cmd_buffer->state.<wbr>dynamic_offsets));<br>
> + assert(dynamic_slot + set_layout->dynamic_offset_<wbr>count <=<br>
> + dynamicOffsetCount);<br>
> +<br>
> + typed_memcpy(&cmd_buffer-><wbr>state.dynamic_offsets[dynamic_<wbr>offset_start],<br>
> + &pDynamicOffsets[dynamic_slot]<wbr>,<br>
> + set_layout->dynamic_offset_<wbr>count);<br>
> +<br>
> + dynamic_slot += set_layout->dynamic_offset_<wbr>count;<br>
> }<br>
> +<br>
> + cmd_buffer->state.descriptors_<wbr>dirty |= set_layout->shader_stages;<br>
> }<br>
> }<br>
><br>
> diff --git a/src/intel/vulkan/anv_<wbr>descriptor_set.c b/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> index 2a37d7d..175efdb 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> +++ b/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> @@ -662,35 +662,39 @@ anv_descriptor_set_write_<wbr>buffer(struct anv_descriptor_set *set,<br>
><br>
> assert(type == bind_layout->type);<br>
><br>
> - struct anv_buffer_view *bview =<br>
> - &set->buffer_views[bind_<wbr>layout->buffer_index + element];<br>
> -<br>
> - bview->format = anv_isl_format_for_descriptor_<wbr>type(type);<br>
> - bview->bo = buffer->bo;<br>
> - bview->offset = buffer->offset + offset;<br>
> -<br>
> - /* For buffers with dynamic offsets, we use the full possible range in the<br>
> - * surface state and do the actual range-checking in the shader.<br>
> - */<br>
> - if (bind_layout->dynamic_offset_<wbr>index >= 0)<br>
> - range = VK_WHOLE_SIZE;<br>
> - bview->range = anv_buffer_get_range(buffer, offset, range);<br>
> -<br>
> - /* If we're writing descriptors through a push command, we need to allocate<br>
> - * the surface state from the command buffer. Otherwise it will be<br>
> - * allocated by the descriptor pool when calling<br>
> - * vkAllocateDescriptorSets. */<br>
> - if (alloc_stream)<br>
> - bview->surface_state = anv_state_stream_alloc(alloc_<wbr>stream, 64, 64);<br>
> -<br>
> - anv_fill_buffer_surface_state(<wbr>device, bview->surface_state,<br>
> - bview->format,<br>
> - bview->offset, bview->range, 1);<br>
> -<br>
> - *desc = (struct anv_descriptor) {<br>
> - .type = type,<br>
> - .buffer_view = bview,<br>
> - };<br>
> + if (type == VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC ||<br>
> + type == VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC) {<br>
> + *desc = (struct anv_descriptor) {<br>
> + .type = type,<br>
> + .buffer = buffer,<br>
> + .offset = offset,<br>
> + .range = range,<br>
> + };<br>
> + } else {<br>
> + struct anv_buffer_view *bview =<br>
> + &set->buffer_views[bind_<wbr>layout->buffer_index + element];<br>
> +<br>
> + bview->format = anv_isl_format_for_descriptor_<wbr>type(type);<br>
> + bview->bo = buffer->bo;<br>
> + bview->offset = buffer->offset + offset;<br>
> + bview->range = anv_buffer_get_range(buffer, offset, range);<br>
> +<br>
> + /* If we're writing descriptors through a push command, we need to<br>
> + * allocate the surface state from the command buffer. Otherwise it will<br>
> + * be allocated by the descriptor pool when calling<br>
> + * vkAllocateDescriptorSets. */<br>
> + if (alloc_stream)<br>
> + bview->surface_state = anv_state_stream_alloc(alloc_<wbr>stream, 64, 64);<br>
> +<br>
> + anv_fill_buffer_surface_state(<wbr>device, bview->surface_state,<br>
> + bview->format,<br>
> + bview->offset, bview->range, 1);<br>
> +<br>
> + *desc = (struct anv_descriptor) {<br>
> + .type = type,<br>
> + .buffer_view = bview,<br>
> + };<br>
> + }<br>
> }<br>
><br>
> void anv_UpdateDescriptorSets(<br>
> diff --git a/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c b/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
> deleted file mode 100644<br>
> index 80ef8ee..0000000<br>
> --- a/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
> +++ /dev/null<br>
> @@ -1,172 +0,0 @@<br>
> -/*<br>
> - * Copyright © 2015 Intel Corporation<br>
> - *<br>
> - * Permission is hereby granted, free of charge, to any person obtaining a<br>
> - * copy of this software and associated documentation files (the "Software"),<br>
> - * to deal in the Software without restriction, including without limitation<br>
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br>
> - * and/or sell copies of the Software, and to permit persons to whom the<br>
> - * Software is furnished to do so, subject to the following conditions:<br>
> - *<br>
> - * The above copyright notice and this permission notice (including the next<br>
> - * paragraph) shall be included in all copies or substantial portions of the<br>
> - * Software.<br>
> - *<br>
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br>
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br>
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL<br>
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br>
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br>
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br>
> - * IN THE SOFTWARE.<br>
> - */<br>
> -<br>
> -#include "anv_nir.h"<br>
> -#include "nir/nir_builder.h"<br>
> -<br>
> -static void<br>
> -apply_dynamic_offsets_block(<wbr>nir_block *block, nir_builder *b,<br>
> - const struct anv_pipeline_layout *layout,<br>
> - bool add_bounds_checks,<br>
> - uint32_t indices_start)<br>
> -{<br>
> - struct anv_descriptor_set_layout *set_layout;<br>
> -<br>
> - nir_foreach_instr_safe(instr, block) {<br>
> - if (instr->type != nir_instr_type_intrinsic)<br>
> - continue;<br>
> -<br>
> - nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);<br>
> -<br>
> - unsigned block_idx_src;<br>
> - switch (intrin->intrinsic) {<br>
> - case nir_intrinsic_load_ubo:<br>
> - case nir_intrinsic_load_ssbo:<br>
> - block_idx_src = 0;<br>
> - break;<br>
> - case nir_intrinsic_store_ssbo:<br>
> - block_idx_src = 1;<br>
> - break;<br>
> - default:<br>
> - continue; /* the loop */<br>
> - }<br>
> -<br>
> - nir_instr *res_instr = intrin->src[block_idx_src].<wbr>ssa->parent_instr;<br>
> - assert(res_instr->type == nir_instr_type_intrinsic);<br>
> - nir_intrinsic_instr *res_intrin = nir_instr_as_intrinsic(res_<wbr>instr);<br>
> - assert(res_intrin->intrinsic == nir_intrinsic_vulkan_resource_<wbr>index);<br>
> -<br>
> - unsigned set = res_intrin->const_index[0];<br>
> - unsigned binding = res_intrin->const_index[1];<br>
> -<br>
> - set_layout = layout->set[set].layout;<br>
> - if (set_layout->binding[binding].<wbr>dynamic_offset_index < 0)<br>
> - continue;<br>
> -<br>
> - b->cursor = nir_before_instr(&intrin-><wbr>instr);<br>
> -<br>
> - /* First, we need to generate the uniform load for the buffer offset */<br>
> - uint32_t index = layout->set[set].dynamic_<wbr>offset_start +<br>
> - set_layout->binding[binding].<wbr>dynamic_offset_index;<br>
> - uint32_t array_size = set_layout->binding[binding].<wbr>array_size;<br>
> -<br>
> - nir_intrinsic_instr *offset_load =<br>
> - nir_intrinsic_instr_create(b-><wbr>shader, nir_intrinsic_load_uniform);<br>
> - offset_load->num_components = 2;<br>
> - nir_intrinsic_set_base(offset_<wbr>load, indices_start + index * 8);<br>
> - nir_intrinsic_set_range(<wbr>offset_load, array_size * 8);<br>
> - offset_load->src[0] = nir_src_for_ssa(nir_imul(b, res_intrin->src[0].ssa,<br>
> - nir_imm_int(b, 8)));<br>
> -<br>
> - nir_ssa_dest_init(&offset_<wbr>load->instr, &offset_load->dest, 2, 32, NULL);<br>
> - nir_builder_instr_insert(b, &offset_load->instr);<br>
> -<br>
> - nir_src *offset_src = nir_get_io_offset_src(intrin);<br>
> - nir_ssa_def *old_offset = nir_ssa_for_src(b, *offset_src, 1);<br>
> - nir_ssa_def *new_offset = nir_iadd(b, old_offset, &offset_load->dest.ssa);<br>
> - nir_instr_rewrite_src(&intrin-<wbr>>instr, offset_src,<br>
> - nir_src_for_ssa(new_offset));<br>
> -<br>
> - if (!add_bounds_checks)<br>
> - continue;<br>
> -<br>
> - /* In order to avoid out-of-bounds access, we predicate */<br>
> - nir_ssa_def *pred = nir_uge(b, nir_channel(b, &offset_load->dest.ssa, 1),<br>
> - old_offset);<br>
> - nir_if *if_stmt = nir_if_create(b->shader);<br>
> - if_stmt->condition = nir_src_for_ssa(pred);<br>
> - nir_cf_node_insert(b->cursor, &if_stmt->cf_node);<br>
> -<br>
> - nir_instr_remove(&intrin-><wbr>instr);<br>
> - nir_instr_insert_after_cf_<wbr>list(&if_stmt->then_list, &intrin->instr);<br>
> -<br>
> - if (intrin->intrinsic != nir_intrinsic_store_ssbo) {<br>
> - /* It's a load, we need a phi node */<br>
> - nir_phi_instr *phi = nir_phi_instr_create(b-><wbr>shader);<br>
> - nir_ssa_dest_init(&phi->instr, &phi->dest,<br>
> - intrin->num_components,<br>
> - intrin->dest.ssa.bit_size, NULL);<br>
> -<br>
> - nir_phi_src *src1 = ralloc(phi, nir_phi_src);<br>
> - struct exec_node *tnode = exec_list_get_tail(&if_stmt-><wbr>then_list);<br>
> - src1->pred = exec_node_data(nir_block, tnode, cf_node.node);<br>
> - src1->src = nir_src_for_ssa(&intrin->dest.<wbr>ssa);<br>
> - exec_list_push_tail(&phi-><wbr>srcs, &src1->node);<br>
> -<br>
> - b->cursor = nir_after_cf_list(&if_stmt-><wbr>else_list);<br>
> - nir_const_value zero_val = { .u32 = { 0, 0, 0, 0 } };<br>
> - nir_ssa_def *zero = nir_build_imm(b, intrin->num_components,<br>
> - intrin->dest.ssa.bit_size, zero_val);<br>
> -<br>
> - nir_phi_src *src2 = ralloc(phi, nir_phi_src);<br>
> - struct exec_node *enode = exec_list_get_tail(&if_stmt-><wbr>else_list);<br>
> - src2->pred = exec_node_data(nir_block, enode, cf_node.node);<br>
> - src2->src = nir_src_for_ssa(zero);<br>
> - exec_list_push_tail(&phi-><wbr>srcs, &src2->node);<br>
> -<br>
> - assert(intrin->dest.is_ssa);<br>
> - nir_ssa_def_rewrite_uses(&<wbr>intrin->dest.ssa,<br>
> - nir_src_for_ssa(&phi->dest.<wbr>ssa));<br>
> -<br>
> - nir_instr_insert_after_cf(&if_<wbr>stmt->cf_node, &phi->instr);<br>
> - }<br>
> - }<br>
> -}<br>
> -<br>
> -void<br>
> -anv_nir_apply_dynamic_<wbr>offsets(struct anv_pipeline *pipeline,<br>
> - nir_shader *shader,<br>
> - struct brw_stage_prog_data *prog_data)<br>
> -{<br>
> - const struct anv_pipeline_layout *layout = pipeline->layout;<br>
> - if (!layout || !layout->stage[shader->stage].<wbr>has_dynamic_offsets)<br>
> - return;<br>
> -<br>
> - const bool add_bounds_checks = pipeline->device->robust_<wbr>buffer_access;<br>
> -<br>
> - nir_foreach_function(function, shader) {<br>
> - if (!function->impl)<br>
> - continue;<br>
> -<br>
> - nir_builder builder;<br>
> - nir_builder_init(&builder, function->impl);<br>
> -<br>
> - nir_foreach_block(block, function->impl) {<br>
> - apply_dynamic_offsets_block(<wbr>block, &builder, pipeline->layout,<br>
> - add_bounds_checks, shader->num_uniforms);<br>
> - }<br>
> -<br>
> - nir_metadata_preserve(<wbr>function->impl, nir_metadata_block_index |<br>
> - nir_metadata_dominance);<br>
> - }<br>
> -<br>
> - struct anv_push_constants *null_data = NULL;<br>
> - for (unsigned i = 0; i < MAX_DYNAMIC_BUFFERS; i++) {<br>
> - prog_data->param[i * 2 + shader->num_uniforms / 4] =<br>
> - (const union gl_constant_value *)&null_data->dynamic[i].<wbr>offset;<br>
> - prog_data->param[i * 2 + 1 + shader->num_uniforms / 4] =<br>
> - (const union gl_constant_value *)&null_data->dynamic[i].<wbr>range;<br>
> - }<br>
> -<br>
> - shader->num_uniforms += MAX_DYNAMIC_BUFFERS * 8;<br>
> -}<br>
> diff --git a/src/intel/vulkan/anv_<wbr>pipeline.c b/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> index 64e409b..6287878 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> +++ b/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> @@ -356,9 +356,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,<br>
> prog_data->nr_params += MAX_PUSH_CONSTANTS_SIZE / sizeof(float);<br>
> }<br>
><br>
> - if (pipeline->layout && pipeline->layout->stage[stage]<wbr>.has_dynamic_offsets)<br>
> - prog_data->nr_params += MAX_DYNAMIC_BUFFERS * 2;<br>
> -<br>
> if (nir->info->num_images > 0) {<br>
> prog_data->nr_params += nir->info->num_images * BRW_IMAGE_PARAM_SIZE;<br>
> pipeline->needs_data_cache = true;<br>
> @@ -390,9 +387,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,<br>
> }<br>
> }<br>
><br>
> - /* Set up dynamic offsets */<br>
> - anv_nir_apply_dynamic_offsets(<wbr>pipeline, nir, prog_data);<br>
> -<br>
> /* Apply the actual pipeline layout to UBOs, SSBOs, and textures */<br>
> if (pipeline->layout)<br>
> anv_nir_apply_pipeline_layout(<wbr>pipeline, nir, prog_data, map);<br>
> diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
> index cf9874e..b8fba66 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> @@ -909,6 +909,12 @@ struct anv_descriptor {<br>
> enum isl_aux_usage aux_usage;<br>
> };<br>
><br>
> + struct {<br>
> + struct anv_buffer *buffer;<br>
> + uint64_t offset;<br>
> + uint64_t range;<br>
> + };<br>
> +<br>
> struct anv_buffer_view *buffer_view;<br>
> };<br>
> };<br>
> @@ -1180,12 +1186,6 @@ struct anv_push_constants {<br>
> uint32_t base_vertex;<br>
> uint32_t base_instance;<br>
><br>
> - /* Offsets and ranges for dynamically bound buffers */<br>
> - struct {<br>
> - uint32_t offset;<br>
> - uint32_t range;<br>
> - } dynamic[MAX_DYNAMIC_BUFFERS];<br>
> -<br>
> /* Image data for image_load_store on pre-SKL */<br>
> struct brw_image_param images[MAX_IMAGES];<br>
> };<br>
> @@ -1279,6 +1279,7 @@ struct anv_cmd_state {<br>
> uint32_t restart_index;<br>
> struct anv_vertex_binding vertex_bindings[MAX_VBS];<br>
> struct anv_descriptor_set * descriptors[MAX_SETS];<br>
> + uint32_t dynamic_offsets[MAX_DYNAMIC_<wbr>BUFFERS];<br>
> VkShaderStageFlags push_constant_stages;<br>
> struct anv_push_constants * push_constants[MESA_SHADER_<wbr>STAGES];<br>
> struct anv_state binding_tables[MESA_SHADER_<wbr>STAGES];<br>
> diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> index ae153d2..10b8790 100644<br>
> --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> @@ -1215,8 +1215,6 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,<br>
><br>
> case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER:<br>
> case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER:<br>
> - case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC:<br>
> - case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC:<br>
> case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>TEXEL_BUFFER:<br>
> surface_state = desc->buffer_view->surface_<wbr>state;<br>
> assert(surface_state.alloc_<wbr>size);<br>
> @@ -1225,6 +1223,34 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,<br>
> desc->buffer_view->offset);<br>
> break;<br>
><br>
> + case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC:<br>
> + case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC: {<br>
> + uint32_t dynamic_offset_idx =<br>
> + pipeline->layout->set[binding-<wbr>>set].dynamic_offset_start +<br>
> + set->layout->binding[binding-><wbr>binding].dynamic_offset_index +<br>
> + binding->index;<br>
> +<br>
> + /* Compute the offset within the buffer */<br>
> + uint64_t offset = desc->offset +<br>
> + cmd_buffer->state.dynamic_<wbr>offsets[dynamic_offset_idx];<br>
> + /* Clamp to the buffer size */<br>
> + offset = MIN2(offset, desc->buffer->size);<br>
> + /* Clamp the range to the buffer size */<br>
> + uint32_t range = MIN2(desc->range, desc->buffer->size - offset);<br>
> +<br>
> + surface_state =<br>
> + anv_state_stream_alloc(&cmd_<wbr>buffer->surface_state_stream, 64, 64);<br>
> + enum isl_format format =<br>
> + anv_isl_format_for_descriptor_<wbr>type(desc->type);<br>
> +<br>
> + anv_fill_buffer_surface_state(<wbr>cmd_buffer->device, surface_state,<br>
> + format, offset, range, 1);<br>
> + add_surface_state_reloc(cmd_<wbr>buffer, surface_state,<br>
> + desc->buffer->bo,<br>
> + desc->buffer->offset + offset);<br>
> + break;<br>
> + }<br>
> +<br>
> case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>TEXEL_BUFFER:<br>
> surface_state = (binding->write_only)<br>
> ? desc->buffer_view->writeonly_<wbr>storage_surface_state<br>
> --<br>
> 2.5.0.400.gff86faf<br>
</div></div></blockquote></div><br></div></div>