<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">After talking with Kristian, this seems like a reasonable direction to go so<br></div><div class="gmail_quote"><br>s/RFC/PATCH/<br><br></div><div class="gmail_quote">Reviews welcome.<br></div><div class="gmail_quote"><br>On Sat, Mar 4, 2017 at 12:19 PM, Kristian H. Kristensen <span dir="ltr"><<a href="mailto:krh@bitplanet.net" target="_blank">krh@bitplanet.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
<br>
> We have a performance problem with dynamic buffer descriptors.  Because<br>
> we are currently implementing them by pushing an offset into the shader<br>
> and adding that offset onto the already existing offset for the UBO/SSBO<br>
> operation, all UBO/SSBO operations on dynamic descriptors are indirect.<br>
> The back-end compiler implements indirect pull constant loads using what<br>
> basically amounts to a texelFetch instruction.  For pull constant loads<br>
> with constant offsets, however, we use an oword block read message which<br>
> goes through the constant cache and reads a whole cache line at a time.<br>
> Because of these two things, direct pull constant loads are much faster<br>
> than indirect pull constant loads.  Because all loads from dynamically<br>
> bound buffers are indirect, the user takes a substantial performance<br>
> penalty when using this "performance" feature.<br>
><br>
> There are two potential solutions I have seen for this problem.  The<br>
> alternative solution is to continue pushing offsets into the shader but<br>
> wire things up in the back-end compiler so that we use the oword block<br>
> read messages anyway.  The only reason we can do this is that we know a<br>
> priori that the dynamic offsets are uniform and 16-byte aligned.<br>
> Unfortunately, thanks to the 16-byte alignment requirement of the oword<br>
> messages, we can't do some general "if the indirect offset is uniform,<br>
> use an oword message" sort of thing.<br>
><br>
> This solution, however, is recommended for a few reasons:<br>
><br>
>  1. Surface states are relatively cheap.  We've been using on-the-fly<br>
>     surface state setup for some time in GL and it works well.  Also,<br>
>     dynamic offsets with on-the-fly surface state should still be<br>
>     cheaper than allocating new descriptor sets every time you want to<br>
>     change a buffer offset, which is really the only requirement of the<br>
>     dynamic offsets feature.<br>
><br>
>  2. This requires substantially less compiler plumbing.  Not only can we<br>
>     delete the entire apply_dynamic_offsets pass but we can also avoid<br>
>     having to add architecture for passing dynamic offsets to the back-<br>
>     end compiler in such a way that it can continue using oword messages.<br>
><br>
>  3. We get robust buffer access range-checking for free.  Because the<br>
>     offset and range are baked into the surface state, we no longer need<br>
>     to pass ranges around and do bounds-checking in the shader.<br>
><br>
>  4. Once we finally get UBO pushing implemented, it will be much easier<br>
>     to handle pushing chunks of dynamic descriptors if the compiler<br>
>     remains blissfully unaware of dynamic descriptors.<br>
><br>
> This commit improves performance of The Talos Principle on ULTRA<br>
> settings by around 50% and brings it nicely into line with OpenGL<br>
> performance.<br>
<br>
</div></div>Does the uniform analysis pass and the oword read result in a similar<br>
improvement?</blockquote><div><br></div><div>Yes it does.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I think both approaches are fine, but you might want to<br>
keep the uniform pass around - there are a lot of URB reads and writes in<br>
GS/HS/DS that are dynamically uniform but end up using per-slot offsets<br>
unconditionally.<br>
<span class="HOEnZb"><font color="#888888"><br>
Kristian<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> Cc: Kristian Høgsberg <<a href="mailto:krh@bitplanet.net">krh@bitplanet.net</a>><br>
> ---<br>
>  src/intel/vulkan/Makefile.<wbr>sources                |   1 -<br>
>  src/intel/vulkan/anv_cmd_<wbr>buffer.c                |  47 +++----<br>
>  src/intel/vulkan/anv_<wbr>descriptor_set.c            |  62 ++++----<br>
>  src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c | 172 -----------------------<br>
>  src/intel/vulkan/anv_pipeline.<wbr>c                  |   6 -<br>
>  src/intel/vulkan/anv_private.h                   |  13 +-<br>
>  src/intel/vulkan/genX_cmd_<wbr>buffer.c               |  30 +++-<br>
>  7 files changed, 86 insertions(+), 245 deletions(-)<br>
>  delete mode 100644 src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
><br>
> diff --git a/src/intel/vulkan/Makefile.<wbr>sources b/src/intel/vulkan/Makefile.<wbr>sources<br>
> index fd149b2..24e2225 100644<br>
> --- a/src/intel/vulkan/Makefile.<wbr>sources<br>
> +++ b/src/intel/vulkan/Makefile.<wbr>sources<br>
> @@ -32,7 +32,6 @@ VULKAN_FILES := \<br>
>       anv_image.c \<br>
>       anv_intel.c \<br>
>       anv_nir.h \<br>
> -     anv_nir_apply_dynamic_offsets.<wbr>c \<br>
>       anv_nir_apply_pipeline_layout.<wbr>c \<br>
>       anv_nir_lower_input_<wbr>attachments.c \<br>
>       anv_nir_lower_push_constants.c \<br>
> diff --git a/src/intel/vulkan/anv_cmd_<wbr>buffer.c b/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> index cab1dd7..a6ad48a 100644<br>
> --- a/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/anv_cmd_<wbr>buffer.c<br>
> @@ -507,42 +507,31 @@ void anv_CmdBindDescriptorSets(<br>
><br>
>     assert(firstSet + descriptorSetCount < MAX_SETS);<br>
><br>
> +   uint32_t dynamic_slot = 0;<br>
>     for (uint32_t i = 0; i < descriptorSetCount; i++) {<br>
>        ANV_FROM_HANDLE(anv_<wbr>descriptor_set, set, pDescriptorSets[i]);<br>
>        set_layout = layout->set[firstSet + i].layout;<br>
><br>
> -      if (cmd_buffer->state.<wbr>descriptors[firstSet + i] != set) {<br>
> -         cmd_buffer->state.descriptors[<wbr>firstSet + i] = set;<br>
> -         cmd_buffer->state.descriptors_<wbr>dirty |= set_layout->shader_stages;<br>
> -      }<br>
> +      cmd_buffer->state.descriptors[<wbr>firstSet + i] = set;<br>
><br>
>        if (set_layout->dynamic_offset_<wbr>count > 0) {<br>
> -         anv_foreach_stage(s, set_layout->shader_stages) {<br>
> -            anv_cmd_buffer_ensure_push_<wbr>constant_field(cmd_buffer, s, dynamic);<br>
> -<br>
> -            struct anv_push_constants *push =<br>
> -               cmd_buffer->state.push_<wbr>constants[s];<br>
> -<br>
> -            unsigned d = layout->set[firstSet + i].dynamic_offset_start;<br>
> -            const uint32_t *offsets = pDynamicOffsets;<br>
> -            struct anv_descriptor *desc = set->descriptors;<br>
> -<br>
> -            for (unsigned b = 0; b < set_layout->binding_count; b++) {<br>
> -               if (set_layout->binding[b].<wbr>dynamic_offset_index < 0)<br>
> -                  continue;<br>
> -<br>
> -               unsigned array_size = set_layout->binding[b].array_<wbr>size;<br>
> -               for (unsigned j = 0; j < array_size; j++) {<br>
> -                  push->dynamic[d].offset = *(offsets++);<br>
> -                  push->dynamic[d].range = (desc->buffer_view) ?<br>
> -                                            desc->buffer_view->range : 0;<br>
> -                  desc++;<br>
> -                  d++;<br>
> -               }<br>
> -            }<br>
> -         }<br>
> -         cmd_buffer->state.push_<wbr>constants_dirty |= set_layout->shader_stages;<br>
> +         uint32_t dynamic_offset_start =<br>
> +            layout->set[firstSet + i].dynamic_offset_start;<br>
> +<br>
> +         /* Assert that everything is in range */<br>
> +         assert(dynamic_offset_start + set_layout->dynamic_offset_<wbr>count <=<br>
> +                ARRAY_SIZE(cmd_buffer->state.<wbr>dynamic_offsets));<br>
> +         assert(dynamic_slot + set_layout->dynamic_offset_<wbr>count <=<br>
> +                dynamicOffsetCount);<br>
> +<br>
> +         typed_memcpy(&cmd_buffer-><wbr>state.dynamic_offsets[dynamic_<wbr>offset_start],<br>
> +                      &pDynamicOffsets[dynamic_slot]<wbr>,<br>
> +                      set_layout->dynamic_offset_<wbr>count);<br>
> +<br>
> +         dynamic_slot += set_layout->dynamic_offset_<wbr>count;<br>
>        }<br>
> +<br>
> +      cmd_buffer->state.descriptors_<wbr>dirty |= set_layout->shader_stages;<br>
>     }<br>
>  }<br>
><br>
> diff --git a/src/intel/vulkan/anv_<wbr>descriptor_set.c b/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> index 2a37d7d..175efdb 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> +++ b/src/intel/vulkan/anv_<wbr>descriptor_set.c<br>
> @@ -662,35 +662,39 @@ anv_descriptor_set_write_<wbr>buffer(struct anv_descriptor_set *set,<br>
><br>
>     assert(type == bind_layout->type);<br>
><br>
> -   struct anv_buffer_view *bview =<br>
> -      &set->buffer_views[bind_<wbr>layout->buffer_index + element];<br>
> -<br>
> -   bview->format = anv_isl_format_for_descriptor_<wbr>type(type);<br>
> -   bview->bo = buffer->bo;<br>
> -   bview->offset = buffer->offset + offset;<br>
> -<br>
> -   /* For buffers with dynamic offsets, we use the full possible range in the<br>
> -    * surface state and do the actual range-checking in the shader.<br>
> -    */<br>
> -   if (bind_layout->dynamic_offset_<wbr>index >= 0)<br>
> -      range = VK_WHOLE_SIZE;<br>
> -   bview->range = anv_buffer_get_range(buffer, offset, range);<br>
> -<br>
> -   /* If we're writing descriptors through a push command, we need to allocate<br>
> -    * the surface state from the command buffer. Otherwise it will be<br>
> -    * allocated by the descriptor pool when calling<br>
> -    * vkAllocateDescriptorSets. */<br>
> -   if (alloc_stream)<br>
> -      bview->surface_state = anv_state_stream_alloc(alloc_<wbr>stream, 64, 64);<br>
> -<br>
> -   anv_fill_buffer_surface_state(<wbr>device, bview->surface_state,<br>
> -                                 bview->format,<br>
> -                                 bview->offset, bview->range, 1);<br>
> -<br>
> -   *desc = (struct anv_descriptor) {<br>
> -      .type = type,<br>
> -      .buffer_view = bview,<br>
> -   };<br>
> +   if (type == VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC ||<br>
> +       type == VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC) {<br>
> +      *desc = (struct anv_descriptor) {<br>
> +         .type = type,<br>
> +         .buffer = buffer,<br>
> +         .offset = offset,<br>
> +         .range = range,<br>
> +      };<br>
> +   } else {<br>
> +      struct anv_buffer_view *bview =<br>
> +         &set->buffer_views[bind_<wbr>layout->buffer_index + element];<br>
> +<br>
> +      bview->format = anv_isl_format_for_descriptor_<wbr>type(type);<br>
> +      bview->bo = buffer->bo;<br>
> +      bview->offset = buffer->offset + offset;<br>
> +      bview->range = anv_buffer_get_range(buffer, offset, range);<br>
> +<br>
> +      /* If we're writing descriptors through a push command, we need to<br>
> +       * allocate the surface state from the command buffer. Otherwise it will<br>
> +       * be allocated by the descriptor pool when calling<br>
> +       * vkAllocateDescriptorSets. */<br>
> +      if (alloc_stream)<br>
> +         bview->surface_state = anv_state_stream_alloc(alloc_<wbr>stream, 64, 64);<br>
> +<br>
> +      anv_fill_buffer_surface_state(<wbr>device, bview->surface_state,<br>
> +                                    bview->format,<br>
> +                                    bview->offset, bview->range, 1);<br>
> +<br>
> +      *desc = (struct anv_descriptor) {<br>
> +         .type = type,<br>
> +         .buffer_view = bview,<br>
> +      };<br>
> +   }<br>
>  }<br>
><br>
>  void anv_UpdateDescriptorSets(<br>
> diff --git a/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c b/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
> deleted file mode 100644<br>
> index 80ef8ee..0000000<br>
> --- a/src/intel/vulkan/anv_nir_<wbr>apply_dynamic_offsets.c<br>
> +++ /dev/null<br>
> @@ -1,172 +0,0 @@<br>
> -/*<br>
> - * Copyright © 2015 Intel Corporation<br>
> - *<br>
> - * Permission is hereby granted, free of charge, to any person obtaining a<br>
> - * copy of this software and associated documentation files (the "Software"),<br>
> - * to deal in the Software without restriction, including without limitation<br>
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br>
> - * and/or sell copies of the Software, and to permit persons to whom the<br>
> - * Software is furnished to do so, subject to the following conditions:<br>
> - *<br>
> - * The above copyright notice and this permission notice (including the next<br>
> - * paragraph) shall be included in all copies or substantial portions of the<br>
> - * Software.<br>
> - *<br>
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br>
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br>
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL<br>
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br>
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br>
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br>
> - * IN THE SOFTWARE.<br>
> - */<br>
> -<br>
> -#include "anv_nir.h"<br>
> -#include "nir/nir_builder.h"<br>
> -<br>
> -static void<br>
> -apply_dynamic_offsets_block(<wbr>nir_block *block, nir_builder *b,<br>
> -                            const struct anv_pipeline_layout *layout,<br>
> -                            bool add_bounds_checks,<br>
> -                            uint32_t indices_start)<br>
> -{<br>
> -   struct anv_descriptor_set_layout *set_layout;<br>
> -<br>
> -   nir_foreach_instr_safe(instr, block) {<br>
> -      if (instr->type != nir_instr_type_intrinsic)<br>
> -         continue;<br>
> -<br>
> -      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);<br>
> -<br>
> -      unsigned block_idx_src;<br>
> -      switch (intrin->intrinsic) {<br>
> -      case nir_intrinsic_load_ubo:<br>
> -      case nir_intrinsic_load_ssbo:<br>
> -         block_idx_src = 0;<br>
> -         break;<br>
> -      case nir_intrinsic_store_ssbo:<br>
> -         block_idx_src = 1;<br>
> -         break;<br>
> -      default:<br>
> -         continue; /* the loop */<br>
> -      }<br>
> -<br>
> -      nir_instr *res_instr = intrin->src[block_idx_src].<wbr>ssa->parent_instr;<br>
> -      assert(res_instr->type == nir_instr_type_intrinsic);<br>
> -      nir_intrinsic_instr *res_intrin = nir_instr_as_intrinsic(res_<wbr>instr);<br>
> -      assert(res_intrin->intrinsic == nir_intrinsic_vulkan_resource_<wbr>index);<br>
> -<br>
> -      unsigned set = res_intrin->const_index[0];<br>
> -      unsigned binding = res_intrin->const_index[1];<br>
> -<br>
> -      set_layout = layout->set[set].layout;<br>
> -      if (set_layout->binding[binding].<wbr>dynamic_offset_index < 0)<br>
> -         continue;<br>
> -<br>
> -      b->cursor = nir_before_instr(&intrin-><wbr>instr);<br>
> -<br>
> -      /* First, we need to generate the uniform load for the buffer offset */<br>
> -      uint32_t index = layout->set[set].dynamic_<wbr>offset_start +<br>
> -                       set_layout->binding[binding].<wbr>dynamic_offset_index;<br>
> -      uint32_t array_size = set_layout->binding[binding].<wbr>array_size;<br>
> -<br>
> -      nir_intrinsic_instr *offset_load =<br>
> -         nir_intrinsic_instr_create(b-><wbr>shader, nir_intrinsic_load_uniform);<br>
> -      offset_load->num_components = 2;<br>
> -      nir_intrinsic_set_base(offset_<wbr>load, indices_start + index * 8);<br>
> -      nir_intrinsic_set_range(<wbr>offset_load, array_size * 8);<br>
> -      offset_load->src[0] = nir_src_for_ssa(nir_imul(b, res_intrin->src[0].ssa,<br>
> -                                                     nir_imm_int(b, 8)));<br>
> -<br>
> -      nir_ssa_dest_init(&offset_<wbr>load->instr, &offset_load->dest, 2, 32, NULL);<br>
> -      nir_builder_instr_insert(b, &offset_load->instr);<br>
> -<br>
> -      nir_src *offset_src = nir_get_io_offset_src(intrin);<br>
> -      nir_ssa_def *old_offset = nir_ssa_for_src(b, *offset_src, 1);<br>
> -      nir_ssa_def *new_offset = nir_iadd(b, old_offset, &offset_load->dest.ssa);<br>
> -      nir_instr_rewrite_src(&intrin-<wbr>>instr, offset_src,<br>
> -                            nir_src_for_ssa(new_offset));<br>
> -<br>
> -      if (!add_bounds_checks)<br>
> -         continue;<br>
> -<br>
> -      /* In order to avoid out-of-bounds access, we predicate */<br>
> -      nir_ssa_def *pred = nir_uge(b, nir_channel(b, &offset_load->dest.ssa, 1),<br>
> -                                  old_offset);<br>
> -      nir_if *if_stmt = nir_if_create(b->shader);<br>
> -      if_stmt->condition = nir_src_for_ssa(pred);<br>
> -      nir_cf_node_insert(b->cursor, &if_stmt->cf_node);<br>
> -<br>
> -      nir_instr_remove(&intrin-><wbr>instr);<br>
> -      nir_instr_insert_after_cf_<wbr>list(&if_stmt->then_list, &intrin->instr);<br>
> -<br>
> -      if (intrin->intrinsic != nir_intrinsic_store_ssbo) {<br>
> -         /* It's a load, we need a phi node */<br>
> -         nir_phi_instr *phi = nir_phi_instr_create(b-><wbr>shader);<br>
> -         nir_ssa_dest_init(&phi->instr, &phi->dest,<br>
> -                           intrin->num_components,<br>
> -                           intrin->dest.ssa.bit_size, NULL);<br>
> -<br>
> -         nir_phi_src *src1 = ralloc(phi, nir_phi_src);<br>
> -         struct exec_node *tnode = exec_list_get_tail(&if_stmt-><wbr>then_list);<br>
> -         src1->pred = exec_node_data(nir_block, tnode, cf_node.node);<br>
> -         src1->src = nir_src_for_ssa(&intrin->dest.<wbr>ssa);<br>
> -         exec_list_push_tail(&phi-><wbr>srcs, &src1->node);<br>
> -<br>
> -         b->cursor = nir_after_cf_list(&if_stmt-><wbr>else_list);<br>
> -         nir_const_value zero_val = { .u32 = { 0, 0, 0, 0 } };<br>
> -         nir_ssa_def *zero = nir_build_imm(b, intrin->num_components,<br>
> -                                           intrin->dest.ssa.bit_size, zero_val);<br>
> -<br>
> -         nir_phi_src *src2 = ralloc(phi, nir_phi_src);<br>
> -         struct exec_node *enode = exec_list_get_tail(&if_stmt-><wbr>else_list);<br>
> -         src2->pred = exec_node_data(nir_block, enode, cf_node.node);<br>
> -         src2->src = nir_src_for_ssa(zero);<br>
> -         exec_list_push_tail(&phi-><wbr>srcs, &src2->node);<br>
> -<br>
> -         assert(intrin->dest.is_ssa);<br>
> -         nir_ssa_def_rewrite_uses(&<wbr>intrin->dest.ssa,<br>
> -                                  nir_src_for_ssa(&phi->dest.<wbr>ssa));<br>
> -<br>
> -         nir_instr_insert_after_cf(&if_<wbr>stmt->cf_node, &phi->instr);<br>
> -      }<br>
> -   }<br>
> -}<br>
> -<br>
> -void<br>
> -anv_nir_apply_dynamic_<wbr>offsets(struct anv_pipeline *pipeline,<br>
> -                              nir_shader *shader,<br>
> -                              struct brw_stage_prog_data *prog_data)<br>
> -{<br>
> -   const struct anv_pipeline_layout *layout = pipeline->layout;<br>
> -   if (!layout || !layout->stage[shader->stage].<wbr>has_dynamic_offsets)<br>
> -      return;<br>
> -<br>
> -   const bool add_bounds_checks = pipeline->device->robust_<wbr>buffer_access;<br>
> -<br>
> -   nir_foreach_function(function, shader) {<br>
> -      if (!function->impl)<br>
> -         continue;<br>
> -<br>
> -      nir_builder builder;<br>
> -      nir_builder_init(&builder, function->impl);<br>
> -<br>
> -      nir_foreach_block(block, function->impl) {<br>
> -         apply_dynamic_offsets_block(<wbr>block, &builder, pipeline->layout,<br>
> -                                     add_bounds_checks, shader->num_uniforms);<br>
> -      }<br>
> -<br>
> -      nir_metadata_preserve(<wbr>function->impl, nir_metadata_block_index |<br>
> -                                            nir_metadata_dominance);<br>
> -   }<br>
> -<br>
> -   struct anv_push_constants *null_data = NULL;<br>
> -   for (unsigned i = 0; i < MAX_DYNAMIC_BUFFERS; i++) {<br>
> -      prog_data->param[i * 2 + shader->num_uniforms / 4] =<br>
> -         (const union gl_constant_value *)&null_data->dynamic[i].<wbr>offset;<br>
> -      prog_data->param[i * 2 + 1 + shader->num_uniforms / 4] =<br>
> -         (const union gl_constant_value *)&null_data->dynamic[i].<wbr>range;<br>
> -   }<br>
> -<br>
> -   shader->num_uniforms += MAX_DYNAMIC_BUFFERS * 8;<br>
> -}<br>
> diff --git a/src/intel/vulkan/anv_<wbr>pipeline.c b/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> index 64e409b..6287878 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> +++ b/src/intel/vulkan/anv_<wbr>pipeline.c<br>
> @@ -356,9 +356,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,<br>
>        prog_data->nr_params += MAX_PUSH_CONSTANTS_SIZE / sizeof(float);<br>
>     }<br>
><br>
> -   if (pipeline->layout && pipeline->layout->stage[stage]<wbr>.has_dynamic_offsets)<br>
> -      prog_data->nr_params += MAX_DYNAMIC_BUFFERS * 2;<br>
> -<br>
>     if (nir->info->num_images > 0) {<br>
>        prog_data->nr_params += nir->info->num_images * BRW_IMAGE_PARAM_SIZE;<br>
>        pipeline->needs_data_cache = true;<br>
> @@ -390,9 +387,6 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,<br>
>        }<br>
>     }<br>
><br>
> -   /* Set up dynamic offsets */<br>
> -   anv_nir_apply_dynamic_offsets(<wbr>pipeline, nir, prog_data);<br>
> -<br>
>     /* Apply the actual pipeline layout to UBOs, SSBOs, and textures */<br>
>     if (pipeline->layout)<br>
>        anv_nir_apply_pipeline_layout(<wbr>pipeline, nir, prog_data, map);<br>
> diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
> index cf9874e..b8fba66 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> @@ -909,6 +909,12 @@ struct anv_descriptor {<br>
>           enum isl_aux_usage aux_usage;<br>
>        };<br>
><br>
> +      struct {<br>
> +         struct anv_buffer *buffer;<br>
> +         uint64_t offset;<br>
> +         uint64_t range;<br>
> +      };<br>
> +<br>
>        struct anv_buffer_view *buffer_view;<br>
>     };<br>
>  };<br>
> @@ -1180,12 +1186,6 @@ struct anv_push_constants {<br>
>     uint32_t base_vertex;<br>
>     uint32_t base_instance;<br>
><br>
> -   /* Offsets and ranges for dynamically bound buffers */<br>
> -   struct {<br>
> -      uint32_t offset;<br>
> -      uint32_t range;<br>
> -   } dynamic[MAX_DYNAMIC_BUFFERS];<br>
> -<br>
>     /* Image data for image_load_store on pre-SKL */<br>
>     struct brw_image_param images[MAX_IMAGES];<br>
>  };<br>
> @@ -1279,6 +1279,7 @@ struct anv_cmd_state {<br>
>     uint32_t                                     restart_index;<br>
>     struct anv_vertex_binding                    vertex_bindings[MAX_VBS];<br>
>     struct anv_descriptor_set *                  descriptors[MAX_SETS];<br>
> +   uint32_t                                     dynamic_offsets[MAX_DYNAMIC_<wbr>BUFFERS];<br>
>     VkShaderStageFlags                           push_constant_stages;<br>
>     struct anv_push_constants *                  push_constants[MESA_SHADER_<wbr>STAGES];<br>
>     struct anv_state                             binding_tables[MESA_SHADER_<wbr>STAGES];<br>
> diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> index ae153d2..10b8790 100644<br>
> --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> @@ -1215,8 +1215,6 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,<br>
><br>
>        case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER:<br>
>        case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER:<br>
> -      case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC:<br>
> -      case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC:<br>
>        case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>TEXEL_BUFFER:<br>
>           surface_state = desc->buffer_view->surface_<wbr>state;<br>
>           assert(surface_state.alloc_<wbr>size);<br>
> @@ -1225,6 +1223,34 @@ emit_binding_table(struct anv_cmd_buffer *cmd_buffer,<br>
>                                   desc->buffer_view->offset);<br>
>           break;<br>
><br>
> +      case VK_DESCRIPTOR_TYPE_UNIFORM_<wbr>BUFFER_DYNAMIC:<br>
> +      case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>BUFFER_DYNAMIC: {<br>
> +         uint32_t dynamic_offset_idx =<br>
> +            pipeline->layout->set[binding-<wbr>>set].dynamic_offset_start +<br>
> +            set->layout->binding[binding-><wbr>binding].dynamic_offset_index +<br>
> +            binding->index;<br>
> +<br>
> +         /* Compute the offset within the buffer */<br>
> +         uint64_t offset = desc->offset +<br>
> +            cmd_buffer->state.dynamic_<wbr>offsets[dynamic_offset_idx];<br>
> +         /* Clamp to the buffer size */<br>
> +         offset = MIN2(offset, desc->buffer->size);<br>
> +         /* Clamp the range to the buffer size */<br>
> +         uint32_t range = MIN2(desc->range, desc->buffer->size - offset);<br>
> +<br>
> +         surface_state =<br>
> +            anv_state_stream_alloc(&cmd_<wbr>buffer->surface_state_stream, 64, 64);<br>
> +         enum isl_format format =<br>
> +            anv_isl_format_for_descriptor_<wbr>type(desc->type);<br>
> +<br>
> +         anv_fill_buffer_surface_state(<wbr>cmd_buffer->device, surface_state,<br>
> +                                       format, offset, range, 1);<br>
> +         add_surface_state_reloc(cmd_<wbr>buffer, surface_state,<br>
> +                                 desc->buffer->bo,<br>
> +                                 desc->buffer->offset + offset);<br>
> +         break;<br>
> +      }<br>
> +<br>
>        case VK_DESCRIPTOR_TYPE_STORAGE_<wbr>TEXEL_BUFFER:<br>
>           surface_state = (binding->write_only)<br>
>              ? desc->buffer_view->writeonly_<wbr>storage_surface_state<br>
> --<br>
> 2.5.0.400.gff86faf<br>
</div></div></blockquote></div><br></div></div>