[Mesa-dev] [PATCH 1/2] gallivm: don't use control flow when doing indirect constant buffer lookups

Ilia Mirkin imirkin at alum.mit.edu
Mon May 25 11:04:01 PDT 2015


This is the code for setting the vs constbuf (in nv30_draw.c):

         void *map = nv04_resource(nv30->vertprog.constbuf)->data;
         debug_printf("map: %p, nr: %d\n", map, nv30->vertprog.constbuf_nr);
         draw_set_mapped_constant_buffer(draw, PIPE_SHADER_VERTEX, 0,
                                         map, nv30->vertprog.constbuf_nr);

This prints:

map: 0x2559d00, nr: 12

Does that seem wrong in any way? Hm... the size should probably be *4*4... oops.

Yeah that works much better :) Thanks for the hint!

On Mon, May 25, 2015 at 1:42 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> The only thing I can think of is an illegal setup of the constant
> buffer, because the index was not verified before for direct accesses,
> which made this work before even for "illegal" sizes.
> This is still not verified but buffers which get rounded down to size
> zero are now definitely no longer working.
> See https://bugs.freedesktop.org/show_bug.cgi?id=90081
>
> I'm still not actually sure (tried to get some feedback without much
> success on the list) if a bind buffer range with offset properly aligned
> but size not should actually work correctly, hence still don't know if
> there's actually something wrong at all (though technically it would
> probably be safer if the index is also verified for direct accesses).
> You can however trivially test that theory by removing the code which
> replaces zero-sized buffers with dummy ones (which is just there so we
> don't crash when trying to access them).
>
> Roland
>
>
>
> Am 24.05.2015 um 06:42 schrieb Ilia Mirkin:
>> Hi Roland,
>>
>> I've just bisected a nv30 swtnl regression to this commit. When
>> running the nv30 driver (on a NV44, if it matters) and forcing swtnl
>> (NV30_SWTNL=1), glxgears from 60% broken to 100% broken. Now, I'm not
>> sure what the initial breakage is, but at least it shows a
>> gears-looking thing some of the time. After this change, I just see a
>> few random lines every 30 frames or so. Setting DRAW_USE_LLVM=0 makes
>> the whole thing fail horribly before and after this change (nothing
>> drawn at all).
>>
>> Do you think that it's likely coincidence that the LLVM path worked at
>> all before this and that I should figure out what's not working with
>> DRAW_USE_LLVM=0? Or can you think of some reason why this change may
>> have broken the llvm path in a swtnl use-case with a driver like nv30
>> (no gs, no integers)?
>>
>> Thanks,
>>
>>   -ilia
>>
>>
>> On Sat, Apr 4, 2015 at 10:50 AM,  <sroland at vmware.com> wrote:
>>> From: Roland Scheidegger <sroland at vmware.com>
>>>
>>> llvm goes crazy when doing that, using way more memory and time, though there's
>>> probably more to it - this points to a very much similar issue as fixed in
>>> 8a9f5ecdb116d0449d63f7b94efbfa8b205d826f. In any case I've seen a quite
>>> plain looking vertex shader with just ~50 simple tgsi instructions (but with a
>>> dozen or so such indirect constant buffer lookups) go from a terribly high
>>> ~440ms compile time (consuming 25MB of memory in the process) down to a still
>>> awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
>>> improvements possible (but I have no clue why it's so slow...).
>>> The resulting shader is most likely also faster (certainly seemed so though
>>> I don't have any hard numbers as it may have been influenced by compile times)
>>> since generally fetching constants outside the buffer range is most likely an
>>> app error (that is we expect all indices to be valid).
>>> It is possible this fixes some mysterious vertex shader slowdowns we've seen
>>> ever since we are conforming to newer apis at least partially (the main draw
>>> loop also has similar looking conditionals which we probably could do without -
>>> if not for the fetch at least for the additional elts condition.)
>>> ---
>>>  src/gallium/auxiliary/draw/draw_llvm.h             |  2 +
>>>  .../draw/draw_pt_fetch_shade_pipeline_llvm.c       | 27 +++---
>>>  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c    | 95 +++++++++-------------
>>>  src/gallium/drivers/llvmpipe/lp_scene.h            |  2 +
>>>  src/gallium/drivers/llvmpipe/lp_setup.c            |  6 +-
>>>  5 files changed, 63 insertions(+), 69 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h b/src/gallium/auxiliary/draw/draw_llvm.h
>>> index 9565fc6..a1983e1 100644
>>> --- a/src/gallium/auxiliary/draw/draw_llvm.h
>>> +++ b/src/gallium/auxiliary/draw/draw_llvm.h
>>> @@ -472,6 +472,8 @@ struct draw_llvm {
>>>
>>>     struct draw_gs_llvm_variant_list_item gs_variants_list;
>>>     int nr_gs_variants;
>>> +
>>> +   float fake_const_buf[4];
>>>  };
>>>
>>>
>>> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
>>> index 0dfafdc..03257d8 100644
>>> --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
>>> +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c
>>> @@ -273,28 +273,35 @@ llvm_middle_end_bind_parameters(struct draw_pt_middle_end *middle)
>>>  {
>>>     struct llvm_middle_end *fpme = llvm_middle_end(middle);
>>>     struct draw_context *draw = fpme->draw;
>>> +   struct draw_llvm *llvm = fpme->llvm;
>>>     unsigned i;
>>>
>>> -   for (i = 0; i < Elements(fpme->llvm->jit_context.vs_constants); ++i) {
>>> +   for (i = 0; i < Elements(llvm->jit_context.vs_constants); ++i) {
>>>        int num_consts =
>>>           draw->pt.user.vs_constants_size[i] / (sizeof(float) * 4);
>>> -      fpme->llvm->jit_context.vs_constants[i] = draw->pt.user.vs_constants[i];
>>> -      fpme->llvm->jit_context.num_vs_constants[i] = num_consts;
>>> +      llvm->jit_context.vs_constants[i] = draw->pt.user.vs_constants[i];
>>> +      llvm->jit_context.num_vs_constants[i] = num_consts;
>>> +      if (num_consts == 0) {
>>> +         llvm->jit_context.vs_constants[i] = llvm->fake_const_buf;
>>> +      }
>>>     }
>>> -   for (i = 0; i < Elements(fpme->llvm->gs_jit_context.constants); ++i) {
>>> +   for (i = 0; i < Elements(llvm->gs_jit_context.constants); ++i) {
>>>        int num_consts =
>>>           draw->pt.user.gs_constants_size[i] / (sizeof(float) * 4);
>>> -      fpme->llvm->gs_jit_context.constants[i] = draw->pt.user.gs_constants[i];
>>> -      fpme->llvm->gs_jit_context.num_constants[i] = num_consts;
>>> +      llvm->gs_jit_context.constants[i] = draw->pt.user.gs_constants[i];
>>> +      llvm->gs_jit_context.num_constants[i] = num_consts;
>>> +      if (num_consts == 0) {
>>> +         llvm->gs_jit_context.constants[i] = llvm->fake_const_buf;
>>> +      }
>>>     }
>>>
>>> -   fpme->llvm->jit_context.planes =
>>> +   llvm->jit_context.planes =
>>>        (float (*)[DRAW_TOTAL_CLIP_PLANES][4]) draw->pt.user.planes[0];
>>> -   fpme->llvm->gs_jit_context.planes =
>>> +   llvm->gs_jit_context.planes =
>>>        (float (*)[DRAW_TOTAL_CLIP_PLANES][4]) draw->pt.user.planes[0];
>>>
>>> -   fpme->llvm->jit_context.viewports = draw->viewports;
>>> -   fpme->llvm->gs_jit_context.viewports = draw->viewports;
>>> +   llvm->jit_context.viewports = draw->viewports;
>>> +   llvm->gs_jit_context.viewports = draw->viewports;
>>>  }
>>>
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>>> index 17b68ff..5aa2846 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
>>> @@ -944,20 +944,39 @@ gather_outputs(struct lp_build_tgsi_soa_context * bld)
>>>   * with a little work.
>>>   */
>>>  static LLVMValueRef
>>> -build_gather(struct lp_build_context *bld,
>>> +build_gather(struct lp_build_tgsi_context *bld_base,
>>>               LLVMValueRef base_ptr,
>>>               LLVMValueRef indexes,
>>> -             LLVMValueRef *overflow_mask)
>>> +             LLVMValueRef overflow_mask)
>>>  {
>>> -   LLVMBuilderRef builder = bld->gallivm->builder;
>>> +   struct gallivm_state *gallivm = bld_base->base.gallivm;
>>> +   LLVMBuilderRef builder = gallivm->builder;
>>> +   struct lp_build_context *uint_bld = &bld_base->uint_bld;
>>> +   struct lp_build_context *bld = &bld_base->base;
>>>     LLVMValueRef res = bld->undef;
>>>     unsigned i;
>>> -   LLVMValueRef temp_ptr = NULL;
>>> +
>>> +   /*
>>> +    * overflow_mask is a vector telling us which channels
>>> +    * in the vector overflowed. We use the overflow behavior for
>>> +    * constant buffers which is defined as:
>>> +    * Out of bounds access to constant buffer returns 0 in all
>>> +    * components. Out of bounds behavior is always with respect
>>> +    * to the size of the buffer bound at that slot.
>>> +    */
>>>
>>>     if (overflow_mask) {
>>> -      temp_ptr = lp_build_alloca(
>>> -         bld->gallivm,
>>> -         lp_build_vec_type(bld->gallivm, bld->type), "");
>>> +      /*
>>> +       * We avoid per-element control flow here (also due to llvm going crazy,
>>> +       * though I suspect it's better anyway since overflow is likely rare).
>>> +       * Note that since we still fetch from buffers even if num_elements was
>>> +       * zero (in this case we'll fetch from index zero) the jit func callers
>>> +       * MUST provide valid fake constant buffers of size 4x32 (the values do
>>> +       * not matter), otherwise we'd still need (not per element though)
>>> +       * control flow.
>>> +       */
>>> +      LLVMValueRef zeros = lp_build_const_int_vec(gallivm, uint_bld->type, 0);
>>> +      indexes = lp_build_select(uint_bld, overflow_mask, zeros, indexes);
>>>     }
>>>
>>>     /*
>>> @@ -968,53 +987,17 @@ build_gather(struct lp_build_context *bld,
>>>        LLVMValueRef index = LLVMBuildExtractElement(builder,
>>>                                                     indexes, ii, "");
>>>        LLVMValueRef scalar_ptr, scalar;
>>> -      LLVMValueRef overflow;
>>> -      struct lp_build_if_state if_ctx;
>>> -
>>> -      /*
>>> -       * overflow_mask is a boolean vector telling us which channels
>>> -       * in the vector overflowed. We use the overflow behavior for
>>> -       * constant buffers which is defined as:
>>> -       * Out of bounds access to constant buffer returns 0 in all
>>> -       * componenets. Out of bounds behavior is always with respect
>>> -       * to the size of the buffer bound at that slot.
>>> -       */
>>> -      if (overflow_mask) {
>>> -         overflow = LLVMBuildExtractElement(builder, *overflow_mask,
>>> -                                            ii, "");
>>> -         lp_build_if(&if_ctx, bld->gallivm, overflow);
>>> -         {
>>> -            LLVMValueRef val = LLVMBuildLoad(builder, temp_ptr, "");
>>> -            val = LLVMBuildInsertElement(
>>> -               builder, val,
>>> -               LLVMConstNull(LLVMFloatTypeInContext(bld->gallivm->context)),
>>> -               ii, "");
>>> -            LLVMBuildStore(builder, val, temp_ptr);
>>> -         }
>>> -         lp_build_else(&if_ctx);
>>> -         {
>>> -            LLVMValueRef val = LLVMBuildLoad(builder, temp_ptr, "");
>>> -
>>> -            scalar_ptr = LLVMBuildGEP(builder, base_ptr,
>>> -                                      &index, 1, "gather_ptr");
>>> -            scalar = LLVMBuildLoad(builder, scalar_ptr, "");
>>> -
>>> -            val = LLVMBuildInsertElement(builder, val, scalar, ii, "");
>>>
>>> -            LLVMBuildStore(builder, val, temp_ptr);
>>> -         }
>>> -         lp_build_endif(&if_ctx);
>>> -      } else {
>>> -         scalar_ptr = LLVMBuildGEP(builder, base_ptr,
>>> -                                   &index, 1, "gather_ptr");
>>> -         scalar = LLVMBuildLoad(builder, scalar_ptr, "");
>>> +      scalar_ptr = LLVMBuildGEP(builder, base_ptr,
>>> +                                &index, 1, "gather_ptr");
>>> +      scalar = LLVMBuildLoad(builder, scalar_ptr, "");
>>>
>>> -         res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>>> -      }
>>> +      res = LLVMBuildInsertElement(builder, res, scalar, ii, "");
>>>     }
>>>
>>>     if (overflow_mask) {
>>> -      res = LLVMBuildLoad(builder, temp_ptr, "gather_val");
>>> +      LLVMValueRef zeros = lp_build_const_vec(gallivm, bld->type, 0.0);
>>> +      res = lp_build_select(bld, overflow_mask, zeros, res);
>>>     }
>>>
>>>     return res;
>>> @@ -1247,17 +1230,15 @@ emit_fetch_constant(
>>>        num_consts = lp_build_broadcast_scalar(uint_bld, num_consts);
>>>        /* Construct a boolean vector telling us which channels
>>>         * overflow the bound constant buffer */
>>> -      overflow_mask = LLVMBuildICmp(builder, LLVMIntUGE,
>>> -                                    indirect_index,
>>> -                                    num_consts, "");
>>> +      overflow_mask = lp_build_compare(gallivm, uint_bld->type, PIPE_FUNC_GEQUAL,
>>> +                                       indirect_index, num_consts);
>>>
>>>        /* index_vec = indirect_index * 4 + swizzle */
>>>        index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
>>>        index_vec = lp_build_add(uint_bld, index_vec, swizzle_vec);
>>>
>>>        /* Gather values from the constant buffer */
>>> -      res = build_gather(&bld_base->base, consts_ptr, index_vec,
>>> -                         &overflow_mask);
>>> +      res = build_gather(bld_base, consts_ptr, index_vec, overflow_mask);
>>>     }
>>>     else {
>>>        LLVMValueRef index;  /* index into the const buffer */
>>> @@ -1319,7 +1300,7 @@ emit_fetch_immediate(
>>>                                             FALSE);
>>>
>>>           /* Gather values from the immediate register array */
>>> -         res = build_gather(&bld_base->base, imms_array, index_vec, NULL);
>>> +         res = build_gather(bld_base, imms_array, index_vec, NULL);
>>>        } else {
>>>           LLVMValueRef lindex = lp_build_const_int32(gallivm,
>>>                                          reg->Register.Index * 4 + swizzle);
>>> @@ -1373,7 +1354,7 @@ emit_fetch_input(
>>>        inputs_array = LLVMBuildBitCast(builder, bld->inputs_array, fptr_type, "");
>>>
>>>        /* Gather values from the input register array */
>>> -      res = build_gather(&bld_base->base, inputs_array, index_vec, NULL);
>>> +      res = build_gather(bld_base, inputs_array, index_vec, NULL);
>>>     } else {
>>>        if (bld->indirect_files & (1 << TGSI_FILE_INPUT)) {
>>>           LLVMValueRef lindex = lp_build_const_int32(gallivm,
>>> @@ -1495,7 +1476,7 @@ emit_fetch_temporary(
>>>        temps_array = LLVMBuildBitCast(builder, bld->temps_array, fptr_type, "");
>>>
>>>        /* Gather values from the temporary register array */
>>> -      res = build_gather(&bld_base->base, temps_array, index_vec, NULL);
>>> +      res = build_gather(bld_base, temps_array, index_vec, NULL);
>>>     }
>>>     else {
>>>        LLVMValueRef temp_ptr;
>>> diff --git a/src/gallium/drivers/llvmpipe/lp_scene.h b/src/gallium/drivers/llvmpipe/lp_scene.h
>>> index ad23c20..3ef0f32 100644
>>> --- a/src/gallium/drivers/llvmpipe/lp_scene.h
>>> +++ b/src/gallium/drivers/llvmpipe/lp_scene.h
>>> @@ -178,6 +178,8 @@ struct lp_scene {
>>>
>>>     struct cmd_bin tile[TILES_X][TILES_Y];
>>>     struct data_block_list data;
>>> +
>>> +   float fake_const_buf[4];
>>>  };
>>>
>>>
>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> index 3b0056c..bfe0af4 100644
>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>> @@ -1103,14 +1103,16 @@ try_update_scene_state( struct lp_setup_context *setup )
>>>                 setup->constants[i].stored_size = current_size;
>>>                 setup->constants[i].stored_data = stored;
>>>              }
>>> +            setup->fs.current.jit_context.constants[i] =
>>> +               setup->constants[i].stored_data;
>>>           }
>>>           else {
>>>              setup->constants[i].stored_size = 0;
>>>              setup->constants[i].stored_data = NULL;
>>> +            setup->fs.current.jit_context.constants[i] =
>>> +               scene->fake_const_buf;
>>>           }
>>>
>>> -         setup->fs.current.jit_context.constants[i] =
>>> -            setup->constants[i].stored_data;
>>>           num_constants =
>>>              setup->constants[i].stored_size / (sizeof(float) * 4);
>>>           setup->fs.current.jit_context.num_constants[i] = num_constants;
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=BQIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=21UUTlirpPduK13RgVhZq1BbKFfCTxnqEp4ttEsgnUc&s=rN2Jgk0JlqBME5hFIhuKw37SCuHRd80IBgsdqPOubQQ&e=
>


More information about the mesa-dev mailing list