[Mesa-dev] [PATCH 30/53] r600: create LDS info constants buffer and write LDS registers.

Mon Nov 30 15:38:25 PST 2015

On Mon, Nov 30, 2015 at 1:30 PM, Marek Olšák <maraeo at gmail.com> wrote:
> On Mon, Nov 30, 2015 at 7:20 AM, Dave Airlie <airlied at gmail.com> wrote:
>> From: Dave Airlie <airlied at redhat.com>
>>
>> This creates a constant buffer with the information about
>> the layout of the LDS memory that is given to the vertex, tess
>> control and tess evaluation shaders.
>>
>> This also programs the LDS size and the LS_HS_CONFIG registers,
>> on evergreen only.
>>
>> Signed-off-by: Dave Airlie <airlied at redhat.com>
>> ---
>>  src/gallium/drivers/r600/evergreen_state.c   | 128 +++++++++++++++++++++++++++
>>  src/gallium/drivers/r600/r600_pipe.h         |  24 ++++-
>>  src/gallium/drivers/r600/r600_state_common.c |  13 +++
>>  3 files changed, 162 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
>> index c01e8e3..edc6f28 100644
>> --- a/src/gallium/drivers/r600/evergreen_state.c
>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>> @@ -3763,3 +3763,131 @@ void evergreen_init_state_functions(struct r600_context *rctx)
>>
>>         evergreen_init_compute_state_functions(rctx);
>>  }
>> +
>> +/**
>> + * This calculates the LDS size for tessellation shaders (VS, TCS, TES).
>> + *
>> + * The information about LDS and other non-compile-time parameters is then
>> + * written to the const buffer.
>> +
>> + * const buffer contains -
>> + * uint32_t input_patch_size
>> + * uint32_t input_vertex_size
>> + * uint32_t num_tcs_input_cp
>> + * uint32_t num_tcs_output_cp;
>> + * uint32_t output_patch_size
>> + * uint32_t output_vertex_size
>> + * uint32_t output_patch0_offset
>> + * uint32_t perpatch_output_offset
>> + * and the same constbuf is bound to LS/HS/VS(ES).
>> + */
>> +void evergreen_setup_tess_constants(struct r600_context *rctx, const struct pipe_draw_info *info, unsigned *num_patches, uint32_t *lds_alloc)
>> +{
>> +       struct pipe_constant_buffer constbuf = {0};
>> +       struct r600_pipe_shader_selector *tcs = rctx->tcs_shader ? rctx->tcs_shader : rctx->tes_shader;
>> +       struct r600_pipe_shader_selector *ls = rctx->vs_shader;
>> +       unsigned num_tcs_input_cp = info->vertices_per_patch;
>> +       unsigned num_tcs_outputs;
>> +       unsigned num_tcs_output_cp;
>> +       unsigned num_tcs_patch_outputs;
>> +       unsigned num_tcs_inputs;
>> +       unsigned input_vertex_size, output_vertex_size;
>> +       unsigned input_patch_size, pervertex_output_patch_size, output_patch_size;
>> +       unsigned output_patch0_offset, perpatch_output_offset, lds_size;
>> +       uint32_t values[16];
>> +       uint32_t tmp;
>> +
>> +       if (!rctx->tes_shader)
>> +               return;
>> +
>> +       *num_patches = 1;
>
> num_patches should be set before returning.
>
>> +
>> +       num_tcs_inputs = util_last_bit64(ls->lds_outputs_written_mask);
>> +
>> +       if (rctx->tcs_shader) {
>> +               num_tcs_outputs = util_last_bit64(tcs->lds_outputs_written_mask);
>> +               num_tcs_output_cp = tcs->info.properties[TGSI_PROPERTY_TCS_VERTICES_OUT];
>> +               num_tcs_patch_outputs = util_last_bit64(tcs->lds_patch_outputs_written_mask);
>> +       } else {
>> +               num_tcs_outputs = num_tcs_inputs;
>> +               num_tcs_output_cp = num_tcs_input_cp;
>> +               num_tcs_patch_outputs = 2; /* TESSINNER + TESSOUTER */
>> +       }
>> +
>> +       /* size in bytes */
>> +       input_vertex_size = num_tcs_inputs * 16;
>> +       output_vertex_size = num_tcs_outputs * 16;
>> +
>> +       input_patch_size = num_tcs_input_cp * input_vertex_size;
>> +
>> +       pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
>> +       output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;
>> +
>> +       output_patch0_offset = rctx->tcs_shader ? input_patch_size * *num_patches : 0;
>> +       perpatch_output_offset = output_patch0_offset + pervertex_output_patch_size;
>> +
>> +       lds_size = output_patch0_offset + output_patch_size * *num_patches;
>> +
>> +       values[0] = input_patch_size;
>> +       values[1] = input_vertex_size;
>> +       values[2] = num_tcs_input_cp;
>> +       values[3] = num_tcs_output_cp;
>> +
>> +       values[4] = output_patch_size;
>> +       values[5] = output_vertex_size;
>> +       values[6] = output_patch0_offset;
>> +       values[7] = perpatch_output_offset;
>> +
>> +       /* docs say HS_NUM_WAVES - CEIL((LS_HS_CONFIG.NUM_PATCHES *
>> +          LS_HS_CONFIG.HS_NUM_OUTPUT_CP) / (NUM_GOOD_PIPES * 16)) */
>> +       tmp = (lds_size | (1 << 14)); /* TODO */
>
> If I understand this correctly, num_good_pipes can be between 1 and 4.
> Assume the worst case, which is 1. This gives us:
> ceil(NUM_PATCHES * NUM_OUTPUT_CP / 16)
>
> That equals 2 if NUM_OUTPUT_CP > 16 and NUM_PATCHES = 1.

BTW, HS_NUM_WAVES is the number of waves that share the same LDS memory.
1 pipe = 16 threads per wave (GCN always has 4 pipes = 64 threads per
wave); that's where the "16" in the equation comes from. The equation only
ensures that all vertices within a patch are assigned the same LDS
memory, which is why you need at least 2 on 1-pipe chips when
NUM_OUTPUT_CP > 16.
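
As a rough sketch of the equation quoted from the docs, the ceiling division could be written as below. The helper name hs_num_waves and its parameters are hypothetical and not part of the patch; how the driver would obtain NUM_GOOD_PIPES is left open here.

```c
/*
 * Hypothetical helper, not from the patch: evaluates
 *   CEIL((NUM_PATCHES * HS_NUM_OUTPUT_CP) / (NUM_GOOD_PIPES * 16))
 * using integer ceiling division.
 *
 * num_good_pipes is assumed to be in the range 1..4; passing 1
 * gives the worst case discussed above.
 */
static unsigned hs_num_waves(unsigned num_patches,
                             unsigned num_output_cp,
                             unsigned num_good_pipes)
{
    /* 1 pipe = 16 threads per wave */
    unsigned threads_per_wave = num_good_pipes * 16;
    unsigned threads_needed = num_patches * num_output_cp;

    /* round up so a partially filled wave still counts as a wave */
    return (threads_needed + threads_per_wave - 1) / threads_per_wave;
}
```

With one patch on a 1-pipe chip, hs_num_waves(1, 16, 1) gives 1 and hs_num_waves(1, 17, 1) gives 2, which matches the "at least 2" case above.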

Marek