[Mesa-dev] [PATCH 2/2] radv: Implement VK_AMD_shader_info

Thu Oct 26 16:11:53 UTC 2017

On 10/25/2017 10:58 PM, Bas Nieuwenhuizen wrote:
> On Wed, Oct 25, 2017 at 4:03 PM, Samuel Pitoiset
> <samuel.pitoiset at gmail.com> wrote:
>>
>>
>> On 10/25/2017 02:20 PM, Alex Smith wrote:
>>>
>>> On 25 October 2017 at 12:46, Samuel Pitoiset <samuel.pitoiset at gmail.com
>>> <mailto:samuel.pitoiset at gmail.com>> wrote:
>>>
>>>      I have something similar on my local tree (started on monday).
>>>
>>>      Though, I don't like the way we expose the number of VGPRS/SGPRS
>>>      because we can't really figure out the number of spilled ones.
>>>
>>>
>>> My assumption was that if we've spilled then we've used all available
>>> registers, so if numUsed{V,S}gprs is greater than the number available, then
>>> you'd know that the number spilled is the difference between the two. Can we
>>> have spilling when num_{v,s}gprs is less than the number available?
>>
>>
>> Assuming the number of waves per CU is 4, I would go with:
>>
>> num_available_vgprs = num_physical_vgprs (ie. 256) / max_simd_waves (aligned
>> down to 4).
> 
> for compute there is
> 
> num_available_vgprs (as LLVM sees as constraints) = num_physical_vgprs
> / ceil(compute_workgroup_size / 256)
> 
> for other stages it always is 256. (Until we implement the wave limit ext)
> 
> Reading from the spec I think it is unintuitive that the usedVgpr
> stats include spilled registers though. I'd
> expect to see just the physically used regs. Is this something that
> Feral has tried on the official driver on any platform? I'd say to not
> include the spilled regs (you can get it approximately with scratch
> memory / 256), unless the official driver does otherwise, in which
> case we should go for consistency.

Yeah, for the number of spilled VGPRS we can get it from the scratch 
memory size, but not for the spilled SGPRS. I think it would have been 
better to add fields for the spilled VGPRS/SGPRS.

> 
>>
>> (or we can just set num_available_vgprs to conf->num_vgprs and return
>> num_used_vgprs = conf->num_vgprs + conf->num_spilled_sgprs).
>>
>> That way, if num_used_vgprs is greater than num_available_vgprs we know that
>> we are spilling some vgprs.
>>
>> For the number of available SGPRs, I think we can just hardcode the value to
>> 104 for now.
>>
>> Also with this, we can easily re-compute the maximum number of waves.
>>
>>>
>>> Alex
>>>
>>>
>>>
>>>      On 10/25/2017 01:18 PM, Alex Smith wrote:
>>>
>>>          This allows an app to query shader statistics and get a
>>>          disassembly of
>>>          a shader. RenderDoc git has support for it, so this allows you
>>>          to view
>>>          shader disassembly from a capture.
>>>
>>>          When this extension is enabled on a device (or when tracing), we
>>> now
>>>          disable pipeline caching, since we don't get the shader debug
>>>          info when
>>>          we retrieve cached shaders.
>>>
>>>          Signed-off-by: Alex Smith <asmith at feralinteractive.com
>>>          <mailto:asmith at feralinteractive.com>>
>>>
>>>          ---
>>>             src/amd/vulkan/radv_device.c         |   9 ++
>>>             src/amd/vulkan/radv_extensions.py    |   1 +
>>>             src/amd/vulkan/radv_pipeline.c       |   2 +-
>>>             src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
>>>             src/amd/vulkan/radv_private.h        |   3 +
>>>             src/amd/vulkan/radv_shader.c         | 163
>>>          ++++++++++++++++++++++++++++-------
>>>             6 files changed, 154 insertions(+), 35 deletions(-)
>>>
>>>          diff --git a/src/amd/vulkan/radv_device.c
>>>          b/src/amd/vulkan/radv_device.c
>>>          index c4e25222ea..5603551680 100644
>>>          --- a/src/amd/vulkan/radv_device.c
>>>          +++ b/src/amd/vulkan/radv_device.c
>>>          @@ -943,10 +943,15 @@ VkResult radv_CreateDevice(
>>>                   VkResult result;
>>>                   struct radv_device *device;
>>>             +     bool keep_shader_info = false;
>>>          +
>>>                   for (uint32_t i = 0; i <
>>>          pCreateInfo->enabledExtensionCount; i++) {
>>>                           const char *ext_name =
>>>          pCreateInfo->ppEnabledExtensionNames[i];
>>>                           if
>>>          (!radv_physical_device_extension_supported(physical_device,
>>>          ext_name))
>>>                                   return
>>>          vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
>>>          +
>>>          +               if (strcmp(ext_name,
>>>          VK_AMD_SHADER_INFO_EXTENSION_NAME) == 0)
>>>          +                       keep_shader_info = true;
>>>                   }
>>>                   /* Check enabled features */
>>>          @@ -1040,10 +1045,14 @@ VkResult radv_CreateDevice(
>>>                           device->physical_device->rad_info.max_se >= 2;
>>>                   if (getenv("RADV_TRACE_FILE")) {
>>>          +               keep_shader_info = true;
>>>          +
>>>                           if (!radv_init_trace(device))
>>>                                   goto fail;
>>>                   }
>>>             +     device->keep_shader_info = keep_shader_info;
>>>          +
>>>                   result = radv_device_init_meta(device);
>>>                   if (result != VK_SUCCESS)
>>>                           goto fail;
>>>          diff --git a/src/amd/vulkan/radv_extensions.py
>>>          b/src/amd/vulkan/radv_extensions.py
>>>          index dfeb2880fc..eeb679d65a 100644
>>>          --- a/src/amd/vulkan/radv_extensions.py
>>>          +++ b/src/amd/vulkan/radv_extensions.py
>>>          @@ -81,6 +81,7 @@ EXTENSIONS = [
>>>                 Extension('VK_EXT_global_priority',                   1,
>>>          'device->rad_info.has_ctx_priority'),
>>>                 Extension('VK_AMD_draw_indirect_count',               1,
>>>          True),
>>>                 Extension('VK_AMD_rasterization_order',               1,
>>>          'device->rad_info.chip_class >= VI && device->rad_info.max_se >=
>>>          2'),
>>>          +    Extension('VK_AMD_shader_info',                       1,
>>> True),
>>>             ]
>>>               class VkVersion:
>>>          diff --git a/src/amd/vulkan/radv_pipeline.c
>>>          b/src/amd/vulkan/radv_pipeline.c
>>>          index d6b33a5327..2df03a83cf 100644
>>>          --- a/src/amd/vulkan/radv_pipeline.c
>>>          +++ b/src/amd/vulkan/radv_pipeline.c
>>>          @@ -1874,7 +1874,7 @@ void radv_create_shaders(struct
>>>          radv_pipeline *pipeline,
>>>                                   if (device->instance->debug_flags &
>>>          RADV_DEBUG_DUMP_SHADERS)
>>>                                           nir_print_shader(nir[i], stderr);
>>>             -                     if (!pipeline->device->trace_bo)
>>>          +                       if (!pipeline->device->keep_shader_info)
>>>                                           ralloc_free(nir[i]);
>>>                           }
>>>                   }
>>>          diff --git a/src/amd/vulkan/radv_pipeline_cache.c
>>>          b/src/amd/vulkan/radv_pipeline_cache.c
>>>          index 9ba9a3b61b..46198799a7 100644
>>>          --- a/src/amd/vulkan/radv_pipeline_cache.c
>>>          +++ b/src/amd/vulkan/radv_pipeline_cache.c
>>>          @@ -62,9 +62,11 @@ radv_pipeline_cache_init(struct
>>>          radv_pipeline_cache *cache,
>>>                   cache->hash_table = malloc(byte_size);
>>>                   /* We don't consider allocation failure fatal, we just
>>>          start with a 0-sized
>>>          -        * cache. */
>>>          +        * cache. Disable caching when we want to keep shader
>>>          debug info, since
>>>          +        * we don't get the debug info on cached shaders. */
>>>                   if (cache->hash_table == NULL ||
>>>          -           (device->instance->debug_flags & RADV_DEBUG_NO_CACHE))
>>>          +           (device->instance->debug_flags & RADV_DEBUG_NO_CACHE)
>>> ||
>>>          +           device->keep_shader_info)
>>>                           cache->table_size = 0;
>>>                   else
>>>                           memset(cache->hash_table, 0, byte_size);
>>>          @@ -186,8 +188,11 @@
>>>          radv_create_shader_variants_from_pipeline_cache(struct
>>>          radv_device *device,
>>>                   entry = radv_pipeline_cache_search_unlocked(cache, sha1);
>>>                   if (!entry) {
>>>          +               /* Again, don't cache when we want debug info,
>>>          since this isn't
>>>          +                * present in the cache. */
>>>                           if (!device->physical_device->disk_cache ||
>>>          -                   (device->instance->debug_flags &
>>>          RADV_DEBUG_NO_CACHE)) {
>>>          +                   (device->instance->debug_flags &
>>>          RADV_DEBUG_NO_CACHE) ||
>>>          +                   device->keep_shader_info) {
>>>                                   pthread_mutex_unlock(&cache->mutex);
>>>                                   return false;
>>>                           }
>>>          diff --git a/src/amd/vulkan/radv_private.h
>>>          b/src/amd/vulkan/radv_private.h
>>>          index a4e52b2530..169df5f37b 100644
>>>          --- a/src/amd/vulkan/radv_private.h
>>>          +++ b/src/amd/vulkan/radv_private.h
>>>          @@ -552,6 +552,9 @@ struct radv_device {
>>>                   struct radeon_winsys_bo                      *trace_bo;
>>>                   uint32_t
>>> *trace_id_ptr;
>>>             +     /* Whether to keep shader debug info, for tracing or
>>>          VK_AMD_shader_info */
>>>          +       bool
>>> keep_shader_info;
>>>          +
>>>                   struct radv_physical_device
>>> *physical_device;
>>>                   /* Backup in-memory cache to be used if the app doesn't
>>>          provide one */
>>>          diff --git a/src/amd/vulkan/radv_shader.c
>>>          b/src/amd/vulkan/radv_shader.c
>>>          index 5903917068..7f2f0fd750 100644
>>>          --- a/src/amd/vulkan/radv_shader.c
>>>          +++ b/src/amd/vulkan/radv_shader.c
>>>          @@ -46,6 +46,8 @@
>>>             #include "util/debug.h"
>>>             #include "ac_exp_param.h"
>>>             +#include "util/string_buffer.h"
>>>          +
>>>             static const struct nir_shader_compiler_options nir_options = {
>>>                   .vertex_id_zero_based = true,
>>>                   .lower_scmp = true,
>>>          @@ -471,7 +473,7 @@ shader_variant_create(struct radv_device
>>>          *device,
>>>                   free(binary.relocs);
>>>                   variant->ref_count = 1;
>>>             -     if (device->trace_bo) {
>>>          +       if (device->keep_shader_info) {
>>>                           variant->disasm_string = binary.disasm_string;
>>>                           if (!gs_copy_shader && !module->nir) {
>>>                                   variant->nir = *shaders;
>>>          @@ -593,11 +595,20 @@ radv_get_shader_name(struct
>>>          radv_shader_variant *var, gl_shader_stage stage)
>>>                   };
>>>             }
>>>             -void
>>>          -radv_shader_dump_stats(struct radv_device *device,
>>>          -                      struct radv_shader_variant *variant,
>>>          -                      gl_shader_stage stage,
>>>          -                      FILE *file)
>>>          +static uint32_t
>>>          +get_total_sgprs(struct radv_device *device)
>>>          +{
>>>          +       if (device->physical_device->rad_info.chip_class >= VI)
>>>          +               return 800;
>>>          +       else
>>>          +               return 512;
>>>          +}
>>>          +
>>>          +static void
>>>          +generate_shader_stats(struct radv_device *device,
>>>          +                     struct radv_shader_variant *variant,
>>>          +                     gl_shader_stage stage,
>>>          +                     struct _mesa_string_buffer *buf)
>>>             {
>>>                   unsigned lds_increment =
>>>          device->physical_device->rad_info.chip_class >= CIK ? 512 : 256;
>>>                   struct ac_shader_config *conf;
>>>          @@ -623,12 +634,8 @@ radv_shader_dump_stats(struct radv_device
>>>          *device,
>>>                                                lds_increment);
>>>                   }
>>>             -     if (conf->num_sgprs) {
>>>          -               if (device->physical_device->rad_info.chip_class
>>>           >= VI)
>>>          -                       max_simd_waves = MIN2(max_simd_waves,
>>>          800 / conf->num_sgprs);
>>>          -               else
>>>          -                       max_simd_waves = MIN2(max_simd_waves,
>>>          512 / conf->num_sgprs);
>>>          -       }
>>>          +       if (conf->num_sgprs)
>>>          +               max_simd_waves = MIN2(max_simd_waves,
>>>          get_total_sgprs(device) / conf->num_sgprs);
>>>                   if (conf->num_vgprs)
>>>                           max_simd_waves = MIN2(max_simd_waves, 256 /
>>>          conf->num_vgprs);
>>>          @@ -639,27 +646,121 @@ radv_shader_dump_stats(struct radv_device
>>>          *device,
>>>                   if (lds_per_wave)
>>>                           max_simd_waves = MIN2(max_simd_waves, 16384 /
>>>          lds_per_wave);
>>>             +     if (stage == MESA_SHADER_FRAGMENT) {
>>>          +               _mesa_string_buffer_printf(buf, "*** SHADER
>>>          CONFIG ***\n"
>>>          +                                          "SPI_PS_INPUT_ADDR =
>>>          0x%04x\n"
>>>          +                                          "SPI_PS_INPUT_ENA  =
>>>          0x%04x\n",
>>>          +
>>> conf->spi_ps_input_addr, conf->spi_ps_input_ena);
>>>          +       }
>>>          +
>>>          +       _mesa_string_buffer_printf(buf, "*** SHADER STATS ***\n"
>>>          +                                  "SGPRS: %d\n"
>>>          +                                  "VGPRS: %d\n"
>>>          +                                  "Spilled SGPRs: %d\n"
>>>          +                                  "Spilled VGPRs: %d\n"
>>>          +                                  "Code Size: %d bytes\n"
>>>          +                                  "LDS: %d blocks\n"
>>>          +                                  "Scratch: %d bytes per wave\n"
>>>          +                                  "Max Waves: %d\n"
>>>          +                                  "********************\n\n\n",
>>>          +                                  conf->num_sgprs,
>>> conf->num_vgprs,
>>>          +                                  conf->spilled_sgprs,
>>>          conf->spilled_vgprs, variant->code_size,
>>>          +                                  conf->lds_size,
>>>          conf->scratch_bytes_per_wave,
>>>          +                                  max_simd_waves);
>>>          +}
>>>          +
>>>          +void
>>>          +radv_shader_dump_stats(struct radv_device *device,
>>>          +                      struct radv_shader_variant *variant,
>>>          +                      gl_shader_stage stage,
>>>          +                      FILE *file)
>>>          +{
>>>          +       struct _mesa_string_buffer *buf =
>>>          _mesa_string_buffer_create(NULL, 256);
>>>          +
>>>          +       generate_shader_stats(device, variant, stage, buf);
>>>          +
>>>                   fprintf(file, "\n%s:\n", radv_get_shader_name(variant,
>>>          stage));
>>>          +       fprintf(file, buf->buf);
>>>             -     if (stage == MESA_SHADER_FRAGMENT) {
>>>          -               fprintf(file, "*** SHADER CONFIG ***\n"
>>>          -                       "SPI_PS_INPUT_ADDR = 0x%04x\n"
>>>          -                       "SPI_PS_INPUT_ENA  = 0x%04x\n",
>>>          -                       conf->spi_ps_input_addr,
>>>          conf->spi_ps_input_ena);
>>>          +       _mesa_string_buffer_destroy(buf);
>>>          +}
>>>          +
>>>          +VkResult
>>>          +radv_GetShaderInfoAMD(VkDevice _device,
>>>          +                     VkPipeline _pipeline,
>>>          +                     VkShaderStageFlagBits shaderStage,
>>>          +                     VkShaderInfoTypeAMD infoType,
>>>          +                     size_t* pInfoSize,
>>>          +                     void* pInfo)
>>>          +{
>>>          +       RADV_FROM_HANDLE(radv_device, device, _device);
>>>          +       RADV_FROM_HANDLE(radv_pipeline, pipeline, _pipeline);
>>>          +       gl_shader_stage stage =
>>>          vk_to_mesa_shader_stage(shaderStage);
>>>          +       struct radv_shader_variant *variant =
>>>          pipeline->shaders[stage];
>>>          +       struct _mesa_string_buffer *buf;
>>>          +       VkResult result = VK_SUCCESS;
>>>          +
>>>          +       /* Spec doesn't indicate what to do if the stage is
>>>          invalid, so just
>>>          +        * return no info for this. */
>>>          +       if (!variant)
>>>          +               return VK_ERROR_FEATURE_NOT_PRESENT;
>>>          +
>>>          +       switch (infoType) {
>>>          +       case VK_SHADER_INFO_TYPE_STATISTICS_AMD:
>>>          +               if (!pInfo) {
>>>          +                       *pInfoSize =
>>>          sizeof(VkShaderStatisticsInfoAMD);
>>>          +               } else {
>>>          +                       struct ac_shader_config *conf =
>>>          &variant->config;
>>>          +
>>>          +                       VkShaderStatisticsInfoAMD statistics = {};
>>>          +                       statistics.shaderStageMask = shaderStage;
>>>          +                       statistics.resourceUsage.numUsedVgprs =
>>>          conf->num_vgprs + conf->spilled_vgprs;
>>>          +                       statistics.resourceUsage.numUsedSgprs =
>>>          conf->num_sgprs + conf->spilled_sgprs;
>>>          +
>>> statistics.resourceUsage.ldsSizePerLocalWorkGroup = 16384;
>>>          +
>>> statistics.resourceUsage.ldsUsageSizeInBytes = conf->lds_size;
>>>          +
>>> statistics.resourceUsage.scratchMemUsageInBytes =
>>>          conf->scratch_bytes_per_wave;
>>>          +                       statistics.numPhysicalVgprs =
>>>          statistics.numAvailableVgprs = 256;
>>>          +                       statistics.numPhysicalSgprs =
>>>          statistics.numAvailableSgprs = get_total_sgprs(device);
>>>          +                       statistics.computeWorkGroupSize[0] =
>>>          variant->nir->info.cs.local_size[0];
>>>          +                       statistics.computeWorkGroupSize[1] =
>>>          variant->nir->info.cs.local_size[1];
>>>          +                       statistics.computeWorkGroupSize[2] =
>>>          variant->nir->info.cs.local_size[2];
>>>          +
>>>          +                       size_t size = *pInfoSize;
>>>          +                       *pInfoSize = sizeof(statistics);
>>>          +
>>>          +                       memcpy(pInfo, &statistics, MIN2(size,
>>>          *pInfoSize));
>>>          +
>>>          +                       if (size < *pInfoSize)
>>>          +                               result = VK_INCOMPLETE;
>>>          +               }
>>>          +
>>>          +               break;
>>>          +       case VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD:
>>>          +               buf = _mesa_string_buffer_create(NULL, 1024);
>>>          +
>>>          +               _mesa_string_buffer_printf(buf, "%s:\n",
>>>          radv_get_shader_name(variant, stage));
>>>          +               _mesa_string_buffer_printf(buf, "%s\n\n",
>>>          variant->disasm_string);
>>>          +               generate_shader_stats(device, variant, stage,
>>> buf);
>>>          +
>>>          +               if (!pInfo) {
>>>          +                       *pInfoSize = buf->length;
>>>          +               } else {
>>>          +                       size_t size = *pInfoSize;
>>>          +                       *pInfoSize = buf->length;
>>>          +
>>>          +                       memcpy(pInfo, buf->buf, MIN2(size,
>>>          buf->length));
>>>          +
>>>          +                       if (size < buf->length)
>>>          +                               result = VK_INCOMPLETE;
>>>          +               }
>>>          +
>>>          +               _mesa_string_buffer_destroy(buf);
>>>          +               break;
>>>          +       default:
>>>          +               /* VK_SHADER_INFO_TYPE_BINARY_AMD unimplemented
>>>          for now. */
>>>          +               result = VK_ERROR_FEATURE_NOT_PRESENT;
>>>          +               break;
>>>                   }
>>>             -     fprintf(file, "*** SHADER STATS ***\n"
>>>          -               "SGPRS: %d\n"
>>>          -               "VGPRS: %d\n"
>>>          -               "Spilled SGPRs: %d\n"
>>>          -               "Spilled VGPRs: %d\n"
>>>          -               "Code Size: %d bytes\n"
>>>          -               "LDS: %d blocks\n"
>>>          -               "Scratch: %d bytes per wave\n"
>>>          -               "Max Waves: %d\n"
>>>          -               "********************\n\n\n",
>>>          -               conf->num_sgprs, conf->num_vgprs,
>>>          -               conf->spilled_sgprs, conf->spilled_vgprs,
>>>          variant->code_size,
>>>          -               conf->lds_size, conf->scratch_bytes_per_wave,
>>>          -               max_simd_waves);
>>>          +       return result;
>>>             }
>>>
>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev