<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 25 October 2017 at 21:58, Bas Nieuwenhuizen <span dir="ltr"><<a href="mailto:bas@basnieuwenhuizen.nl" target="_blank">bas@basnieuwenhuizen.nl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, Oct 25, 2017 at 4:03 PM, Samuel Pitoiset<br>
<<a href="mailto:samuel.pitoiset@gmail.com">samuel.pitoiset@gmail.com</a>> wrote:<br>
><br>
><br>
> On 10/25/2017 02:20 PM, Alex Smith wrote:<br>
>><br>
>> On 25 October 2017 at 12:46, Samuel Pitoiset <<a href="mailto:samuel.pitoiset@gmail.com">samuel.pitoiset@gmail.com</a><br>
>> <mailto:<a href="mailto:samuel.pitoiset@gmail.com">samuel.pitoiset@gmail.<wbr>com</a>>> wrote:<br>
>><br>
>> I have something similar on my local tree (started on monday).<br>
>><br>
>> Though, I don't like the way we expose the number of VGPRS/SGPRS<br>
>> because we can't really figure out the number of spilled ones.<br>
>><br>
>><br>
>> My assumption was that if we've spilled then we've used all available<br>
>> registers, so if numUsed{V,S}gprs is greater than the number available, then<br>
>> you'd know that the number spilled is the difference between the two. Can we<br>
>> have spilling when num_{v,s}gprs is less than the number available?<br>
><br>
><br>
> Assuming the number of waves per CU is 4, I would go with:<br>
><br>
> num_available_vgprs = num_physical_vgprs (ie. 256) / max_simd_waves (aligned<br>
> down to 4).<br>
<br>
</span>for compute there is<br>
<br>
num_available_vgprs (as LLVM sees as constraints) = num_physical_vgprs<br>
/ ceil(compute_workgroup_size / 256)<br>
<br>
for other stages it always is 256. (Until we implement the wave limit ext)<br>
<br>
Reading from the spec I think it is unintuitive that the usedVgpr<br>
stats include spilled registers though. I'd<br>
expect to see just the physically used regs. Is this something that<br>
Feral has tried on the official driver on any platform? I'd say to not<br>
include the spilled regs (you can get it approximately with scratch<br>
memory / 256), unless the official driver does otherwise, in which<br>
case we should go for consistency.<br></blockquote><div><br></div><div>I've not looked at amdgpu-pro, I'm unable to check it right now. Not sure if that would even have the extension since it only appeared in the spec very recently.</div><div><br></div><div>I'll go with what you suggest for now, I think you're probably right that we shouldn't include the spilled registers.</div><div><br></div><div>Alex</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div class="h5"><br>
><br>
> (or we can just set num_available_vgprs to conf->num_vgprs and return<br>
> num_used_vgprs = conf->num_vgprs + conf->num_spilled_sgprs).<br>
><br>
> That way, if num_used_vgprs is greater than num_available_vgprs we know that<br>
> we are spilling some vgprs.<br>
><br>
> For the number of available SGPRs, I think we can just hardcode the value to<br>
> 104 for now.<br>
><br>
> Also with this, we can easily re-compute the maximum number of waves.<br>
><br>
>><br>
>> Alex<br>
>><br>
>><br>
>><br>
>> On 10/25/2017 01:18 PM, Alex Smith wrote:<br>
>><br>
>> This allows an app to query shader statistics and get a<br>
>> disassembly of<br>
>> a shader. RenderDoc git has support for it, so this allows you<br>
>> to view<br>
>> shader disassembly from a capture.<br>
>><br>
>> When this extension is enabled on a device (or when tracing), we<br>
>> now<br>
>> disable pipeline caching, since we don't get the shader debug<br>
>> info when<br>
>> we retrieve cached shaders.<br>
>><br>
>> Signed-off-by: Alex Smith <<a href="mailto:asmith@feralinteractive.com">asmith@feralinteractive.com</a><br>
>> <mailto:<a href="mailto:asmith@feralinteractive.com">asmith@<wbr>feralinteractive.com</a>>><br>
>><br>
>> ---<br>
>> src/amd/vulkan/radv_device.c | 9 ++<br>
>> src/amd/vulkan/radv_<wbr>extensions.py | 1 +<br>
>> src/amd/vulkan/radv_pipeline.c | 2 +-<br>
>> src/amd/vulkan/radv_pipeline_<wbr>cache.c | 11 ++-<br>
>> src/amd/vulkan/radv_private.h | 3 +<br>
>> src/amd/vulkan/radv_shader.c | 163<br>
>> ++++++++++++++++++++++++++++--<wbr>-----<br>
>> 6 files changed, 154 insertions(+), 35 deletions(-)<br>
>><br>
>> diff --git a/src/amd/vulkan/radv_device.c<br>
>> b/src/amd/vulkan/radv_device.c<br>
>> index c4e25222ea..5603551680 100644<br>
>> --- a/src/amd/vulkan/radv_device.c<br>
>> +++ b/src/amd/vulkan/radv_device.c<br>
>> @@ -943,10 +943,15 @@ VkResult radv_CreateDevice(<br>
>> VkResult result;<br>
>> struct radv_device *device;<br>
>> + bool keep_shader_info = false;<br>
>> +<br>
>> for (uint32_t i = 0; i <<br>
>> pCreateInfo-><wbr>enabledExtensionCount; i++) {<br>
>> const char *ext_name =<br>
>> pCreateInfo-><wbr>ppEnabledExtensionNames[i];<br>
>> if<br>
>> (!radv_physical_device_<wbr>extension_supported(physical_<wbr>device,<br>
>> ext_name))<br>
>> return<br>
>> vk_error(VK_ERROR_EXTENSION_<wbr>NOT_PRESENT);<br>
>> +<br>
>> + if (strcmp(ext_name,<br>
>> VK_AMD_SHADER_INFO_EXTENSION_<wbr>NAME) == 0)<br>
>> + keep_shader_info = true;<br>
>> }<br>
>> /* Check enabled features */<br>
>> @@ -1040,10 +1045,14 @@ VkResult radv_CreateDevice(<br>
>> device->physical_device->rad_<wbr>info.max_se >= 2;<br>
>> if (getenv("RADV_TRACE_FILE")) {<br>
>> + keep_shader_info = true;<br>
>> +<br>
>> if (!radv_init_trace(device))<br>
>> goto fail;<br>
>> }<br>
>> + device->keep_shader_info = keep_shader_info;<br>
>> +<br>
>> result = radv_device_init_meta(device);<br>
>> if (result != VK_SUCCESS)<br>
>> goto fail;<br>
>> diff --git a/src/amd/vulkan/radv_<wbr>extensions.py<br>
>> b/src/amd/vulkan/radv_<wbr>extensions.py<br>
>> index dfeb2880fc..eeb679d65a 100644<br>
>> --- a/src/amd/vulkan/radv_<wbr>extensions.py<br>
>> +++ b/src/amd/vulkan/radv_<wbr>extensions.py<br>
>> @@ -81,6 +81,7 @@ EXTENSIONS = [<br>
>> Extension('VK_EXT_global_<wbr>priority', 1,<br>
>> 'device->rad_info.has_ctx_<wbr>priority'),<br>
>> Extension('VK_AMD_draw_<wbr>indirect_count', 1,<br>
>> True),<br>
>> Extension('VK_AMD_<wbr>rasterization_order', 1,<br>
>> 'device->rad_info.chip_class >= VI && device->rad_info.max_se >=<br>
>> 2'),<br>
>> + Extension('VK_AMD_shader_info'<wbr>, 1,<br>
>> True),<br>
>> ]<br>
>> class VkVersion:<br>
>> diff --git a/src/amd/vulkan/radv_<wbr>pipeline.c<br>
>> b/src/amd/vulkan/radv_<wbr>pipeline.c<br>
>> index d6b33a5327..2df03a83cf 100644<br>
>> --- a/src/amd/vulkan/radv_<wbr>pipeline.c<br>
>> +++ b/src/amd/vulkan/radv_<wbr>pipeline.c<br>
>> @@ -1874,7 +1874,7 @@ void radv_create_shaders(struct<br>
>> radv_pipeline *pipeline,<br>
>> if (device->instance->debug_flags &<br>
>> RADV_DEBUG_DUMP_SHADERS)<br>
>> nir_print_shader(nir[i], stderr);<br>
>> - if (!pipeline->device->trace_bo)<br>
>> + if (!pipeline->device->keep_<wbr>shader_info)<br>
>> ralloc_free(nir[i]);<br>
>> }<br>
>> }<br>
>> diff --git a/src/amd/vulkan/radv_<wbr>pipeline_cache.c<br>
>> b/src/amd/vulkan/radv_<wbr>pipeline_cache.c<br>
>> index 9ba9a3b61b..46198799a7 100644<br>
>> --- a/src/amd/vulkan/radv_<wbr>pipeline_cache.c<br>
>> +++ b/src/amd/vulkan/radv_<wbr>pipeline_cache.c<br>
>> @@ -62,9 +62,11 @@ radv_pipeline_cache_init(<wbr>struct<br>
>> radv_pipeline_cache *cache,<br>
>> cache->hash_table = malloc(byte_size);<br>
>> /* We don't consider allocation failure fatal, we just<br>
>> start with a 0-sized<br>
>> - * cache. */<br>
>> + * cache. Disable caching when we want to keep shader<br>
>> debug info, since<br>
>> + * we don't get the debug info on cached shaders. */<br>
>> if (cache->hash_table == NULL ||<br>
>> - (device->instance->debug_flags & RADV_DEBUG_NO_CACHE))<br>
>> + (device->instance->debug_flags & RADV_DEBUG_NO_CACHE)<br>
>> ||<br>
>> + device->keep_shader_info)<br>
>> cache->table_size = 0;<br>
>> else<br>
>> memset(cache->hash_table, 0, byte_size);<br>
>> @@ -186,8 +188,11 @@<br>
>> radv_create_shader_variants_<wbr>from_pipeline_cache(struct<br>
>> radv_device *device,<br>
>> entry = radv_pipeline_cache_search_<wbr>unlocked(cache, sha1);<br>
>> if (!entry) {<br>
>> + /* Again, don't cache when we want debug info,<br>
>> since this isn't<br>
>> + * present in the cache. */<br>
>> if (!device->physical_device-><wbr>disk_cache ||<br>
>> - (device->instance->debug_flags &<br>
>> RADV_DEBUG_NO_CACHE)) {<br>
>> + (device->instance->debug_flags &<br>
>> RADV_DEBUG_NO_CACHE) ||<br>
>> + device->keep_shader_info) {<br>
>> pthread_mutex_unlock(&cache-><wbr>mutex);<br>
>> return false;<br>
>> }<br>
>> diff --git a/src/amd/vulkan/radv_private.<wbr>h<br>
>> b/src/amd/vulkan/radv_private.<wbr>h<br>
>> index a4e52b2530..169df5f37b 100644<br>
>> --- a/src/amd/vulkan/radv_private.<wbr>h<br>
>> +++ b/src/amd/vulkan/radv_private.<wbr>h<br>
>> @@ -552,6 +552,9 @@ struct radv_device {<br>
>> struct radeon_winsys_bo *trace_bo;<br>
>> uint32_t<br>
>> *trace_id_ptr;<br>
>> + /* Whether to keep shader debug info, for tracing or<br>
>> VK_AMD_shader_info */<br>
>> + bool<br>
>> keep_shader_info;<br>
>> +<br>
>> struct radv_physical_device<br>
>> *physical_device;<br>
>> /* Backup in-memory cache to be used if the app doesn't<br>
>> provide one */<br>
>> diff --git a/src/amd/vulkan/radv_shader.c<br>
>> b/src/amd/vulkan/radv_shader.c<br>
>> index 5903917068..7f2f0fd750 100644<br>
>> --- a/src/amd/vulkan/radv_shader.c<br>
>> +++ b/src/amd/vulkan/radv_shader.c<br>
>> @@ -46,6 +46,8 @@<br>
>> #include "util/debug.h"<br>
>> #include "ac_exp_param.h"<br>
>> +#include "util/string_buffer.h"<br>
>> +<br>
>> static const struct nir_shader_compiler_options nir_options = {<br>
>> .vertex_id_zero_based = true,<br>
>> .lower_scmp = true,<br>
>> @@ -471,7 +473,7 @@ shader_variant_create(struct radv_device<br>
>> *device,<br>
>> free(binary.relocs);<br>
>> variant->ref_count = 1;<br>
>> - if (device->trace_bo) {<br>
>> + if (device->keep_shader_info) {<br>
>> variant->disasm_string = binary.disasm_string;<br>
>> if (!gs_copy_shader && !module->nir) {<br>
>> variant->nir = *shaders;<br>
>> @@ -593,11 +595,20 @@ radv_get_shader_name(struct<br>
>> radv_shader_variant *var, gl_shader_stage stage)<br>
>> };<br>
>> }<br>
>> -void<br>
>> -radv_shader_dump_stats(struct radv_device *device,<br>
>> - struct radv_shader_variant *variant,<br>
>> - gl_shader_stage stage,<br>
>> - FILE *file)<br>
>> +static uint32_t<br>
>> +get_total_sgprs(struct radv_device *device)<br>
>> +{<br>
>> + if (device->physical_device->rad_<wbr>info.chip_class >= VI)<br>
>> + return 800;<br>
>> + else<br>
>> + return 512;<br>
>> +}<br>
>> +<br>
>> +static void<br>
>> +generate_shader_stats(struct radv_device *device,<br>
>> + struct radv_shader_variant *variant,<br>
>> + gl_shader_stage stage,<br>
>> + struct _mesa_string_buffer *buf)<br>
>> {<br>
>> unsigned lds_increment =<br>
>> device->physical_device->rad_<wbr>info.chip_class >= CIK ? 512 : 256;<br>
>> struct ac_shader_config *conf;<br>
>> @@ -623,12 +634,8 @@ radv_shader_dump_stats(struct radv_device<br>
>> *device,<br>
>> lds_increment);<br>
>> }<br>
>> - if (conf->num_sgprs) {<br>
>> - if (device->physical_device->rad_<wbr>info.chip_class<br>
>> >= VI)<br>
>> - max_simd_waves = MIN2(max_simd_waves,<br>
>> 800 / conf->num_sgprs);<br>
>> - else<br>
>> - max_simd_waves = MIN2(max_simd_waves,<br>
>> 512 / conf->num_sgprs);<br>
>> - }<br>
>> + if (conf->num_sgprs)<br>
>> + max_simd_waves = MIN2(max_simd_waves,<br>
>> get_total_sgprs(device) / conf->num_sgprs);<br>
>> if (conf->num_vgprs)<br>
>> max_simd_waves = MIN2(max_simd_waves, 256 /<br>
>> conf->num_vgprs);<br>
>> @@ -639,27 +646,121 @@ radv_shader_dump_stats(struct radv_device<br>
>> *device,<br>
>> if (lds_per_wave)<br>
>> max_simd_waves = MIN2(max_simd_waves, 16384 /<br>
>> lds_per_wave);<br>
>> + if (stage == MESA_SHADER_FRAGMENT) {<br>
>> + _mesa_string_buffer_printf(<wbr>buf, "*** SHADER<br>
>> CONFIG ***\n"<br>
>> + "SPI_PS_INPUT_ADDR =<br>
>> 0x%04x\n"<br>
>> + "SPI_PS_INPUT_ENA =<br>
>> 0x%04x\n",<br>
>> +<br>
>> conf->spi_ps_input_addr, conf->spi_ps_input_ena);<br>
>> + }<br>
>> +<br>
>> + _mesa_string_buffer_printf(<wbr>buf, "*** SHADER STATS ***\n"<br>
>> + "SGPRS: %d\n"<br>
>> + "VGPRS: %d\n"<br>
>> + "Spilled SGPRs: %d\n"<br>
>> + "Spilled VGPRs: %d\n"<br>
>> + "Code Size: %d bytes\n"<br>
>> + "LDS: %d blocks\n"<br>
>> + "Scratch: %d bytes per wave\n"<br>
>> + "Max Waves: %d\n"<br>
>> + "********************\n\n\n",<br>
>> + conf->num_sgprs,<br>
>> conf->num_vgprs,<br>
>> + conf->spilled_sgprs,<br>
>> conf->spilled_vgprs, variant->code_size,<br>
>> + conf->lds_size,<br>
>> conf->scratch_bytes_per_wave,<br>
>> + max_simd_waves);<br>
>> +}<br>
>> +<br>
>> +void<br>
>> +radv_shader_dump_stats(struct radv_device *device,<br>
>> + struct radv_shader_variant *variant,<br>
>> + gl_shader_stage stage,<br>
>> + FILE *file)<br>
>> +{<br>
>> + struct _mesa_string_buffer *buf =<br>
>> _mesa_string_buffer_create(<wbr>NULL, 256);<br>
>> +<br>
>> + generate_shader_stats(device, variant, stage, buf);<br>
>> +<br>
>> fprintf(file, "\n%s:\n", radv_get_shader_name(variant,<br>
>> stage));<br>
>> + fprintf(file, buf->buf);<br>
>> - if (stage == MESA_SHADER_FRAGMENT) {<br>
>> - fprintf(file, "*** SHADER CONFIG ***\n"<br>
>> - "SPI_PS_INPUT_ADDR = 0x%04x\n"<br>
>> - "SPI_PS_INPUT_ENA = 0x%04x\n",<br>
>> - conf->spi_ps_input_addr,<br>
>> conf->spi_ps_input_ena);<br>
>> + _mesa_string_buffer_destroy(<wbr>buf);<br>
>> +}<br>
>> +<br>
>> +VkResult<br>
>> +radv_GetShaderInfoAMD(<wbr>VkDevice _device,<br>
>> + VkPipeline _pipeline,<br>
>> + VkShaderStageFlagBits shaderStage,<br>
>> + VkShaderInfoTypeAMD infoType,<br>
>> + size_t* pInfoSize,<br>
>> + void* pInfo)<br>
>> +{<br>
>> + RADV_FROM_HANDLE(radv_device, device, _device);<br>
>> + RADV_FROM_HANDLE(radv_<wbr>pipeline, pipeline, _pipeline);<br>
>> + gl_shader_stage stage =<br>
>> vk_to_mesa_shader_stage(<wbr>shaderStage);<br>
>> + struct radv_shader_variant *variant =<br>
>> pipeline->shaders[stage];<br>
>> + struct _mesa_string_buffer *buf;<br>
>> + VkResult result = VK_SUCCESS;<br>
>> +<br>
>> + /* Spec doesn't indicate what to do if the stage is<br>
>> invalid, so just<br>
>> + * return no info for this. */<br>
>> + if (!variant)<br>
>> + return VK_ERROR_FEATURE_NOT_PRESENT;<br>
>> +<br>
>> + switch (infoType) {<br>
>> + case VK_SHADER_INFO_TYPE_<wbr>STATISTICS_AMD:<br>
>> + if (!pInfo) {<br>
>> + *pInfoSize =<br>
>> sizeof(<wbr>VkShaderStatisticsInfoAMD);<br>
>> + } else {<br>
>> + struct ac_shader_config *conf =<br>
>> &variant->config;<br>
>> +<br>
>> + VkShaderStatisticsInfoAMD statistics = {};<br>
>> + statistics.shaderStageMask = shaderStage;<br>
>> + statistics.resourceUsage.<wbr>numUsedVgprs =<br>
>> conf->num_vgprs + conf->spilled_vgprs;<br>
>> + statistics.resourceUsage.<wbr>numUsedSgprs =<br>
>> conf->num_sgprs + conf->spilled_sgprs;<br>
>> +<br>
>> statistics.resourceUsage.<wbr>ldsSizePerLocalWorkGroup = 16384;<br>
>> +<br>
>> statistics.resourceUsage.<wbr>ldsUsageSizeInBytes = conf->lds_size;<br>
>> +<br>
>> statistics.resourceUsage.<wbr>scratchMemUsageInBytes =<br>
>> conf->scratch_bytes_per_wave;<br>
>> + statistics.numPhysicalVgprs =<br>
>> statistics.numAvailableVgprs = 256;<br>
>> + statistics.numPhysicalSgprs =<br>
>> statistics.numAvailableSgprs = get_total_sgprs(device);<br>
>> + statistics.<wbr>computeWorkGroupSize[0] =<br>
>> variant->nir->info.cs.local_<wbr>size[0];<br>
>> + statistics.<wbr>computeWorkGroupSize[1] =<br>
>> variant->nir->info.cs.local_<wbr>size[1];<br>
>> + statistics.<wbr>computeWorkGroupSize[2] =<br>
>> variant->nir->info.cs.local_<wbr>size[2];<br>
>> +<br>
>> + size_t size = *pInfoSize;<br>
>> + *pInfoSize = sizeof(statistics);<br>
>> +<br>
>> + memcpy(pInfo, &statistics, MIN2(size,<br>
>> *pInfoSize));<br>
>> +<br>
>> + if (size < *pInfoSize)<br>
>> + result = VK_INCOMPLETE;<br>
>> + }<br>
>> +<br>
>> + break;<br>
>> + case VK_SHADER_INFO_TYPE_<wbr>DISASSEMBLY_AMD:<br>
>> + buf = _mesa_string_buffer_create(<wbr>NULL, 1024);<br>
>> +<br>
>> + _mesa_string_buffer_printf(<wbr>buf, "%s:\n",<br>
>> radv_get_shader_name(variant, stage));<br>
>> + _mesa_string_buffer_printf(<wbr>buf, "%s\n\n",<br>
>> variant->disasm_string);<br>
>> + generate_shader_stats(device, variant, stage,<br>
>> buf);<br>
>> +<br>
>> + if (!pInfo) {<br>
>> + *pInfoSize = buf->length;<br>
>> + } else {<br>
>> + size_t size = *pInfoSize;<br>
>> + *pInfoSize = buf->length;<br>
>> +<br>
>> + memcpy(pInfo, buf->buf, MIN2(size,<br>
>> buf->length));<br>
>> +<br>
>> + if (size < buf->length)<br>
>> + result = VK_INCOMPLETE;<br>
>> + }<br>
>> +<br>
>> + _mesa_string_buffer_destroy(<wbr>buf);<br>
>> + break;<br>
>> + default:<br>
>> + /* VK_SHADER_INFO_TYPE_BINARY_AMD unimplemented<br>
>> for now. */<br>
>> + result = VK_ERROR_FEATURE_NOT_PRESENT;<br>
>> + break;<br>
>> }<br>
>> - fprintf(file, "*** SHADER STATS ***\n"<br>
>> - "SGPRS: %d\n"<br>
>> - "VGPRS: %d\n"<br>
>> - "Spilled SGPRs: %d\n"<br>
>> - "Spilled VGPRs: %d\n"<br>
>> - "Code Size: %d bytes\n"<br>
>> - "LDS: %d blocks\n"<br>
>> - "Scratch: %d bytes per wave\n"<br>
>> - "Max Waves: %d\n"<br>
>> - "********************\n\n\n",<br>
>> - conf->num_sgprs, conf->num_vgprs,<br>
>> - conf->spilled_sgprs, conf->spilled_vgprs,<br>
>> variant->code_size,<br>
>> - conf->lds_size, conf->scratch_bytes_per_wave,<br>
>> - max_simd_waves);<br>
>> + return result;<br>
>> }<br>
>><br>
>><br>
</div></div>> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div><br></div></div>