[Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

Koenig, Christian Christian.Koenig at amd.com
Mon Jun 3 12:27:52 UTC 2019


Am 03.06.19 um 14:21 schrieb Alex Smith:
On Mon, 3 Jun 2019 at 11:57, Koenig, Christian <Christian.Koenig at amd.com<mailto:Christian.Koenig at amd.com>> wrote:
Am 02.06.19 um 12:32 schrieb Alex Smith:
> Put the uncached GTT type at a higher index than the visible VRAM type,
> rather than having GTT first.
>
> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> type, and the property flags for GTT and visible VRAM are identical.
> According to the spec, for types with identical flags, we should give
> the one with better performance a lower index.
>
> Previously, apps which follow the spec guidance for choosing a memory
> type would have picked the GTT type in preference to visible VRAM (all
> Feral games will do this), and end up with lower performance.
>
> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> other (Feral) games and saw similar improvement on those as well.

Well that patch doesn't looks like a good idea to me.

Using VRAM over uncached GTT should have something between no and only
minimal performance difference on APU.

To make things even worse VRAM is still needed for scanout and newer
laptops have only a very very low default setting (32 or 16MB). So you
can end up in VRAM clashing on those systems.

Can you check some kernel statistics to figure out what exactly is going
on here?

What statistics should I look at?

First of all take a look at amdgpu_gem_info file in the debugfs directory while using GTT and match that to using VRAM. You should see a lot more of GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag is missing we have found the problem.

If that looks ok, then take a look at the ttm_page_pool or ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got. Huge pages should be used for anything larger than 2MB, if not we have found the problem.

If that still isn't the issue I need to take a look at the VM code again and see if we still map VRAM/GTT differently on APUs.

Thanks,
Christian.


Thanks,
Alex


Regards,
Christian.

>
> Signed-off-by: Alex Smith <asmith at feralinteractive.com<mailto:asmith at feralinteractive.com>>
> ---
> I noticed that the memory types advertised on my Raven laptop looked a
> bit odd so played around with it and found this. I'm not sure if it is
> actually expected that the performance difference between visible VRAM
> and GTT is so large, seeing as it's not dedicated VRAM, but the results
> are clear (and consistent, tested multiple times).
> ---
>   src/amd/vulkan/radv_device.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 3cf050ed220..d36ee226ebd 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct radv_physical_device *device)
>                       .heapIndex = vram_index,
>               };
>       }
> -     if (gart_index >= 0) {
> +     if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>               device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>               device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
>                       .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> -                     VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> -                     (device->rad_info.has_dedicated_vram ? 0 : VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> +                     VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>                       .heapIndex = gart_index,
>               };
>       }
> @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct radv_physical_device *device)
>                       .heapIndex = visible_vram_index,
>               };
>       }
> +     if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> +             /* Put GTT after visible VRAM for GPUs without dedicated VRAM
> +              * as they have identical property flags, and according to the
> +              * spec, for types with identical flags, the one with greater
> +              * performance must be given a lower index. */
> +             device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> +             device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
> +                     .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> +                     VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> +                     VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> +                     .heapIndex = gart_index,
> +             };
> +     }
>       if (gart_index >= 0) {
>               device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
>               device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190603/3d5509cf/attachment-0001.html>


More information about the mesa-dev mailing list