[Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM
Bas Nieuwenhuizen
bas at basnieuwenhuizen.nl
Mon Jun 10 13:56:18 UTC 2019
On Sat, Jun 8, 2019 at 3:36 PM Alex Smith <asmith at feralinteractive.com> wrote:
>
> On Mon, 3 Jun 2019 at 13:27, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>
>> Am 03.06.19 um 14:21 schrieb Alex Smith:
>>
>> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>>
>>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>>> > rather than having GTT first.
>>> >
>>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>>> > type, and the property flags for GTT and visible VRAM are identical.
>>> > According to the spec, for types with identical flags, we should give
>>> > the one with better performance a lower index.
>>> >
>>> > Previously, apps which follow the spec guidance for choosing a memory
>>> > type would have picked the GTT type in preference to visible VRAM (all
>>> > Feral games will do this), and end up with lower performance.
>>> >
>>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>>> > other (Feral) games and saw similar improvement on those as well.
>>>
>>> Well that patch doesn't looks like a good idea to me.
>>>
>>> Using VRAM over uncached GTT should have something between no and only
>>> minimal performance difference on APU.
>>>
>>> To make things even worse VRAM is still needed for scanout and newer
>>> laptops have only a very very low default setting (32 or 16MB). So you
>>> can end up in VRAM clashing on those systems.
>>>
>>> Can you check some kernel statistics to figure out what exactly is going
>>> on here?
>>
>>
>> What statistics should I look at?
>>
>>
>> First of all take a look at amdgpu_gem_info file in the debugfs directory while using GTT and match that to using VRAM. You should see a lot more of GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag is missing we have found the problem.
>>
>> If that looks ok, then take a look at the ttm_page_pool or ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got. Huge pages should be used for anything larger than 2MB, if not we have found the problem.
>>
>> If that still isn't the issue I need to take a look at the VM code again and see if we still map VRAM/GTT differently on APUs.
>
>
> OK, got around to looking at this. amdgpu_gem_info does have more USWC entries when using GTT. I've attached the output from VRAM vs GTT in case you can spot anything else in there.
>
> ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.
To add to this, I tried rounding up the size all application GTT
allocations to a multiple of 2 megabytes (+ a suballocator for buffers
< 2M). This increased performance a bit but not nearly what going from
GTT->"VRAM" brings.
>
> FWIW this was from kernel 5.0.10, I just upgraded to 5.1.6 and still the same perf difference there.
>
> Thanks,
> Alex
>
>>
>>
>> Thanks,
>> Christian.
>>
>>
>> Thanks,
>> Alex
>>
>>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> >
>>> > Signed-off-by: Alex Smith <asmith at feralinteractive.com>
>>> > ---
>>> > I noticed that the memory types advertised on my Raven laptop looked a
>>> > bit odd so played around with it and found this. I'm not sure if it is
>>> > actually expected that the performance difference between visible VRAM
>>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>>> > are clear (and consistent, tested multiple times).
>>> > ---
>>> > src/amd/vulkan/radv_device.c | 18 +++++++++++++++---
>>> > 1 file changed, 15 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>>> > index 3cf050ed220..d36ee226ebd 100644
>>> > --- a/src/amd/vulkan/radv_device.c
>>> > +++ b/src/amd/vulkan/radv_device.c
>>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct radv_physical_device *device)
>>> > .heapIndex = vram_index,
>>> > };
>>> > }
>>> > - if (gart_index >= 0) {
>>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>>> > device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>>> > device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
>>> > .propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>>> > - (device->rad_info.has_dedicated_vram ? 0 : VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>>> > .heapIndex = gart_index,
>>> > };
>>> > }
>>> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct radv_physical_device *device)
>>> > .heapIndex = visible_vram_index,
>>> > };
>>> > }
>>> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
>>> > + /* Put GTT after visible VRAM for GPUs without dedicated VRAM
>>> > + * as they have identical property flags, and according to the
>>> > + * spec, for types with identical flags, the one with greater
>>> > + * performance must be given a lower index. */
>>> > + device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>>> > + device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
>>> > + .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
>>> > + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>>> > + .heapIndex = gart_index,
>>> > + };
>>> > + }
>>> > if (gart_index >= 0) {
>>> > device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
>>> > device->memory_properties.memoryTypes[type_count++] = (VkMemoryType) {
>>>
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list