[PATCH] drm/amdgpu: only use kernel zone if need_dma32 is not required

Yang, Philip Philip.Yang at amd.com
Wed Jun 12 21:13:51 UTC 2019


On 2019-06-12 3:28 p.m., Christian König wrote:
> Am 12.06.19 um 17:13 schrieb Yang, Philip:
>> TTM creates two zones for system memory: the kernel zone and the dma32
>> zone. If an allocated system memory address is below 4GB, it is
>> accounted to the dma32 zone, which can exhaust the dma32 zone and
>> trigger unnecessary TTM eviction.
>>
>> Patch "drm/ttm: Account for kernel allocations in kernel zone only" only
>> handles the allocation for acc_size. System memory page allocation goes
>> through ttm_mem_global_alloc_page, which still accounts to the dma32
>> zone if the page is below 4GB.
> 
> NAK, as the name says the mem_glob is global for all devices in the system.
> 
> So this will break if you mix DMA32 and non DMA32 in the same system 
> which is exactly the configuration my laptop here has :(
>
I didn't find a path that uses the dma32 zone, but I may have missed
something. There is an issue found by KFDTest.BigBufStressTest: it
allocates buffers up to 3/8 of the total 256GB of system memory, each
buffer 128MB in size, then uses a queue to write to the buffers. If
ttm_mem_global_alloc_page gets a page whose pfn is below 4GB, the page
is accounted to the dma32 zone and will exhaust its 2GB limit, and
ttm_check_swapping then schedules ttm_shrink_work to start eviction. It
takes minutes to finish the restore (retrying many times if busy), and
the test fails because of a queue timeout. This eviction is unnecessary
because we still have enough free system memory.

It's a random case that happens about 1/5 of the time. I can change the
test to increase the timeout value to work around this, but this seems
to be a TTM bug. It will slow application performance a lot whenever
this random issue happens.
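
To make the accounting described above concrete, here is a minimal
user-space sketch. It is not the TTM code: the sketch_* names, the
struct layout and the strict "used > limit" check are assumptions made
for illustration, and only the 4GB pfn boundary, the 2GB dma32 limit and
the num_zones field come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: 4KB pages, so pfn 0x100000 is the 4GB boundary. */
#define SKETCH_PAGE_SIZE       4096ULL
#define SKETCH_DMA32_PFN_LIMIT 0x100000ULL
#define SKETCH_MAX_ZONES       2

/* Simplified stand-in for one accounting zone. */
struct sketch_zone {
    const char *name;
    uint64_t used;   /* bytes currently accounted to this zone */
    uint64_t limit;  /* bytes allowed before eviction is requested */
};

/* Simplified stand-in for the global accounting object.
 * zones[0] is the kernel zone, zones[1] is the dma32 zone. */
struct sketch_glob {
    struct sketch_zone zones[SKETCH_MAX_ZONES];
    unsigned int num_zones;
};

/* Account one page: always to the kernel zone, and to the dma32 zone
 * only when the page lies below 4GB. */
static void sketch_account_page(struct sketch_glob *g, uint64_t pfn)
{
    assert(g->num_zones >= 1 && g->num_zones <= SKETCH_MAX_ZONES);

    for (unsigned int i = 0; i < g->num_zones; i++) {
        if (i == 1 && pfn >= SKETCH_DMA32_PFN_LIMIT)
            continue; /* page is above 4GB: skip the dma32 zone */
        g->zones[i].used += SKETCH_PAGE_SIZE;
    }
}

/* Eviction is requested as soon as any active zone exceeds its limit,
 * no matter how much room the other zones still have. */
static bool sketch_needs_swapping(const struct sketch_glob *g)
{
    for (unsigned int i = 0; i < g->num_zones; i++) {
        if (g->zones[i].used > g->zones[i].limit)
            return true;
    }
    return false;
}
```

In this model, accounting a little over 2GB of pages with pfn below
0x100000 makes sketch_needs_swapping() return true even though the
kernel zone is almost empty, which is the behaviour the test hits.
Setting num_zones to 1, as the patch does, means the dma32 limit is
never checked. Because the object is shared, that applies to every
device in the system, which is the objection raised above.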

Thanks,
Philip


> Christian.
> 
>>
>> Change-Id: I289b85d891b8f64a1422c42b1eab398098ab7ef7
>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 2778ff63d97d..79bb9dfe617b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -1686,6 +1686,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>>       }
>>       adev->mman.initialized = true;
>> +    /* Only kernel zone (no dma32 zone) if device does not require dma32 */
>> +    if (!adev->need_dma32)
>> +        adev->mman.bdev.glob->mem_glob->num_zones = 1;
>> +
>>       /* We opt to avoid OOM on system pages allocations */
>>       adev->mman.bdev.no_retry = true;
> 

