[PATCH 27/27] drm/amdgpu: Fix GTT size calculation

Kuehling, Felix Felix.Kuehling at amd.com
Mon Apr 29 23:16:02 UTC 2019


On 2019-04-29 8:34 a.m., Christian König wrote:
> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>> From: Kent Russell <kent.russell at amd.com>
>>
>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>> system memory. This severely limits the quantity of system memory
>> that can be used by ROCm applications.
>>
>> Increase GTT size to the maximum of VRAM size or system memory size.
>
> Well, NAK.
>
> This limit was set on purpose, because otherwise the 
> max-texture-size would crash the system: the OOM killer 
> would cause a system panic.
>
> Using more than 75% of system memory by the GPU at the same time makes 
> the system unstable and so we can't allow that by default.

Like we discussed, the current implementation is too limiting. On a Fiji 
system with 4GB VRAM and 32GB system memory, it limits system memory 
allocations to 4GB. I think this workaround was fixed once before and 
reverted because it broke a CZ system with 1GB system memory. So I 
suspect that this is an issue affecting small memory systems where maybe 
the 1/2 system memory limit in TTM isn't sufficient to protect from OOM 
panics.
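
To make the Fiji numbers concrete, here is a rough user-space sketch of the two calculations from the patch. It is not the driver code: the helper names are made up for illustration, and the 3072 MB value for AMDGPU_DEFAULT_GTT_SIZE_MB is an assumption that should be checked against amdgpu.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for AMDGPU_DEFAULT_GTT_SIZE_MB, taken to be 3072 MB. */
#define DEFAULT_GTT_SIZE_MB 3072ULL

/* Hypothetical helpers standing in for the kernel's min()/max() macros. */
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Current rule: min(max(default, VRAM), 3/4 of system memory). */
static uint64_t gtt_size_old(uint64_t vram, uint64_t sysmem)
{
	/* Guard against overflow in the multiplication below. */
	assert(sysmem <= UINT64_MAX / 3);
	return min_u64(max_u64(DEFAULT_GTT_SIZE_MB << 20, vram),
		       sysmem * 3 / 4);
}

/* Proposed rule: max3(default, VRAM, system memory). */
static uint64_t gtt_size_new(uint64_t vram, uint64_t sysmem)
{
	return max_u64(max_u64(DEFAULT_GTT_SIZE_MB << 20, vram), sysmem);
}
```

With 4GB VRAM and 32GB system memory (sizes in bytes), gtt_size_old() returns 4GB, because the max() term is capped by VRAM size long before the 3/4 limit of 24GB matters, while gtt_size_new() returns the full 32GB.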

The OOM killer problem is a more general problem that potentially 
affects other drivers too. Keeping this GTT limit broken in AMDGPU is an 
inadequate workaround at best. I'd like to look for a better solution, 
probably some adjustment of the TTM system memory limits on systems with 
small memory, to avoid OOM panics on such systems.

Regards,
   Felix


>
> What could maybe work is to reduce the amount of system memory by a fixed 
> factor, but offhand I don't see a way of fixing this in general.
>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Kent Russell <kent.russell at amd.com>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index c14198737dcd..e9ecc3953673 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>>           struct sysinfo si;
>>             si_meminfo(&si);
>> -        gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> -                   adev->gmc.mc_vram_size),
>> -                   ((uint64_t)si.totalram * si.mem_unit * 3/4));
>> -    }
>> -    else
>> +        gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> +                adev->gmc.mc_vram_size,
>> +                ((uint64_t)si.totalram * si.mem_unit));
>> +    } else
>>           gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>>         /* Initialize GTT memory pool */
>

