[PATCH 27/27] drm/amdgpu: Fix GTT size calculation
Kuehling, Felix
Felix.Kuehling at amd.com
Mon Apr 29 23:16:02 UTC 2019
On 2019-04-29 8:34 a.m., Christian König wrote:
> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>> From: Kent Russell <kent.russell at amd.com>
>>
>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>> system memory. This severely limits the quantity of system memory
>> that can be used by ROCm applications.
>>
>> Increase GTT size to the maximum of VRAM size or system memory size.
>
> Well, NAK.
>
> This limit was set on purpose, because otherwise the
> max-texture-size would crash the system, with the OOM killer
> causing a system panic.
>
> Using more than 75% of system memory by the GPU at the same time makes
> the system unstable and so we can't allow that by default.

Like we discussed, the current implementation is too limiting. On a Fiji
system with 4GB VRAM and 32GB system memory, it limits system memory
allocations to 4GB. I think this workaround was fixed once before and
reverted because it broke a CZ system with 1GB system memory. So I
suspect that this is an issue affecting small memory systems where maybe
the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
panics.
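
To make the Fiji numbers concrete, here is a small userspace sketch of the two formulas from the diff. It is not kernel code: the helper names are made up for illustration, and default_gtt stands in for AMDGPU_DEFAULT_GTT_SIZE_MB << 20, whose value the patch does not show, so it is passed in as a parameter.

```c
#include <stdint.h>

#define GIB (1ULL << 30)

/* Userspace stand-ins for the kernel's min()/max() macros. */
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Current default: min(max(default, VRAM), 3/4 of system memory). */
uint64_t gtt_size_current(uint64_t default_gtt, uint64_t vram, uint64_t sysmem)
{
	return min_u64(max_u64(default_gtt, vram), sysmem * 3 / 4);
}

/* Proposed default: max3(default, VRAM, system memory). */
uint64_t gtt_size_proposed(uint64_t default_gtt, uint64_t vram, uint64_t sysmem)
{
	return max_u64(max_u64(default_gtt, vram), sysmem);
}
```

Assuming a 3 GiB default, gtt_size_current(3 * GIB, 4 * GIB, 32 * GIB) comes out at 4 GiB, because the inner max() picks VRAM and the 24 GiB cap never comes into play. gtt_size_proposed() with the same inputs gives 32 GiB.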

The OOM killer problem is a more general problem that potentially
affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
inadequate workaround at best. I'd like to look for a better solution,
probably some adjustment of the TTM system memory limits on systems with
small memory, to avoid OOM panics on such systems.
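
As a rough illustration of why small-memory systems are the risky case, the sketch below applies the proposed max3() formula in userspace and checks whether the resulting GTT size is larger than system memory. The 3072 MiB default and the 512 MiB VRAM figure used in the note after the code are assumptions for illustration, not values taken from the patch or from the CZ report, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1ULL << 20)

/* Assumed stand-in for AMDGPU_DEFAULT_GTT_SIZE_MB << 20 (value is an assumption). */
#define ASSUMED_DEFAULT_GTT (3072 * MIB)

/* Userspace stand-in for the kernel's max3() macro. */
static uint64_t max3_u64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a > b ? a : b;

	return m > c ? m : c;
}

/* Proposed default from the diff: max3(default, VRAM, system memory). */
uint64_t gtt_size_proposed(uint64_t vram, uint64_t sysmem)
{
	return max3_u64(ASSUMED_DEFAULT_GTT, vram, sysmem);
}

/* True when the default GTT size would be larger than all of system memory. */
bool gtt_exceeds_sysmem(uint64_t vram, uint64_t sysmem)
{
	return gtt_size_proposed(vram, sysmem) > sysmem;
}
```

Under these assumptions, a system with 1024 MiB of system memory and 512 MiB of VRAM gets a 3072 MiB GTT, so gtt_exceeds_sysmem(512 * MIB, 1024 * MIB) is true, while the 4 GiB VRAM / 32 GiB system memory case is not affected. On such a system the GTT size itself no longer bounds GPU use of system memory, so any protection would have to come from elsewhere, such as the TTM limits.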

Regards,
Felix
>
> What could maybe work is to reduce the amount of system memory by a fixed
> factor, but offhand I don't see a way of fixing this in general.
>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Kent Russell <kent.russell at amd.com>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index c14198737dcd..e9ecc3953673 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>> struct sysinfo si;
>> si_meminfo(&si);
>> - gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> - adev->gmc.mc_vram_size),
>> - ((uint64_t)si.totalram * si.mem_unit * 3/4));
>> - }
>> - else
>> + gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> + adev->gmc.mc_vram_size,
>> + ((uint64_t)si.totalram * si.mem_unit));
>> + } else
>> gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>> /* Initialize GTT memory pool */
>