[PATCH 27/27] drm/amdgpu: Fix GTT size calculation
Kuehling, Felix
Felix.Kuehling at amd.com
Tue Apr 30 17:25:09 UTC 2019
On 2019-04-30 1:03 p.m., Koenig, Christian wrote:
> On 30.04.19 at 17:36, Kuehling, Felix wrote:
>> On 2019-04-30 5:32 a.m., Christian König wrote:
>>>
>>> On 30.04.19 at 01:16, Kuehling, Felix wrote:
>>>> On 2019-04-29 8:34 a.m., Christian König wrote:
>>>>> On 28.04.19 at 09:44, Kuehling, Felix wrote:
>>>>>> From: Kent Russell <kent.russell at amd.com>
>>>>>>
>>>>>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>>>>>> system memory. This severely limits the quantity of system memory
>>>>>> that can be used by ROCm applications.
>>>>>>
>>>>>> Increase GTT size to the maximum of VRAM size or system memory size.
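For illustration, the two policies in the commit message boil down to
something like this. This is a minimal sketch with generic types and names,
not the actual amdgpu_ttm_init() code:

    #include <linux/kernel.h>	/* min(), max() */
    #include <linux/types.h>

    /* current default: no larger than VRAM, capped at 3/4 of system memory */
    static u64 gtt_size_current(u64 vram_size, u64 sysmem_size)
    {
            return min(vram_size, sysmem_size * 3 / 4);
    }

    /* proposed: the larger of VRAM size and total system memory */
    static u64 gtt_size_proposed(u64 vram_size, u64 sysmem_size)
    {
            return max(vram_size, sysmem_size);
    }

On the Fiji example mentioned below (4GB VRAM, 32GB RAM) that is the
difference between a 4GB and a 32GB GTT limit.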
>>>>> Well, NAK.
>>>>>
>>>>> This limit was done on purpose, because otherwise the max-texture-size
>>>>> would crash the system: the OOM killer would end up causing a system
>>>>> panic.
>>>>>
>>>>> Letting the GPU use more than 75% of system memory at the same time
>>>>> makes the system unstable, so we can't allow that by default.
>>>> Like we discussed, the current implementation is too limiting. On a Fiji
>>>> system with 4GB VRAM and 32GB system memory, it limits system memory
>>>> allocations to 4GB. I think this limit was fixed once before, and the
>>>> fix was reverted because it broke a CZ system with 1GB system memory. So I
>>>> suspect that this is an issue affecting small memory systems where maybe
>>>> the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
>>>> panics.
>>> Well, it not only broke on a 1GB CZ system, that was just where Andrey
>>> reproduced it. We got reports from all kinds of systems.
>> I'd like to see those reports. This patch has been included in Linux Pro
>> releases since 18.20. I'm not aware that anyone complained about it.
> Well, to be honest our Pro driver is not used that widely, and only on
> rather homogeneous systems.
>
> Which is not really surprising, since we only advise using it for
> professional use cases.
>
>>>> The OOM killer problem is a more general problem that potentially
>>>> affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
>>>> inadequate workaround at best. I'd like to look for a better solution,
>>>> probably some adjustment of the TTM system memory limits on systems with
>>>> small memory, to avoid OOM panics on such systems.
>>> The core problem here is that the OOM killer explicitly doesn't want to
>>> block for shaders to finish whatever it is doing.
>>>
>>> So currently as soon as the hardware is using some memory it can't be
>>> reclaimed immediately.
>>>
>>> The original limit in TTM was 2/3 of system memory. That worked really
>>> reliably, and we only ran into problems after raising it to 3/4.
>> The TTM system memory limit is still 3/8 soft and 1/2 hard, 3/4 for
>> emergencies. See ttm_mem_init_kernel_zone. AFAICT, the emergency limit
>> is only available to root.
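(For reference, ttm_mem_init_kernel_zone() derives those numbers from total
RAM roughly like this; simplified, not the exact TTM code:

    max_mem    = mem >> 1;                  /* 1/2: hard limit           */
    emer_mem   = (mem >> 1) + (mem >> 2);   /* 3/4: emergency, root only */
    swap_limit = max_mem - (mem >> 3);      /* 3/8: soft limit, swapping */

On a 32GB machine that works out to roughly 12GB soft, 16GB hard and 24GB
emergency.)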
> Ah! I think I know why those limits don't kick in here!
>
> When GTT space is used by evictions from VRAM then we will use the
> emergency limit as well.
>
>> This GTT limit kicks in before I get anywhere close to the TTM limit.
>> That's why I think it is both broken and redundant.
> That was also the argument when we removed it the last time, but it got
> immediately reverted.
>
>>> To sum it up, the requirement of letting a GPU use almost all system
>>> memory is simply not possible upstream, and is rather questionable even
>>> in a production system.
>> It should be doable with userptr, which now uses unpinned pages through
>> HMM. Currently the GTT limit affects the largest possible userptr
>> allocation, though not the total sum of all userptr allocations. Maybe
>> making userptr completely independent of GTT size would be an easier
>> problem to tackle. Then I can leave the GTT limit alone.
> Well this way we would only avoid the symptoms, but not the real problem.
Userptr allocates pages in user mode rather than kernel mode. That means
OOM situations take a completely different code path: before running out
of memory completely and triggering the OOM killer, the kernel starts
swapping pages, which triggers the MMU notifier to stop the user mode
queues or invalidate GPU page table entries, and allows the pages to be
swapped out.
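To make that concrete, the shape of that mechanism is roughly the following;
a sketch with placeholder names (my_userptr, my_stop_queues), not the actual
amdgpu/KFD notifier code:

    #include <linux/kernel.h>
    #include <linux/mmu_notifier.h>

    struct my_userptr {
            struct mmu_notifier mn;
            /* GPU mapping / user queue bookkeeping would live here */
    };

    /* placeholder: preempt the user mode queues or invalidate GPU PTEs */
    static void my_stop_queues(struct my_userptr *p,
                               unsigned long start, unsigned long end)
    {
    }

    static int my_invalidate_range_start(struct mmu_notifier *mn,
                                         const struct mmu_notifier_range *range)
    {
            struct my_userptr *p = container_of(mn, struct my_userptr, mn);

            /* Stop GPU access to [range->start, range->end) so the core MM
             * can unmap and swap the pages instead of invoking the OOM
             * killer. */
            my_stop_queues(p, range->start, range->end);
            return 0;
    }

    static const struct mmu_notifier_ops my_userptr_mn_ops = {
            .invalidate_range_start = my_invalidate_range_start,
    };

The notifier gets registered against the process' mm_struct with
mmu_notifier_register(), which is what lets the pages stay unpinned.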
>
>>> The only real solution I can see is to be able to reliable kill shaders
>>> in an OOM situation.
>> Well, we can in fact preempt our compute shaders with low latency.
>> Killing a KFD process will do exactly that.
> I've taken a look at that thing as well and to be honest it is not even
> remotely sufficient.
>
> We need something which stops the hardware *immediately* from accessing
> system memory, and not wait for the SQ to kill all waves, flush caches
> etc...
It's apparently sufficient for use in our MMU notifier. There is also a
way to disable the grace period that allows short waves to complete
before being preempted, though we're not using that at the moment.
>
> One possibility I'm playing around with for a while is to replace the
> root PD for the VMIDs in question on the fly. E.g. we just let it point
> to some dummy which redirects everything into nirvana.
Even that's not sufficient. You'll also need to free the pages
immediately. For KFD processes, memory cleanup is done in a worker
thread that gets kicked off by a release MMU notifier when the
process' mm_struct is taken down.
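Roughly, with placeholder names rather than the real kfd_process code, that
flow looks like this:

    #include <linux/kernel.h>
    #include <linux/mmu_notifier.h>
    #include <linux/workqueue.h>

    struct my_kfd_process {
            struct mmu_notifier mn;
            struct work_struct release_work;  /* INIT_WORK()ed at creation */
    };

    static void my_release_worker(struct work_struct *work)
    {
            /* Tear down queues and free BOs here; this runs in process
             * context and may block on fences. */
    }

    /* Called when the process' mm_struct is torn down. */
    static void my_release(struct mmu_notifier *mn, struct mm_struct *mm)
    {
            struct my_kfd_process *p =
                    container_of(mn, struct my_kfd_process, mn);

            /* Must not do the heavy lifting in the notifier itself, so
             * defer the cleanup to a worker. */
            schedule_work(&p->release_work);
    }

    static const struct mmu_notifier_ops my_release_ops = {
            .release = my_release,
    };

So even if the root PD is replaced with a dummy on the fly, the backing
pages only become reclaimable once this deferred cleanup has run.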
Then there is still TTM's delayed freeing of BOs that waits for fences.
So you'd need to signal all the BO fences to allow them to be released.
TBH, I don't understand why waiting is not an option, if the alternative
is a kernel panic. If your OOM killer kicks in, your system is basically
dead. Waiting for a fraction of a second to let a GPU finish its memory
access should be a small price to pay in that situation.
Regards,
Felix
>
> But implementing this is easier said than done...
>
> Regards,
> Christian.
>