[PATCH 27/27] drm/amdgpu: Fix GTT size calculation

Koenig, Christian Christian.Koenig at amd.com
Thu May 2 13:06:49 UTC 2019


On 30.04.19 at 19:25, Kuehling, Felix wrote:
> [SNIP]
>>>> To sum it up, the requirement that a GPU can use almost all system
>>>> memory simply can't be met upstream, and even in a production system it
>>>> is rather questionable.
>>> It should be doable with userptr, which now uses unpinned pages through
>>> HMM. Currently the GTT limit affects the largest possible userptr
>>> allocation, though not the total sum of all userptr allocations. Maybe
>>> making userptr completely independent of GTT size would be an easier
>>> problem to tackle. Then I can leave the GTT limit alone.
>> Well this way we would only avoid the symptoms, but not the real problem.
> It allocates pages in user mode rather than kernel mode. That means OOM
> situations take a completely different code path. Before running out of
> memory completely and triggering the OOM killer, the kernel would start
> swapping pages, which would trigger the MMU notifier to stop the user
> mode queues or invalidate GPU page table entries, and allow the pages to
> be swapped out.
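
Just to make that path explicit, the notifier side you describe boils down 
to roughly the following. Only a sketch, the context struct and helpers are 
made up; the real thing is the amdgpu/KFD MMU notifier handling:

#include <linux/mmu_notifier.h>

struct my_userptr_ctx {                         /* hypothetical */
        struct mmu_notifier mn;
};

/* Hypothetical driver helpers. */
void my_stop_user_queues(struct my_userptr_ctx *ctx);
void my_invalidate_gpu_ptes(struct my_userptr_ctx *ctx,
                            unsigned long start, unsigned long end);

static int my_invalidate_range_start(struct mmu_notifier *mn,
                                     const struct mmu_notifier_range *range)
{
        struct my_userptr_ctx *ctx = container_of(mn, struct my_userptr_ctx, mn);

        /* Stop the user mode queues and/or invalidate the GPU PTEs so the
         * GPU can no longer access the pages in this range...
         */
        my_stop_user_queues(ctx);
        my_invalidate_gpu_ptes(ctx, range->start, range->end);

        /* ...after that the core MM is free to swap the pages out. */
        return 0;
}

static const struct mmu_notifier_ops my_userptr_mn_ops = {
        .invalidate_range_start = my_invalidate_range_start,
};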

Well, it at least removes the extra layer we have in TTM here.

But what I meant is that it still doesn't fix our underlying problem of 
stopping the hardware immediately.

>>>> The only real solution I can see is to be able to reliably kill shaders
>>>> in an OOM situation.
>>> Well, we can in fact preempt our compute shaders with low latency.
>>> Killing a KFD process will do exactly that.
>> I've taken a look at that thing as well and to be honest it is not even
>> remotely sufficient.
>>
>> We need something which stops the hardware *immediately* from accessing
>> system memory, and does not wait for the SQ to kill all waves, flush caches
>> etc...
> It's apparently sufficient to use in our MMU notifier. There is also a
> way to disable the grace period that allows short waves to complete
> before being preempted, though we're not using that at the moment.
>
>
>> One possibility I've been playing around with for a while is to replace the
>> root PD for the VMIDs in question on the fly. E.g. we just let it point
>> to some dummy which redirects everything into nirvana.
> Even that's not sufficient. You'll also need to free the pages
> immediately. For KFD processes, cleaning up of memory is done in a
> worker thread that gets kicked off by a release MMU notifier when the
> process' mm_struct is taken down.

Yeah, that worker is what I meant when I said that this whole thing is not 
sufficient. BTW: how does that still work with HMM? I mean, HMM doesn't 
take a reference to the pages any more.

But let me put it differently: if we want the OOM killer to work 
correctly, we need a shortcut path which doesn't take any locks 
or allocate memory.

What we can do is write some registers and then maybe busy-wait for the 
TLB flush to complete.
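
Roughly along these lines; the register offsets below are completely made 
up and only meant to illustrate the idea of redirecting the root PD and 
busy waiting for the flush, nothing else:

#include <linux/delay.h>
#include "amdgpu.h"

/* Hypothetical register offsets, the real ones depend on the VMHUB/ASIC. */
#define MY_PD_BASE_LO(vmid)     (0x1000 + (vmid) * 8)
#define MY_PD_BASE_HI(vmid)     (0x1004 + (vmid) * 8)
#define MY_TLB_FLUSH_REQ        0x2000
#define MY_TLB_FLUSH_ACK        0x2004

static void my_kill_vmid(struct amdgpu_device *adev, unsigned int vmid,
                         u64 dummy_pd_addr)
{
        /* Redirect the root PD of this VMID to a dummy that maps nothing. */
        WREG32(MY_PD_BASE_LO(vmid), lower_32_bits(dummy_pd_addr >> 12));
        WREG32(MY_PD_BASE_HI(vmid), upper_32_bits(dummy_pd_addr >> 12));

        /* Kick off a TLB flush for that VMID... */
        WREG32(MY_TLB_FLUSH_REQ, 1u << vmid);

        /* ...and busy wait for the ack. No locks, no allocations, no fences. */
        while (!(RREG32(MY_TLB_FLUSH_ACK) & (1u << vmid)))
                udelay(1);
}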

What we can't do is wait for the SQ to kill waves, wait for 
fences etc...

> Then there is still TTM's delayed freeing of BOs that waits for fences.
> So you'd need to signal all the BO fences to allow them to be released.

Actually, that doesn't apply to the critical code path. In that situation 
TTM only tries to free up the things it doesn't need to wait for immediately.
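
In other words the critical path conceptually only reaps what is already 
idle, something like this (not the actual TTM code, just the idea, and the 
entry struct and helper are made up):

#include <linux/dma-fence.h>
#include <linux/list.h>

struct my_reap_entry {                  /* hypothetical stand-in for a BO */
        struct list_head head;
        struct dma_fence *fence;
};

void my_free_entry(struct my_reap_entry *e);    /* hypothetical */

static void my_reap_idle(struct list_head *delayed_list)
{
        struct my_reap_entry *e, *tmp;

        list_for_each_entry_safe(e, tmp, delayed_list, head) {
                /* Only free what is already idle, never block here. */
                if (!dma_fence_is_signaled(e->fence))
                        continue;

                list_del(&e->head);
                dma_fence_put(e->fence);
                my_free_entry(e);
        }
}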

What we are missing here is something like a kill interface for fences 
maybe...
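
The building blocks for that mostly exist already, e.g. something in this 
direction. Just a sketch; the hard part is making the driver and scheduler 
cope with jobs that never actually complete on the hardware:

#include <linux/dma-fence.h>
#include <linux/errno.h>

/* Force-complete a fence: record an error, then signal it so everybody
 * waiting on it can make progress without the hardware ever finishing.
 */
static void my_kill_fence(struct dma_fence *fence)
{
        /* The error has to be set before the fence is signaled. */
        dma_fence_set_error(fence, -ECANCELED);
        dma_fence_signal(fence);
}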

> TBH, I don't understand why waiting is not an option, if the alternative
> is a kernel panic. If your OOM killer kicks in, your system is basically
> dead. Waiting for a fraction of a second to let a GPU finish its memory
> access should be a small price to pay in that situation.

Oh yeah, that is a really good point as well.

I think that this restriction was created to make sure that the OOM 
killer always makes progress and doesn't wait for things like network 
congestion.

Now the Linux MM is not really made for long-term I/O mappings anyway. 
That was also a recent topic on the lists in the context of HMM 
(there is an LWN summary about it). Probably worth bringing that 
discussion up once more.

Christian.

>
> Regards,
>     Felix
>
>
>> But implementing this is easier said than done...
>>
>> Regards,
>> Christian.
>>


