[PATCH] drm/ttm: stop warning on TT shrinker failure

Christian König ckoenig.leichtzumerken at gmail.com
Tue Mar 23 12:21:32 UTC 2021


On 23.03.21 at 13:04, Michal Hocko wrote:
> On Tue 23-03-21 12:48:58, Christian König wrote:
>> On 23.03.21 at 12:28, Daniel Vetter wrote:
>>> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
>>>> On Mon 22-03-21 20:34:25, Christian König wrote:
> [...]
>>>>> My only concern is that if I could rely on memalloc_no* being used we could
>>>>> optimize this quite a bit further.
>>>> Yes you can use the scope API and you will be guaranteed that _any_
>>>> allocation from the enclosed context will inherit GFP_NO* semantic.
>> The question is whether this is also guaranteed the other way around.
>>
>> In other words, if somebody calls get_free_page(GFP_NOFS), are the context
>> flags set as well?
> gfp mask is always restricted in the page allocator. So say you have
> noio scope context and call get_free_page/kmalloc(GFP_NOFS) then the
> scope would restrict the allocation flags to GFP_NOIO (aka drop
> __GFP_IO). For further details, have a look at current_gfp_context
> and its callers.
>
> Does this answer your question?

But what happens if you don't have noio scope and somebody calls 
get_free_page(GFP_NOFS)?

Is the noio scope then added automatically? And is it possible that the
shrinker gets called without noio scope even though we would need it?
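
To make the question concrete, here is how I currently picture the
restriction described above, written as a small userspace model. This is
only a sketch: all MODEL_* names and bit values are made up for
illustration and are not the kernel's, and only the masking logic is meant
to mirror what current_gfp_context does as far as I understand it.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the real kernel constants differ. */
#define MODEL_GFP_IO      (1u << 0)
#define MODEL_GFP_FS      (1u << 1)
#define MODEL_GFP_RECLAIM (1u << 2)

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* Hypothetical stand-ins for the per-task scope flags. */
#define MODEL_PF_MEMALLOC_NOIO (1u << 0)
#define MODEL_PF_MEMALLOC_NOFS (1u << 1)

struct model_task {
    uint32_t flags; /* scope flags set by the memalloc_no*_save() helpers */
};

/*
 * Restrict the caller's gfp mask by the scope the task is in.
 * A noio scope is the stricter one, so it takes precedence and
 * drops both the IO and the FS bit.
 */
static uint32_t model_gfp_context(const struct model_task *t, uint32_t gfp)
{
    if (t->flags & MODEL_PF_MEMALLOC_NOIO)
        gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
    else if (t->flags & MODEL_PF_MEMALLOC_NOFS)
        gfp &= ~MODEL_GFP_FS;
    return gfp;
}

static void model_demo(void)
{
    struct model_task noio = { .flags = MODEL_PF_MEMALLOC_NOIO };
    struct model_task plain = { .flags = 0 };

    /* noio scope + GFP_NOFS request: effective mask is GFP_NOIO. */
    assert(model_gfp_context(&noio, MODEL_GFP_NOFS) == MODEL_GFP_NOIO);

    /* No scope + GFP_NOFS request: mask stays as passed in ... */
    assert(model_gfp_context(&plain, MODEL_GFP_NOFS) == MODEL_GFP_NOFS);
    /* ... and in this model the task flags are not touched by the call. */
    assert(plain.flags == 0);
}
```

In this model the scope only ever narrows the mask that was passed in; an
explicit GFP_NOFS argument does not set any scope flag on the task. Whether
that matches the real behaviour is exactly what I am asking.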

>>>> I think this is where I don't get yet what Christian tries to do: We
>>>> really shouldn't do different tricks and calling contexts between direct
>>>> reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
>>>> pretty much guaranteed. So whether we use explicit gfp flags or the
>>>> context apis, result is exactly the same.
>> OK, let us recap what TTM's TT shrinker does here:
>>
>> 1. We have memory which is not swappable because it might be accessed by the
>> GPU at any time.
>> 2. Make sure the memory is not accessed by the GPU and that drivers need to
>> grab a lock before they can make it accessible again.
>> 3. Allocate a shmem file and copy over the non-swappable pages.
> This is quite tricky because the shrinker operates in the PF_MEMALLOC
> context so such an allocation would be allowed to completely deplete
> memory unless you explicitly mark that context as __GFP_NOMEMALLOC.

Thanks, exactly that was one thing I was absolutely not sure about. And 
yes I agree that this is really tricky.
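
Just to check that I read this correctly, the rule could be modeled in
userspace roughly like this. Again only a sketch under my own assumptions:
the MODEL_* names and values are invented, and the real allocator looks at
more conditions (interrupt context, for example) than this does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real kernel constants differ. */
#define MODEL_GFP_NOMEMALLOC (1u << 0) /* never touch the reserves */
#define MODEL_GFP_MEMALLOC   (1u << 1) /* explicitly allow the reserves */

#define MODEL_PF_MEMALLOC    (1u << 0) /* task is running in reclaim */

/*
 * Decide whether an allocation may ignore the watermarks and dip into
 * the memory reserves. An explicit NOMEMALLOC request wins over
 * everything else, including the task being in reclaim context.
 */
static bool model_may_use_reserves(uint32_t task_flags, uint32_t gfp)
{
    if (gfp & MODEL_GFP_NOMEMALLOC)
        return false;
    if (gfp & MODEL_GFP_MEMALLOC)
        return true;
    return (task_flags & MODEL_PF_MEMALLOC) != 0;
}

static void model_reserves_demo(void)
{
    /* Shrinker context without any marking: reserves can be depleted. */
    assert(model_may_use_reserves(MODEL_PF_MEMALLOC, 0));

    /* Shrinker context with the allocation marked NOMEMALLOC: it cannot. */
    assert(!model_may_use_reserves(MODEL_PF_MEMALLOC, MODEL_GFP_NOMEMALLOC));
}
```

So if the shmem allocation in the shrinker is marked that way, it should
fail early instead of eating into the reserves.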

Ideally I would like to be able to trigger swapping out the shmem page I 
allocated immediately after doing the copy.

This way I would only need a single page for the whole shrink operation 
at any given time.

> Also note that if the allocation cannot succeed it will not trigger reclaim
> again because you are already called from the reclaim context.
>
>> 4. Free the pages that are not swappable/reclaimable.
>>
>> The pages we got from the shmem file are easily swappable to disk after the
>> copy is completed. But only if IO is not already blocked because the
>> shrinker was called from an allocation restricted by GFP_NOFS or GFP_NOIO.
> Sorry for being dense here but I still do not follow the actual problem
> (well, except for the above mentioned one). Is the sole point of this to
> emulate a GFP_NO* allocation context and see how shrinker behaves?

Please be as dense as you need to be :)

I think Daniel and I only have a very rough understanding of the memory 
management details here, but we need exactly that knowledge to get the 
GPU memory management into the shape we want it to be.

Thanks,
Christian.

