[PATCH v14 3/8] drm/ttm/pool: Provide a helper to shrink pages

Christian König christian.koenig at amd.com
Wed Dec 4 10:56:15 UTC 2024


On 04.12.24 at 10:56, Thomas Hellström wrote:
> On Wed, 2024-12-04 at 10:16 +0100, Christian König wrote:
>> On 03.12.24 at 18:44, Thomas Hellström wrote:
>>> On Tue, 2024-12-03 at 17:46 +0100, Christian König wrote:
>>>> On 03.12.24 at 17:43, Thomas Hellström wrote:
>>>>> On Tue, 2024-12-03 at 17:39 +0100, Christian König wrote:
>>>>>> On 03.12.24 at 17:31, Thomas Hellström wrote:
>>>>>>> On Tue, 2024-12-03 at 17:20 +0100, Christian König wrote:
>>>>>>>> [SNIP]
>>>>>>>>>>>>> @@ -453,9 +601,36 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>>>>>>>>>>>>>  	else
>>>>>>>>>>>>>  		gfp_flags |= GFP_HIGHUSER;
>>>>>>>>>>>>>  
>>>>>>>>>>>>> -	for (order = min_t(unsigned int, MAX_PAGE_ORDER, __fls(num_pages));
>>>>>>>>>>>>> -	     num_pages;
>>>>>>>>>>>>> -	     order = min_t(unsigned int, order, __fls(num_pages))) {
>>>>>>>>>>>>> +	order = min_t(unsigned int, MAX_PAGE_ORDER, __fls(num_pages));
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +	if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP) {
>>>>>>>>>>>>> +		if (!tt->restore) {
>>>>>>>>>>>>> +			gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +			if (ctx->gfp_retry_mayfail)
>>>>>>>>>>>>> +				gfp |= __GFP_RETRY_MAYFAIL;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +			tt->restore =
>>>>>>>>>>>>> +				kvzalloc(struct_size(tt->restore, old_pages,
>>>>>>>>>>>>> +						     (size_t)1 << order), gfp);
>>>>>>>>>>>>> +			if (!tt->restore)
>>>>>>>>>>>>> +				return -ENOMEM;
>>>>>>>>>>>>> +		} else if (ttm_pool_restore_valid(tt->restore)) {
>>>>>>>>>>>>> +			struct ttm_pool_tt_restore *restore = tt->restore;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +			num_pages -= restore->alloced_pages;
>>>>>>>>>>>>> +			order = min_t(unsigned int, order, __fls(num_pages));
>>>>>>>>>>>>> +			pages += restore->alloced_pages;
>>>>>>>>>>>>> +			r = ttm_pool_restore_tt(restore, tt->backup, ctx);
>>>>>>>>>>>>> +			if (r)
>>>>>>>>>>>>> +				return r;
>>>>>>>>>>>>> +			caching = restore->caching_divide;
>>>>>>>>>>>>> +		}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +		tt->restore->pool = pool;
>>>>>>>>>>>>> +	}
>>>>>>>>>>>> Huh? Why is that part of the allocation function now?
>>>>>>>>>>>>
>>>>>>>>>>>> At the bare minimum I would expect this to be a new
>>>>>>>>>>>> function.
>>>>>>>>>>> It's because we now have partially backed-up tts, so the
>>>>>>>>>>> restore is interleaved on a per-page basis, replacing the
>>>>>>>>>>> backup handles with page pointers. I'll see if I can
>>>>>>>>>>> separate out at least the initialization here.
>>>>>>>>>> Yeah, that kind of makes sense.
>>>>>>>>>>
>>>>>>>>>> My expectation was just that we now have explicit
>>>>>>>>>> ttm_pool_swapout() and ttm_pool_swapin() functions.
>>>>>>>>> I fully understand, although in the allocation step that
>>>>>>>>> would also increase the memory pressure, since we might
>>>>>>>>> momentarily have twice the bo size allocated if the shmem
>>>>>>>>> object was never swapped out, and we don't want to
>>>>>>>>> unnecessarily risk OOM at recover time, although that should
>>>>>>>>> be a recoverable situation now. If the OOM receiver can free
>>>>>>>>> up system memory resources, they could potentially restart
>>>>>>>>> the recover.
>>>>>>>> What I meant was more that we have ttm_pool_swapout() which
>>>>>>>> does a mix of moving each page to a swap backend and freeing
>>>>>>>> one by one.
>>>>>>>>
>>>>>>>> And ttm_pool_swapin() which allocates a bit of memory (usually
>>>>>>>> one huge page) and then copies the content back in from the
>>>>>>>> swap backend.
>>>>>>>>
>>>>>>>> Alternatively we could rename ttm_pool_alloc() into something
>>>>>>>> like ttm_pool_populate() and ttm_pool_free() into
>>>>>>>> ttm_pool_unpopulate(), but those names are not very
>>>>>>>> descriptive either.
>>>>>>>>
>>>>>>>> It's just that we now do a bit more than just alloc and free
>>>>>>>> in those functions, so the naming doesn't really match that
>>>>>>>> well any more.
>>>>>>> So what about ttm_pool_alloc() and ttm_pool_recover/swapin(),
>>>>>>> both pointing to the same code, but _alloc() asserts that the
>>>>>>> tt isn't backed up?
>>>>>>>
>>>>>>> That would give a clean interface at least.
>>>>>> More or less ok. I would just put figuring out the gfp flags and
>>>>>> the stuff inside the for (order... loop into separate functions.
>>>>>> And then remove the if (tt->page_flags &
>>>>>> TTM_TT_FLAG_PRIV_BACKED_UP) from the pool.
>>>>>>
>>>>>> In other words you trigger the restore from backup by calling a
>>>>>> different function than the allocation one.
>>>>> I'll take a look at this as well.
>>>> Ah, and BTW: it's perfectly possible that ttm_tt_free() is called
>>>> because a half-swapped TT is about to be destroyed!
>>>>
>>>> If I'm not completely mistaken, that is not handled gracefully
>>>> when we always try to back up from within the ttm_tt_free()
>>>> function.
>>>>
>>>> So we clearly need to separate moving this TT to a backup
>>>> (possibly only partially) from freeing it.
>>> Hm. I'm not sure I follow completely.
>>>
>>> The ttm_pool interface is currently:
>>>
>>> ttm_pool_alloc() -> allocs and may recover from backup. May leave
>>> partially backed up. Called from ttm_tt_populate() or its driver
>>> callbacks.
>> Yeah, that this is done by a single function looks really strange
>> to me.
>>
>>> ttm_pool_backup_tt() -> Attempts to back up the not-already-backed-
>>> up part of a tt. Called from ttm_tt_backup(), which is just a tt
>>> layer wrapper. If called with purge==true, then frees memory
>>> bypassing the pool to return it to the system directly.
>>>
>>> ttm_pool_free() -> Frees a (potentially backed up or partially
>>> backed
>>> up) tt. Called from ttm_tt_unpopulate() or its driver callbacks.
>> Ah! I missed that you have separated that functionality from the free
>> path.
>>
>> I only saw the allocation path and thought I needed to clear that
>> up first.
>>
>>> So the backup functionality is implemented with a minimal change to
>>> upper layers, and I don't think there is a correctness problem on
>>> free().
>>>
>>> So could you clarify a bit if it is this interface you think needs
>>> changing or that the implementation is better at separating out the
>>> backup functionality from the pool functionality?
>> I think we should just make the ttm pool object take charge of the
>> allocation, backup, restore and free operations on the TT objects.
>>
>> And all of those are separate operations; they just internally share
>> steps to achieve what they should do.
> So are we looking at an interface change like:
>
> ttm_pool_alloc() // no recover. Errors if backed-up-data present.
> ttm_pool_alloc_and_recover() // because you can't alloc first and then
> recover in a memory-efficient manner, since you need to interleave.
> ttm_pool_backup() // as currently
> ttm_pool_drop_backed_up() //drops the backed-up data if any.
> ttm_pool_free() // frees all data. errors if backed-up-data present.
>
> Is this what you mean?

Yes, exactly that.

>
>> BTW, I really dislike that tt->restore is allocated dynamically.
>> That is just another allocation which can cause problems.
>> We should probably have all the state necessary for the operation in
>> the TT object.
> Initially it was done this way. But that meant a pre-allocated struct
> page pointer array of 1 << MAX_PAGE_ORDER size (2MiB) for each
> ttm_tt. That led to a patch to reduce MAX_PAGE_ORDER to PMD size
> order, but as you might remember, that needed to be ripped out
> because the PMD size macros aren't constant across all architectures.
> IIRC it was ARM causing compilation failures, and Linus wasn't happy.

Yeah, I do remember that. But I don't fully get why you need this 
page-pointer array in the first place?

>
> So, enter the dynamic allocation, which is temporary and 1/512 of the
> size of the memory we need to allocate for the buffer object. IIRC
> that was discussed with Matt when he reviewed, and we concluded that
> it should be ok. I think this approach leads to less memory pressure
> than if we'd keep that array around all the time for *all* the
> allocated bos, and the allocation is never during reclaim.

Huh? How do you avoid having to allocate that during reclaim?

I absolutely don't see that in the code currently.

Regards,
Christian.

>
> Thanks,
> Thomas
>
>
>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>>
>>>
>>>> Christian.
>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>>> For a renaming change that touches all TTM drivers, I'd rather
>>>>>>> put that as a last patch, since getting acks for it from all
>>>>>>> TTM driver maintainers seems like a hopeless undertaking.
>>>>>> Yeah, the acks are not the problem; merging it through the xe
>>>>>> tree would be.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> /Thomas


