[PATCH] drm/ttm: stop warning on TT shrinker failure

Wed Mar 24 12:00:28 UTC 2021

Am 24.03.21 um 12:55 schrieb Daniel Vetter:
> On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
>> On 3/23/21 4:45 PM, Christian König wrote:
>>> Am 23.03.21 um 16:13 schrieb Michal Hocko:
>>>> On Tue 23-03-21 14:56:54, Christian König wrote:
>>>>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
>>>> [...]
>>>>>> Anyway, I am wondering whether the overall approach is
>>>>>> sound. Why don't
>>>>>> you simply use shmem as your backing storage from the
>>>>>> beginning and pin
>>>>>> those pages if they are used by the device?
>>>>> Yeah, that is exactly what the Intel guys are doing for their
>>>>> integrated
>>>>> GPUs :)
>>>>>
>>>>> Problem is for TTM I need to be able to handle dGPUs and those have all
>>>>> kinds of funny allocation restrictions. In other words I need to
>>>>> guarantee
>>>>> that the allocated memory is coherent accessible to the GPU
>>>>> without using
>>>>> SWIOTLB.
>>>>>
>>>>> The simple case is that the device can only do DMA32, but you also got
>>>>> device which can only do 40bits or 48bits.
>>>>>
>>>>> On top of that you also got AGP, CMA and stuff like CPU cache behavior
>>>>> changes (write back vs. write through, vs. uncached).
>>>> OK, so the underlying problem seems to be that gfp mask (thus
>>>> mapping_gfp_mask) cannot really reflect your requirements, right?  Would
>>>> it help if shmem would allow to provide an allocation callback to
>>>> override alloc_page_vma which is used currently? I am pretty sure there
>>>> will be more to handle but going through shmem for the whole life time
>>>> is just so much easier to reason about than some tricks to abuse shmem
>>>> just for the swapout path.
>>> Well it's a start, but the pages can have special CPU cache settings. So
>>> direct IO from/to them usually doesn't work as expected.
>>>
>>> Additional to that for AGP and CMA I need to make sure that I give those
>>> pages back to the relevant subsystems instead of just dropping the page
>>> reference.
>>>
>>> So I would need to block for the swapio to be completed.
>>>
>>> Anyway I probably need to revert those patches for now since this isn't
>>> working as we hoped it would.
>>>
>>> Thanks for the explanation how stuff works here.
>> Another alternative here that I've tried before without being successful
>> would perhaps be to drop shmem completely and, if it's a normal page (no dma
>> or funny caching attributes) just use add_to_swap_cache()? If it's something
>> else, try alloc a page with relevant gfp attributes, copy and
>> add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
>> either?
> So before we toss everything and go an a great rewrite-the-world tour,
> what if we just try to split up big objects. So for objects which are
> bigger than e.g. 10mb
>
> - move them to a special "under eviction" list
> - keep a note how far we evicted thus far
> - interleave allocating shmem pages, copying data and releasing the ttm
>    backing store on a chunk basis (maybe 10mb or whatever, tuning tbh)
>
> If that's not enough, occasionally break out of the shrinker entirely so
> other parts of reclaim can reclaim the shmem stuff. But just releasing our
> own pages as we go should help a lot I think.

Yeah, the later is exactly what I was currently prototyping.

I just didn't used a limit but rather a only partially evicted BOs list 
which is used when we fail to allocate a page.

For the 5.12 cycle I think we should just go back to a hard 50% limit 
for now and then resurrect this when we have solved the issues.

Christian.

> -Daniel