drm/ttm: new TT backend allocation pool

Mon Oct 26 13:45:12 UTC 2020

On Sun, Oct 25, 2020 at 11:40:47PM +0800, Christian König wrote:
> This replaces the spaghetti code in the two existing page pools.
>     
> First of all depending on the allocation size it is between 3 (1GiB) and
> 5 (1MiB) times faster than the old implementation.
>     
> It makes better use of buddy pages to allow for larger physical contiguous
> allocations which should result in better TLB utilization at least for amdgpu.
>     
> Instead of a completely braindead approach of filling the pool with one CPU
> while another one is trying to shrink it we only give back freed pages.
>     
> This also results in much less locking contention and a trylock free MM
> shrinker callback, so we can guarantee that pages are given back to the system
> when needed.
>     
> Downside of this is that it takes longer for many small allocations until the
> pool is filled up. We could address this, but I couldn't find an use case
> where this actually matters. And we don't bother freeing large chunks of pages
> any more.
>     
> The sysfs files are replaced with a single module parameter, allowing users to
> override how many pages should be globally pooled in TTM. This unfortunately
> breaks the UAPI slightly, but as far as we know nobody ever depended on this.
>    
> Zeroing memory coming from the pool was handled inconsistently. The
> alloc_pages() based pool was zeroing it, the dma_alloc_attr() based one wasn't.
> The new implementation isn't zeroing pages from the pool either and only sets
> the __GFP_ZERO flag when necessary.
>     
> The implementation has only 753 lines of code compared to the over 2600 of the
> old one, and also allows for saving quite a bunch of code in the drivers since
> we don't need specialized handling there any more based on kernel config.
>   
> Additional to all of that there was a neat bug with IOMMU, coherent DMA
> mappings and huge pages which is now fixed in the new code as well.

Let me give a test your patch set in my renoir and new vangogh platform
with iommu enabled. (I actually encountered many page faults in legacy
ttm_page_alloc_dma implementation with swiotlb=1)

Thanks,
Ray