[PATCH 4/4] drm/ttm: optimize ttm pool shrinker a bit

Christian König ckoenig.leichtzumerken at gmail.com
Thu Jan 7 12:49:45 UTC 2021


On 22.12.20 at 14:51, Daniel Vetter wrote:
> On Fri, Dec 18, 2020 at 06:55:38PM +0100, Christian König wrote:
>> Only initialize the DMA coherent pools if they are used.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
> Ah, just realized the answer to my question on patch 2: The pools are
> per-device, due to dma_alloc_coherent being per-device (but really mostly
> it isn't, but that's what we have to deal with fighting the dma-api
> abstraction).
>
> I think this would make a lot more sense if the shrinkers are per-pool
> (and also most of the debugfs files), since as-is in a multi-gpu system
> the first gpu's pool gets preferentially thrashed. Which isn't a nice
> design. Splitting that into per gpu shrinkers means we get equal shrinking
> without having to maintain a global lru. This is how xfs seems to set up
> their shrinkers, and in general xfs people have a solid understanding of
> this stuff.

Well, fairness and not thrashing the first GPU's pool is exactly why I 
implemented just one shrinker plus a global LRU.

In other words, shrink_slab() simply walks all registered shrinkers with 
list_for_each_entry().

In the pool shrinker callback we shrink one pool and then move it to the 
end of the shrinker list, so successive invocations hit the pools 
round-robin.
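The round-robin behavior described above could be sketched roughly like this. This is a userspace model with hypothetical names and a plain singly linked list standing in for the kernel's shrinker list, not the actual TTM code:

```c
#include <stddef.h>

/* Hypothetical stand-in for a TTM pool on the global shrink list. */
struct pool {
	int pages;         /* pages currently cached in this pool */
	struct pool *next; /* shrink list link; the head is the next victim */
};

static struct pool *shrink_head;

/* Append a pool at the tail of the shrink list. */
static void shrink_list_add_tail(struct pool *p)
{
	struct pool **pp = &shrink_head;

	while (*pp)
		pp = &(*pp)->next;
	p->next = NULL;
	*pp = p;
}

/*
 * Model of the shrinker callback: take the pool at the head of the
 * list, free one page from it, then requeue it at the tail so the
 * next invocation shrinks a different pool.  Returns pages freed.
 */
static int shrink_one(void)
{
	struct pool *p = shrink_head;

	if (!p)
		return 0;
	shrink_head = p->next;   /* pop the head pool */
	if (p->pages)
		p->pages--;      /* "free" one cached page from it */
	shrink_list_add_tail(p); /* rotate it to the tail */
	return 1;
}
```

With two pools on the list, four calls to shrink_one() free two pages from each pool rather than draining the first pool first, which is the fairness property being argued for.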

>
> Aside: I think it also would make tons of sense to split up your new ttm
> bo shrinker up into a per-device lru, and throw the global system memory
> lru out the window completely :-) Assuming we can indeed get rid of it,
> and vmwgfx doesn't need it somewhere still.

Yeah, I already have that as a patch set here, but it depends on a 
larger rename of the device structures.

> Aside from this lgtm, but I guess will change a bit with that shuffling.

Thanks for the review, going to send out a new version with the 
fs_reclaim_acquire/release added in a minute.

Christian.

> -Daniel
>
>> ---
>>   drivers/gpu/drm/ttm/ttm_pool.c | 23 ++++++++++++++++-------
>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 1cdacd58753a..f09e34614226 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -504,10 +504,12 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>>   	pool->use_dma_alloc = use_dma_alloc;
>>   	pool->use_dma32 = use_dma32;
>>   
>> -	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -		for (j = 0; j < MAX_ORDER; ++j)
>> -			ttm_pool_type_init(&pool->caching[i].orders[j],
>> -					   pool, i, j);
>> +	if (use_dma_alloc) {
>> +		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> +			for (j = 0; j < MAX_ORDER; ++j)
>> +				ttm_pool_type_init(&pool->caching[i].orders[j],
>> +						   pool, i, j);
>> +	}
>>   }
>>   EXPORT_SYMBOL(ttm_pool_init);
>>   
>> @@ -523,9 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>   {
>>   	unsigned int i, j;
>>   
>> -	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -		for (j = 0; j < MAX_ORDER; ++j)
>> -			ttm_pool_type_fini(&pool->caching[i].orders[j]);
>> +	if (pool->use_dma_alloc) {
>> +		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> +			for (j = 0; j < MAX_ORDER; ++j)
>> +				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>> +	}
>>   }
>>   EXPORT_SYMBOL(ttm_pool_fini);
>>   
>> @@ -630,6 +634,11 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>>   {
>>   	unsigned int i;
>>   
>> +	if (!pool->use_dma_alloc) {
>> +		seq_puts(m, "unused\n");
>> +		return 0;
>> +	}
>> +
>>   	ttm_pool_debugfs_header(m);
>>   
>>   	spin_lock(&shrinker_lock);
>> -- 
>> 2.25.1
>>
