[PATCH] drm/ttm: WIP limit the TTM pool to 32bit CPUs
Christian König
christian.koenig at amd.com
Mon Aug 11 11:51:12 UTC 2025
On 07.08.25 18:47, Thomas Hellström wrote:
>> Well it's surprising that even modern CPUs do stuff like that. That
>> could explain some of the problems we had with uncached mappings on
>> ARM and RISC-V.
>
> Yeah. I agree. Need to double-check with HW people whether that is gone
> with Panther Lake. Don't have a confirmation yet on that.
Going to ask around AMD internally as well.
> If it wasn't for the writeback of speculative prefetches we could've
> settled for only having TTM map the user-space mappings without changing
> the kernel map, just like the i915 driver does for older GPUs.
We should probably come up with a CPU whitelist or blacklist for where that is actually needed. Something like the sketch below, maybe.
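Very rough sketch only; the helper name and the concrete CPU check are made up here just to illustrate the idea, the real list of affected vendors/families would have to come from the HW people (boot_cpu_data is from <asm/processor.h>):

static bool ttm_cpu_needs_uncached_kernel_map(void)
{
#ifdef CONFIG_X86
	/*
	 * Placeholder condition only: which vendors/families are safe
	 * still needs confirmation from the HW people.
	 */
	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
	    boot_cpu_data.x86 > 6)
		return false;
#endif
	/* Be conservative: assume clean cache lines may be written back. */
	return true;
}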
>>> Second, IIRC vmf_insert_pfn_prot() on x86 will override the given
>>> caching mode with the last caching mode set for the kernel linear map,
>>> so if you try to set up a write-combined GPU mapping without a previous
>>> call to set_pages_xxxxx it will actually end up cached. See
>>> track_pfn_insert().
>>
>> That is exactly the same incorrect assumption I made as well.
>>
>> It's not the linear mapping where that comes from but a separate page
>> attribute table, see /sys/kernel/debug/x86/pat_memtype_list.
>>
>> The question is why the heck we should do this. I mean, we keep an
>> extra rb tree around to overwrite something the driver knows in the
>> first place?
>>
>> That is basically just tons of extra overhead for nothing as far as I
>> can see.
>
> IIRC it was the PAT people enforcing the x86 documentation, which says
> that aliased mappings with conflicting caching attributes are not
> allowed. But it has proven to work at least on those CPUs not suffering
> from the clean cache-line writeback mentioned above.
Makes sense. With the PAT handling, even accessing things through /dev/mem gives you the right caching.
Do you have a list of Intel CPUs where it works?
> FWIW, if I understand the code correctly, i915 bypasses this by setting
> up user-space mappings not with vmf_insert_pfn_prot() but with
> apply_to_page_range(), mapping the whole BO.
Yeah, that's probably not something we can do. Even filling in 2MiB of address space at a time caused performance problems for TTM.
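For reference, the apply_to_page_range() route looks roughly like this; a simplified sketch of what i915's remap helper does, not the actual i915 code, and the names here are made up:

#include <linux/mm.h>

struct remap_pfn_data {
	struct mm_struct *mm;
	unsigned long pfn;	/* first pfn backing the BO */
	pgprot_t prot;		/* caching attributes chosen by the driver */
};

/* Called for every PTE in the range; writes the entry with the
 * driver-chosen prot, so nothing gets overridden behind our back.
 */
static int remap_pfn_cb(pte_t *pte, unsigned long addr, void *data)
{
	struct remap_pfn_data *r = data;

	set_pte_at(r->mm, addr, pte,
		   pte_mkspecial(pfn_pte(r->pfn++, r->prot)));
	return 0;
}

static int map_whole_bo(struct vm_area_struct *vma, unsigned long pfn,
			pgprot_t prot)
{
	struct remap_pfn_data r = {
		.mm = vma->vm_mm,
		.pfn = pfn,
		.prot = prot,
	};

	return apply_to_page_range(vma->vm_mm, vma->vm_start,
				   vma->vm_end - vma->vm_start,
				   remap_pfn_cb, &r);
}

The downside is exactly the one above: it fills in the whole range up front instead of faulting pages in on demand.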
We should probably just drop overriding the attributes in vmf_insert_pfn_prot().
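For context, the override in question is roughly this, paraphrased from the x86 PAT code in arch/x86/mm/pat/; details may differ between kernel versions:

void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
{
	enum page_cache_mode pcm;

	if (!pat_enabled())
		return;

	/* Look up the memtype tracked for this pfn and replace whatever
	 * cache attributes the caller passed in.
	 */
	pcm = lookup_memtype(pfn_t_to_phys(pfn));
	*prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
			 cachemode2protval(pcm));
}

IIUC, for normal RAM without a prior set_pages_*() call the lookup comes back as write-back, which is how a requested write-combined mapping silently ends up cached.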
Regards,
Christian.
>
> /Thomas
>
>
>>
>> Thanks for taking a look,
>> Christian.
>>
>>>
>>> /Thomas
>>>
>>>
>>>> ---
>>>>  drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
>>>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>>>> index baf27c70a419..7487eac29398 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>>> @@ -38,7 +38,7 @@
>>>>  #include <linux/highmem.h>
>>>>  #include <linux/sched/mm.h>
>>>>
>>>> -#ifdef CONFIG_X86
>>>> +#ifdef CONFIG_X86_32
>>>>  #include <asm/set_memory.h>
>>>>  #endif
>>>>
>>>> @@ -46,6 +46,7 @@
>>>>  #include <drm/ttm/ttm_pool.h>
>>>>  #include <drm/ttm/ttm_tt.h>
>>>>  #include <drm/ttm/ttm_bo.h>
>>>> +#include <drm/drm_cache.h>
>>>>
>>>>  #include "ttm_module.h"
>>>>
>>>> @@ -192,7 +193,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>>>>  	struct ttm_pool_dma *dma;
>>>>  	void *vaddr;
>>>>
>>>> -#ifdef CONFIG_X86
>>>> +#ifdef CONFIG_X86_32
>>>>  	/* We don't care that set_pages_wb is inefficient here. This is only
>>>>  	 * used when we have to shrink and CPU overhead is irrelevant then.
>>>>  	 */
>>>> @@ -218,7 +219,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>>>>  /* Apply any cpu-caching deferred during page allocation */
>>>>  static int ttm_pool_apply_caching(struct ttm_pool_alloc_state *alloc)
>>>>  {
>>>> -#ifdef CONFIG_X86
>>>> +#ifdef CONFIG_X86_32
>>>>  	unsigned int num_pages = alloc->pages - alloc->caching_divide;
>>>>
>>>>  	if (!num_pages)
>>>> @@ -232,6 +233,11 @@ static int ttm_pool_apply_caching(struct ttm_pool_alloc_state *alloc)
>>>>  	case ttm_uncached:
>>>>  		return set_pages_array_uc(alloc->caching_divide, num_pages);
>>>>  	}
>>>> +
>>>> +#elif defined(CONFIG_X86_64)
>>>> +	unsigned int num_pages = alloc->pages - alloc->caching_divide;
>>>> +
>>>> +	drm_clflush_pages(alloc->caching_divide, num_pages);
>>>>  #endif
>>>>  	alloc->caching_divide = alloc->pages;
>>>>  	return 0;
>>>> @@ -342,7 +348,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>>>>  	if (pool->use_dma_alloc)
>>>>  		return &pool->caching[caching].orders[order];
>>>>
>>>> -#ifdef CONFIG_X86
>>>> +#ifdef CONFIG_X86_32
>>>>  	switch (caching) {
>>>>  	case ttm_write_combined:
>>>>  		if (pool->nid != NUMA_NO_NODE)
>>>> @@ -980,7 +986,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
>>>>  	    pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
>>>>  		return -EBUSY;
>>>>
>>>> -#ifdef CONFIG_X86
>>>> +#ifdef CONFIG_X86_32
>>>>  	/* Anything returned to the system needs to be cached. */
>>>>  	if (tt->caching != ttm_cached)
>>>>  		set_pages_array_wb(tt->pages, tt->num_pages);
>>>
>>
>