No subject
David Hildenbrand
david at redhat.com
Mon Aug 25 19:10:08 UTC 2025
On 21.08.25 10:10, Christian König wrote:
> On 20.08.25 17:23, David Hildenbrand wrote:
>> CCing Lorenzo
>>
>> On 20.08.25 16:33, Christian König wrote:
>>> Hi everyone,
>>>
>>> sorry for CCing so many people, but that rabbit hole turned out to be
>>> deeper than originally thought.
>>>
>>> TTM always had problems with UC/WC mappings on 32bit systems and drivers
>>> often had to revert to hacks like using GFP_DMA32 to get things working
>>> while having no rational explanation why that helped (see the TTM AGP,
>>> radeon and nouveau driver code for that).
>>>
>>> It turned out that the PAT implementation we use on x86 not only enforces
>>> the same caching attributes for pages in the linear kernel mapping, but
>>> also for highmem pages through a separate R/B tree.
>>>
>>> That was unexpected and TTM never updated that R/B tree for highmem pages,
>>> so the function pgprot_set_cachemode() just overwrote the caching
>>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially
>>> caused all kind of random trouble.
>>>
>>> An R/B tree is potentially not a good data structure to hold thousands if
>>> not millions of different attributes for each page, so updating that is
>>> probably not the way to solve this issue.
>>>
>>> Thomas pointed out that the i915 driver is using apply_page_range()
>>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
>>> just fill in the page tables with what the driver thinks is the right
>>> caching attribute.
>>
>> I assume you mean apply_to_page_range() -- same issue in patch subjects.
>
> Oh yes, of course. Sorry.
>
>> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :(
>
> Yeah, I was also a bit hesitant to use that, but the performance advantage is so high that we probably can't avoid the general approach.
>
>> Honestly, apply_to_pte_range() is just the entry point for doing all kinds of weird crap to page tables because "you know better".
>
> Exactly, that's the problem I'm pointing out: drivers *do* know it better. The core memory management has applied incorrect values, which caused all kinds of trouble.
>
> The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other.
>
> What I don't understand is why do we have the PAT in the first place? No other architecture does it this way.
Probably because no other architecture has these weird glitches, I assume
... skimming over memtype_reserve() and friends, there are quite a few
corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...).
I did a lot of work on the higher PAT level functions, but I am no
expert on the lower level management functions, and in particular all
the special cases with different memory types.
IIRC, the goal of the PAT subsystem is to make sure that no two page
tables map the same PFN with different caching attributes.
It usually treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) in a
special way: no special caching mode is applied.
For everything else, it expects that someone first reserves a memory
range for a specific caching mode.
For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve()
will make sure that there are no conflicts, followed by a call to
memtype_kernel_map_sync() to make sure the identity mapping is updated
to the new type.
In case someone ends up calling pfnmap_setup_cachemode(), the
expectation is that there was a previous call to memtype_reserve_io() or
similar, such that pfnmap_setup_cachemode() will find that caching mode.
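To illustrate that expected ordering, here is a minimal sketch of a driver mmap path. The mm/PAT functions named in the comments are the real ones discussed in this thread; the driver function and the BAR variables (my_drv_mmap, bar_phys, bar_len) are hypothetical placeholders:

```c
/* Hypothetical driver mmap handler, sketching the reserve-then-map flow. */
static int my_drv_mmap(struct file *file, struct vm_area_struct *vma)
{
	/*
	 * ioremap_wc() ends up in __ioremap_caller(), which calls
	 * memtype_reserve() to record WC for this physical range and
	 * memtype_kernel_map_sync() to keep the kernel mapping in sync.
	 */
	void __iomem *regs = ioremap_wc(bar_phys, bar_len);

	if (!regs)
		return -ENOMEM;

	/*
	 * A later remap_pfn_range()/vmf_insert_pfn_prot() goes through
	 * pfnmap_setup_cachemode(), which can now find the reservation
	 * made above and apply a consistent caching mode to the user PTEs.
	 */
	return remap_pfn_range(vma, vma->vm_start,
			       bar_phys >> PAGE_SHIFT, bar_len,
			       pgprot_writecombine(vma->vm_page_prot));
}
```

If a driver skips the ioremap/reservation step and inserts PFNs directly, pfnmap_setup_cachemode() finds no memtype entry and falls back to overriding the caching attributes, which matches the TTM behaviour described at the top of this thread.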
So my assumption would be that that is missing for the drivers here?
Last time I asked where this reservation is done, Peter Xu explained [1]
it at least for VFIO:
vfio_pci_core_mmap
  pci_iomap
    pci_iomap_range
    ...
      __ioremap_caller
        memtype_reserve
Now, could it be that something like that is missing in these drivers
(ioremap etc)?
[1] https://lkml.kernel.org/r/aBDXr-Qp4z0tS50P@x1.local
>
> Is that because of the of x86 CPUs which have problems when different page tables contain different caching attributes for the same physical memory?
Yes, but I don't think x86 is special here.
--
Cheers
David / dhildenb