[PATCH v2 00/11] mm: rewrite pfnmap tracking and remove VM_PAT

Liam R. Howlett Liam.Howlett at oracle.com
Tue May 13 15:53:33 UTC 2025


* David Hildenbrand <david at redhat.com> [250512 08:34]:
> On top of mm-unstable.
> 
> VM_PAT annoyed me too much and wasted too much of my time, let's clean
> PAT handling up and remove VM_PAT.
> 
> This should sort out various issues with VM_PAT we discovered recently,
> and will hopefully make the whole code more stable and easier to maintain.
> 
> In essence: we stop letting PAT mode mess with VMAs and instead lift
> what to track/untrack to the MM core. We remember per VMA which pfn range
> we tracked in a new struct we attach to a VMA (we have space without
> exceeding 192 bytes), use a kref to share it among VMAs during
> split/mremap/fork, and automatically untrack once the kref drops to 0.
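
A minimal userspace model of the scheme quoted above. One refcounted context records the pfn range that was tracked. Every VMA produced by a split shares that context, and only dropping the last reference untracks the range. All names here (`struct track_ctx`, `struct model_vma` and the helpers) are hypothetical, and a plain counter stands in for `struct kref`. This is an illustration, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tracking context; the plain counter stands in for struct kref. */
struct track_ctx {
	unsigned int refcount;
	unsigned long pfn;       /* first pfn that was tracked */
	unsigned long nr_pages;  /* length of the tracked pfn range */
};

/* Hypothetical VMA: the virtual range may change, the ctx is shared. */
struct model_vma {
	unsigned long start, end;
	struct track_ctx *ctx;
};

static unsigned int nr_untracked;  /* counts "untrack" operations */

static struct track_ctx *track_ctx_alloc(unsigned long pfn,
					 unsigned long nr_pages)
{
	struct track_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	ctx->pfn = pfn;
	ctx->nr_pages = nr_pages;
	return ctx;
}

static void track_ctx_get(struct track_ctx *ctx)
{
	ctx->refcount++;
}

/* Dropping the last reference untracks the whole original range. */
static void track_ctx_put(struct track_ctx *ctx)
{
	if (--ctx->refcount == 0) {
		nr_untracked++;  /* kernel: release the memtype reservation */
		free(ctx);
	}
}

/*
 * Split a VMA at addr: both halves point at the same ctx.
 * fork() or an mremap() copy would take a reference the same way.
 */
static void model_vma_split(struct model_vma *vma, unsigned long addr,
			    struct model_vma *new_vma)
{
	*new_vma = *vma;
	new_vma->start = addr;
	vma->end = addr;
	track_ctx_get(vma->ctx);
}

static void model_vma_free(struct model_vma *vma)
{
	track_ctx_put(vma->ctx);
	vma->ctx = NULL;
}
```

Freeing one half of a split VMA leaves the full range tracked, which is the partial-unmap behavior described further down. The range is only untracked when the last VMA sharing the context goes away.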

What you do here seems to be decoupling the tracked range from the VMA
start/end addresses by moving it into a separately allocated, refcounted
struct.  This is close to what we do with the anon VMA name.

It took a while to understand the underlying interval tree tracking in
this change, but I think it's as good as it was before.  IIRC, there was
support for shrinking by matching on the end address in the interval tree,
but I failed to find that commit and code; maybe it never made it upstream.
I was able to find a thread about splitting [1], so maybe I'm mistaken.

> 
> This implies that we'll keep tracking a full pfn range even after partially
> unmapping it, until fully unmapping it; but as that case was mostly broken
> before, this at least makes it work in a way that is least intrusive to
> VMA handling.
> 
> Shrinking with mremap() used to work in a hacky way; now we'll similarly
> keep the original pfn range tracked even after this form of partial unmap.
> Does anybody care about that? Unlikely. If we run into issues, we could
> likely handle that (adjust the tracking) when our kref drops to 1 while
> freeing a VMA. But it adds more complexity, so avoid that for now.

Decoupling the VMA from the refcounted range means that we could beef up
the backend to actually track the correct range, which would be nice, but
I have very little desire to work on that.
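
Tracking "the correct range" could, in principle, reduce to simple arithmetic once only one VMA is left: untrack the head and tail pfns it no longer maps. This is a hypothetical sketch. `struct pfn_range` and `shrink_tracked_range()` are invented for illustration, nothing like them is part of this series, and the PAT backend would first have to support untracking part of a reservation. It also assumes the remaining mapping is a contiguous subrange of the tracked one.

```c
#include <assert.h>

/* Hypothetical: a pfn range [pfn, pfn + nr_pages). */
struct pfn_range {
	unsigned long pfn;
	unsigned long nr_pages;
};

/*
 * Hypothetical helper: shrink the tracked range to what the last
 * remaining VMA still maps. Returns how many pages would be untracked
 * (head + tail). Requires `mapped` to lie inside `tracked`.
 */
static unsigned long shrink_tracked_range(struct pfn_range *tracked,
					  const struct pfn_range *mapped)
{
	unsigned long tracked_end = tracked->pfn + tracked->nr_pages;
	unsigned long mapped_end = mapped->pfn + mapped->nr_pages;
	unsigned long head, tail;

	assert(mapped->pfn >= tracked->pfn && mapped_end <= tracked_end);
	head = mapped->pfn - tracked->pfn;
	tail = tracked_end - mapped_end;

	/* kernel (hypothetical): untrack [tracked->pfn, mapped->pfn) and
	 * [mapped_end, tracked_end) here */
	*tracked = *mapped;
	return head + tail;
}
```

For a tracked range of 8 pages starting at pfn 100, a remaining mapping of 3 pages at pfn 102 leaves 2 head pages and 3 tail pages to untrack.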


[1] https://lore.kernel.org/all/5jrd43vusvcchpk2x6mouighkfhamjpaya5fu2cvikzaieg5pq@wqccwmjs4ian/

> 
> Briefly tested with the new pfnmap selftests [1].
> 
> [1] https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com

Oh yes, that's still a pr_info() log.  I think it should be at least a
pr_err()?

> 
> Cc: Dave Hansen <dave.hansen at linux.intel.com>
> Cc: Andy Lutomirski <luto at kernel.org>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Ingo Molnar <mingo at redhat.com>
> Cc: Borislav Petkov <bp at alien8.de>
> Cc: "H. Peter Anvin" <hpa at zytor.com>
> Cc: Jani Nikula <jani.nikula at linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Tvrtko Ursulin <tursulin at ursulin.net>
> Cc: David Airlie <airlied at gmail.com>
> Cc: Simona Vetter <simona at ffwll.ch>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: Masami Hiramatsu <mhiramat at kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Cc: "Liam R. Howlett" <Liam.Howlett at oracle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
> Cc: Vlastimil Babka <vbabka at suse.cz>
> Cc: Jann Horn <jannh at google.com>
> Cc: Pedro Falcato <pfalcato at suse.de>
> Cc: Peter Xu <peterx at redhat.com>
> 
> v1 -> v2:
> * "mm: convert track_pfn_insert() to pfnmap_setup_cachemode*()"
>  -> Call it "pfnmap_setup_cachemode()" and improve the documentation
>  -> Add pfnmap_setup_cachemode_pfn()
>  -> Keep checking a single PFN for PMD/PUD case and document why it's ok
> * Merged memremap conversion patch with pfnmap_track() introduction patch
>  -> Improve documentation
> * "mm: convert VM_PFNMAP tracking to pfnmap_track() + pfnmap_untrack()"
>  -> Adjust to code changes in mm-unstable
> * Added "x86/mm/pat: inline memtype_match() into memtype_erase()"
> * "mm/io-mapping: track_pfn() -> "pfnmap tracking""
>  -> Adjust to code changes in mm-unstable
> 
> David Hildenbrand (11):
>   x86/mm/pat: factor out setting cachemode into pgprot_set_cachemode()
>   mm: convert track_pfn_insert() to pfnmap_setup_cachemode*()
>   mm: introduce pfnmap_track() and pfnmap_untrack() and use them for
>     memremap
>   mm: convert VM_PFNMAP tracking to pfnmap_track() + pfnmap_untrack()
>   x86/mm/pat: remove old pfnmap tracking interface
>   mm: remove VM_PAT
>   x86/mm/pat: remove strict_prot parameter from reserve_pfn_range()
>   x86/mm/pat: remove MEMTYPE_*_MATCH
>   x86/mm/pat: inline memtype_match() into memtype_erase()
>   drm/i915: track_pfn() -> "pfnmap tracking"
>   mm/io-mapping: track_pfn() -> "pfnmap tracking"
> 
>  arch/x86/mm/pat/memtype.c          | 194 ++++-------------------------
>  arch/x86/mm/pat/memtype_interval.c |  63 ++--------
>  drivers/gpu/drm/i915/i915_mm.c     |   4 +-
>  include/linux/mm.h                 |   4 +-
>  include/linux/mm_inline.h          |   2 +
>  include/linux/mm_types.h           |  11 ++
>  include/linux/pgtable.h            | 127 ++++++++++---------
>  include/trace/events/mmflags.h     |   4 +-
>  mm/huge_memory.c                   |   5 +-
>  mm/io-mapping.c                    |   2 +-
>  mm/memory.c                        |  86 ++++++++++---
>  mm/memremap.c                      |   8 +-
>  mm/mmap.c                          |   5 -
>  mm/mremap.c                        |   4 -
>  mm/vma_init.c                      |  50 ++++++++
>  15 files changed, 242 insertions(+), 327 deletions(-)
> 
> 
> base-commit: c68cfbc5048ede4b10a1d3fe16f7f6192fc2c9c8
> -- 
> 2.49.0
> 


More information about the Intel-gfx mailing list