[PATCH 03/12] mm/pagewalk: Skip dax pages in pagewalk
Alistair Popple
apopple at nvidia.com
Thu Jun 12 22:50:15 UTC 2025
On Thu, Jun 12, 2025 at 03:15:31PM +0100, Lorenzo Stoakes wrote:
> On Thu, May 29, 2025 at 04:32:04PM +1000, Alistair Popple wrote:
> > Previously dax pages were skipped by the pagewalk code as pud_special() or
> > vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages are
> > refcounted normally that is no longer the case, so add explicit checks to
> > skip them.
> >
> > Signed-off-by: Alistair Popple <apopple at nvidia.com>
> > ---
> > include/linux/memremap.h | 11 +++++++++++
> > mm/pagewalk.c | 12 ++++++++++--
> > 2 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > index 4aa1519..54e8b57 100644
> > --- a/include/linux/memremap.h
> > +++ b/include/linux/memremap.h
> > @@ -198,6 +198,17 @@ static inline bool folio_is_fsdax(const struct folio *folio)
> > return is_fsdax_page(&folio->page);
> > }
> >
> > +static inline bool is_devdax_page(const struct page *page)
> > +{
> > + return is_zone_device_page(page) &&
> > + page_pgmap(page)->type == MEMORY_DEVICE_GENERIC;
> > +}
> > +
> > +static inline bool folio_is_devdax(const struct folio *folio)
> > +{
> > + return is_devdax_page(&folio->page);
> > +}
> > +
> > #ifdef CONFIG_ZONE_DEVICE
> > void zone_device_page_init(struct page *page);
> > void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index e478777..0dfb9c2 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -884,6 +884,12 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> > * support PUD mappings in VM_PFNMAP|VM_MIXEDMAP VMAs.
> > */
> > page = pud_page(pud);
> > +
> > + if (is_devdax_page(page)) {
>
> Is it only devdax that can exist at PUD leaf level, not fsdax?
Correct.
> > + spin_unlock(ptl);
> > + goto not_found;
> > + }
> > +
> > goto found;
> > }
> >
> > @@ -911,7 +917,8 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> > goto pte_table;
> > } else if (pmd_present(pmd)) {
> > page = vm_normal_page_pmd(vma, addr, pmd);
> > - if (page) {
> > + if (page && !is_devdax_page(page) &&
> > + !is_fsdax_page(page)) {
> > goto found;
> > } else if ((flags & FW_ZEROPAGE) &&
> > is_huge_zero_pmd(pmd)) {
> > @@ -945,7 +952,8 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> >
> > if (pte_present(pte)) {
> > page = vm_normal_page(vma, addr, pte);
> > - if (page)
> > + if (page && !is_devdax_page(page) &&
> > + !is_fsdax_page(page))
> > goto found;
> > if ((flags & FW_ZEROPAGE) &&
> > is_zero_pfn(pte_pfn(pte))) {
>
> I'm probably echoing others here (and I definitely particularly like Dan's
> suggestion of a helper function here, and Jason's suggestion of explanatory
> comments), but would also be nice to not have to do this separately at each page
> table level and instead have something that you can say 'get me normal non-dax
> page at page table level <parameter>'.
I did the filtering here because I was trying to avoid unintended behavioural
changes, and was being lazy by not auditing the callers. It turns out naming is
harder than doing this properly, so I'm going to go with Jason and David's
suggestion and drop the filtering entirely. It will then be up to callers to
define what is "normal" for them by filtering out the folio types they don't
care about. Most already filter out zone device folios or DAX VMAs anyway, and
I will add some commentary to this effect in the respin.
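
To illustrate what "callers filter for themselves" could look like, here is a
rough userspace sketch. It is not kernel code and not the respin: every mock_*
name and caller_get_normal_folio() is hypothetical, and the enum only mimics
the idea of checking the pgmap type, as is_devdax_page() does in the patch
above. The walker stand-in returns whatever is mapped, and the caller decides
which folio types it does not want.

```c
/* Userspace mock for illustration only; none of these are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pgmap type of the memory backing a folio. */
enum mock_pgmap_type {
	MOCK_NOT_DEVICE,	/* ordinary, non zone-device memory */
	MOCK_DEVICE_FS_DAX,	/* stand-in for an fsdax mapping */
	MOCK_DEVICE_GENERIC,	/* stand-in for a devdax mapping */
};

struct mock_folio {
	enum mock_pgmap_type type;
};

static bool mock_folio_is_fsdax(const struct mock_folio *folio)
{
	assert(folio != NULL);
	return folio->type == MOCK_DEVICE_FS_DAX;
}

static bool mock_folio_is_devdax(const struct mock_folio *folio)
{
	assert(folio != NULL);
	return folio->type == MOCK_DEVICE_GENERIC;
}

/*
 * Stand-in for a walker with no DAX filtering: it simply reports
 * whatever is mapped (NULL meaning nothing is mapped).
 */
static struct mock_folio *mock_walk(struct mock_folio *mapped)
{
	return mapped;
}

/*
 * Hypothetical caller that only wants non-DAX folios. The filtering
 * lives here, in the caller, rather than in the walker.
 */
static struct mock_folio *caller_get_normal_folio(struct mock_folio *mapped)
{
	struct mock_folio *folio = mock_walk(mapped);

	if (!folio)
		return NULL;
	if (mock_folio_is_fsdax(folio) || mock_folio_is_devdax(folio))
		return NULL;
	return folio;
}
```

A caller that does want DAX folios would simply skip the second check, which
is the point of leaving the decision outside the walker.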
> > --
> > git-series 0.9.1
>