[RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops

Matthew Wilcox willy at infradead.org
Wed Nov 13 04:57:28 UTC 2024


On Tue, Nov 12, 2024 at 03:22:46PM +0100, David Hildenbrand wrote:
> On 12.11.24 14:53, Jason Gunthorpe wrote:
> > On Tue, Nov 12, 2024 at 10:10:06AM +0100, David Hildenbrand wrote:
> > > On 12.11.24 06:26, Matthew Wilcox wrote:
> > > > I don't want you to respin.  I think this is a bad idea.
> > > 
> > > I'm hoping you'll find some more time to explain what exactly you don't
> > > like, because this series only refactors what we already have.
> > > 
> > > I enjoy seeing the special casing (especially hugetlb) gone from mm/swap.c.

I don't.  The list of 'if's is better than the indirect function call.
That's terribly expensive, and the way we reuse the lru.next field
is fragile.  Not to mention that it introduces a new thing for the
hardening people to fret over.

> > And, IMHO, seems like overkill. We have only a handful of cases -
> > maybe we shouldn't be trying to get to full generality but just handle
> > a couple of cases directly? I don't really think it is such a bad
> > thing to have an if ladder on the free path if we have only a couple
> > things. Certainly it looks good instead of doing overlaying tricks.
> 
> I'd really like to abstract hugetlb handling if possible. The way it stands
> it's just very odd.

There might be ways to make that better.  I haven't really been looking
too hard at making that special handling go away.

> We'll need some reliable way to identify these folios that need care.
> guest_memfd will be using folio->mapcount for now, so for now we couldn't
> set a page type like hugetlb does.

If hugetlb can set lru.next at a certain point, then guestmemfd could
set a page type at a similar point, no?

> > Also how does this translate to Matthew's memdesc world?

In a memdesc world, pages no longer have a refcount.  We might still
have put_page() which will now be a very complicated (and out-of-line)
function that looks up what kind of memdesc it is and operates on the
memdesc's refcount ... if it has one.  I don't know if it'll be exported
to modules; I can see uses in the mm code, but I'm not sure if modules
will have a need.

Each memdesc type will have its own function to call to free the memdesc.
So we'll still have folio_put().  But slab does not have, need nor want
a refcount, so it'll just slab_free().  I expect us to keep around a
list of recently-freed memdescs of a particular type with their pages
still attached so that we can allocate them again quickly (or reclaim
them under memory pressure).  Once that freelist overflows, we'll free
a batch of them to the buddy allocator (for the pages) and the slab
allocator (for the memdesc itself).

> guest_memfd and hugetlb would be operating on folios (at least for now),
> which contain the refcount,lru,private, ... so nothing special there.
> 
> Once we actually decoupled "struct folio" from "struct page", we *might*
> have to play less tricks, because we could just have a callback pointer
> there. But well, maybe we also want to save some space in there.
> 
> Do we want dedicated memdescs for hugetlb/guest_memfd that extend folios in
> the future? I don't know, maybe.

I've certainly considered going so far as a per-fs folio.  So we'd
have an ext4_folio, an btrfs_folio, an iomap_folio, etc.  That'd let us
get rid of folio->private, but I'm not sure that C's type system can
really handle this nicely.  Maybe in a Rust world ;-)

What I'm thinking about is that I'd really like to be able to declare
that all the functions in ext4_aops only accept pointers to ext4_folio,
so ext4_dirty_folio() can't be called with pointers to _any_ folio,
but specifically folios which were previously allocated for ext4.

I don't know if Rust lets you do something like that.

> I'm currently wondering if we can use folio->private for the time being.
> Either
> 
> (a) If folio->private is still set once the refcount drops to 0, it
> indicates that there is a freeing callback/owner_ops. We'll have to make
> hugetlb not use folio->private and convert others to clear folio->private
> before freeing.
> 
> (b) Use bitX of folio->private to indicate that this has "owner_ops"
> meaning. We'll have to make hugetlb not use folio->private and make others
> not use bitX. Might be harder and overkill, because right now we only really
> need the callback when refcount==0.
> 
> (c) Use some other indication that folio->private contains folio_ops.

I really don't want to use folio_ops / folio_owner_ops.  I read
https://lore.kernel.org/all/CAGtprH_JP2w-4rq02h_Ugvq5KuHX7TUvegOS7xUs_iy5hriE7g@mail.gmail.com/
and I still don't understand what you're trying to do.

Would it work to use aops->free_folio() to notify you when the folio is
being removed from the address space?


More information about the dri-devel mailing list