[PATCH] mm/memremap: Introduce pgmap_request_folio() using pgmap offsets

Dan Williams dan.j.williams at intel.com
Thu Oct 20 23:17:36 UTC 2022


Felix Kuehling wrote:
> Am 2022-10-20 um 17:56 schrieb Dan Williams:
> > A 'struct dev_pagemap' (pgmap) represents a collection of ZONE_DEVICE
> > pages. The pgmap is a reference-counted object that serves a similar
> > role to a 'struct request_queue'. Live references are obtained for each
> > in-flight request / page, and once a page's reference count drops to
> > zero the associated pin of the pgmap is dropped as well. While a page is
> > idle, nothing should be accessing it because that is effectively a
> > use-after-free situation. Unfortunately, all current ZONE_DEVICE
> > implementations deploy a layering violation to manage requests to
> > activate pages owned by a pgmap. Specifically, they take steps like
> > walking the pfns that were previously assigned at memremap_pages() time
> > and using pfn_to_page() to recall metadata like page->pgmap, or making
> > use of other data like page->zone_device_data.
> >
> > The first step towards correcting that situation is to provide an
> > API to get access to a pgmap page that does not require the caller to
> > know the pfn, nor access any fields of an idle page. Ideally this API
> > would be able to support dynamic page creation instead of the current
> > status quo of pre-allocating and initializing pages.
> 
> If I understand it correctly, the current code works because the struct
> pages are pre-allocated. Is the end goal here to make the struct page
> allocation for ZONE_DEVICE pages dynamic?

Some DEVICE_PRIVATE users have already open-coded their own
coarse-grained dynamic ZONE_DEVICE pages by waiting to allocate chunks on
demand.

The users that would benefit from a general dynamic ZONE_DEVICE facility
are cases like VMs backed by device-dax instances. Unless the VM calls
for bare metal services, there is no need to map pages for the device-dax
instance in the hypervisor.

So, the end goal here is to add some sanity to ZONE_DEVICE page
reference counting, so that page access goes through an explicit request
to the pgmap rather than through pfn_to_page() and the assumption that
the page is already there. Dynamic ZONE_DEVICE becomes something that is
possible once that sanity is in place.
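
In code terms the shift looks roughly like this. This is only a sketch,
not the final API: the pgmap_request_folio() signature below (pgmap +
offset + order) is my reading of this series, and devmem_get_folio() is
a made-up caller:

	struct folio *devmem_get_folio(struct dev_pagemap *pgmap,
				       pgoff_t pgmap_offset)
	{
		/*
		 * Old model (the layering violation): assume the page
		 * already exists and reach into its metadata while it
		 * is idle:
		 *
		 *	page = pfn_to_page(pfn);
		 *	pgmap = page->pgmap;
		 *
		 * New model: ask the pgmap to put a page into service;
		 * the returned folio holds a reference that pins the
		 * pgmap for as long as the page is in use.
		 */
		return pgmap_request_folio(pgmap, pgmap_offset, 0);
	}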

> > On a prompt from Jason, introduce pgmap_request_folio() that operates on
> > an offset into a pgmap.
> 
> This looks like it would make it fairly easy to request larger
> (higher-order) folios for physically contiguous device memory allocations in the
> future.
> 
> 
> >   It replaces the short-lived
> > pgmap_request_folios() that was continuing the layering violation of
> > assuming pages are available to be consulted before asking the pgmap to
> > make them available.
> >
> > For now this only converts the callers to look up the pgmap and generate
> > the pgmap offset, but it does not do the deeper cleanup of teaching
> > those call sites to generate those arguments without walking the page
> > metadata. For next steps it appears the DEVICE_PRIVATE implementations
> > could plumb the pgmap into the necessary call sites and switch to using
> > gen_pool_alloc() to track which offsets of a pgmap are allocated.
> 
> Wouldn't that duplicate whatever device memory allocator we already have 
> in our driver? Couldn't I just take the memory allocation from our TTM 
> allocator and make the necessary pgmap_request_folio() calls to allocate the
> corresponding pages from the pgmap?

I think you could, as long as the output from that allocator is a
pgmap_offset rather than a pfn.
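
E.g., for a single-range pgmap whose allocator hands out device physical
addresses, the conversion never needs to touch a struct page. Something
like the below, assuming pgmap offsets are in units of pages (that
granularity is my assumption), with devmem_addr_to_pgmap_offset() being
a made-up helper:

	static pgoff_t devmem_addr_to_pgmap_offset(struct dev_pagemap *pgmap,
						   phys_addr_t addr)
	{
		/* distance from the start of the pgmap's range, in pages */
		return PHYS_PFN(addr - pgmap->range.start);
	}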

> Or does the pgmap allocation need a finer granularity than the device
> memory allocation?

I would say the pgmap *allocation* happens at memremap_pages() time.
pgmap_request_folio() is a request to put a pgmap page into service.

So, yes, I think you can bring your own allocator for what offsets are
in/out of service in pgmap space.
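
To sketch the "bring your own allocator" idea with the genalloc helpers
mentioned above (same caveat: the pgmap_request_folio() signature is
assumed, and the devmem_* names are purely illustrative):

	#include <linux/genalloc.h>
	#include <linux/memremap.h>

	struct devmem {
		struct dev_pagemap pgmap;
		struct gen_pool *pool;	/* tracks in-service pgmap offsets */
	};

	static int devmem_init_pool(struct devmem *dm, unsigned long nr_pages)
	{
		dm->pool = gen_pool_create(0, NUMA_NO_NODE);
		if (!dm->pool)
			return -ENOMEM;
		/*
		 * Offsets are stored biased by 1 because gen_pool_alloc()
		 * returns 0 on failure.
		 */
		return gen_pool_add(dm->pool, 1, nr_pages, NUMA_NO_NODE);
	}

	static struct folio *devmem_request_page(struct devmem *dm)
	{
		unsigned long biased_offset;
		struct folio *folio;

		biased_offset = gen_pool_alloc(dm->pool, 1);
		if (!biased_offset)
			return NULL;

		folio = pgmap_request_folio(&dm->pgmap, biased_offset - 1, 0);
		if (!folio)
			gen_pool_free(dm->pool, biased_offset, 1);
		return folio;
	}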

