[PATCH v5 01/13] mm: add zone device coherent type memory support

Alistair Popple apopple at nvidia.com
Mon Jun 20 08:13:26 UTC 2022


Oded Gabbay <oded.gabbay at gmail.com> writes:

> On Mon, Jun 20, 2022 at 3:33 AM Alistair Popple <apopple at nvidia.com> wrote:
>>
>>
>> Oded Gabbay <oded.gabbay at gmail.com> writes:
>>
>> > On Fri, Jun 17, 2022 at 8:20 PM Sierra Guiza, Alejandro (Alex)
>> > <alex.sierra at amd.com> wrote:
>> >>
>> >>
>> >> On 6/17/2022 4:40 AM, David Hildenbrand wrote:
>> >> > On 31.05.22 22:00, Alex Sierra wrote:
>> >> >> Device memory that is cache coherent from device and CPU point of view.
>> >> >> This is used on platforms that have an advanced system bus (like CAPI
>> >> >> or CXL). Any page of a process can be migrated to such memory. However,
>> >> >> no one should be allowed to pin such memory so that it can always be
>> >> >> evicted.
>> >> >>
>> >> >> Signed-off-by: Alex Sierra <alex.sierra at amd.com>
>> >> >> Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> >> >> Reviewed-by: Alistair Popple <apopple at nvidia.com>
>> >> >> [hch: rebased ontop of the refcount changes,
>> >> >>        removed is_dev_private_or_coherent_page]
>> >> >> Signed-off-by: Christoph Hellwig <hch at lst.de>
>> >> >> ---
>> >> >>   include/linux/memremap.h | 19 +++++++++++++++++++
>> >> >>   mm/memcontrol.c          |  7 ++++---
>> >> >>   mm/memory-failure.c      |  8 ++++++--
>> >> >>   mm/memremap.c            | 10 ++++++++++
>> >> >>   mm/migrate_device.c      | 16 +++++++---------
>> >> >>   mm/rmap.c                |  5 +++--
>> >> >>   6 files changed, 49 insertions(+), 16 deletions(-)
>> >> >>
>> >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> >> >> index 8af304f6b504..9f752ebed613 100644
>> >> >> --- a/include/linux/memremap.h
>> >> >> +++ b/include/linux/memremap.h
>> >> >> @@ -41,6 +41,13 @@ struct vmem_altmap {
>> >> >>    * A more complete discussion of unaddressable memory may be found in
>> >> >>    * include/linux/hmm.h and Documentation/vm/hmm.rst.
>> >> >>    *
>> >> >> + * MEMORY_DEVICE_COHERENT:
>> >> >> + * Device memory that is cache coherent from device and CPU point of view. This
>> >> >> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A
>> >> >> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory
>> >> >> + * type. Any page of a process can be migrated to such memory. However no one
>> >> > Any page might not be right, I'm pretty sure. ... just thinking about special pages
>> >> > like vdso, shared zeropage, ... pinned pages ...
>> >>
>> >> Hi David,
>> >>
>> >> Yes, I think you're right. This type does not cover all special pages.
>> >> I need to correct that on the cover letter.
>> >> Pinned pages are allowed as long as they're not long term pinned.
>> >>
>> >> Regards,
>> >> Alex Sierra
>> >
>> > What if I want to hotplug this device's coherent memory, but I do
>> > *not* want the OS
>> > to migrate any page to it ?
>> > I want to fully-control what resides on this memory, as I can consider
>> > this memory
>> > "expensive". i.e. I don't have a lot of it, I want to use it for
>> > specific purposes and
>> > I don't want the OS to start using it when there is some memory pressure in
>> > the system.
>>
>> This is exactly what MEMORY_DEVICE_COHERENT is for. Device coherent
>> pages are only allocated by a device driver and exposed to user-space by
>> a driver migrating pages to them with migrate_vma. The OS can't just
>> start using them due to memory pressure for example.
>>
>>  - Alistair
> Thanks for the explanation.
>
> I guess the commit message confused me a bit, especially these two sentences:
>
> "Any page of a process can be migrated to such memory. However no one should be
> allowed to pin such memory so that it can always be evicted."
>
> I read them as if the OS is free to choose which pages are migrated to
> this memory,
> and anything is eligible for migration to that memory (and that's why
> we also don't
> allow it to pin memory there).
>
> If we are not allowed to pin anything there, can the device driver
> decide to disable
> any option for oversubscription of this memory area ?

I'm not sure I follow your thinking on how oversubscription would work
here, however all allocations are controlled by the driver. So if a
device's coherent memory is full a driver would be unable to migrate
pages to that device until pages are freed by the OS due to being
unmapped or the driver evicts pages by migrating them back to normal CPU
memory.

Pinning of pages is allowed, and could prevent such migrations. However
this patch series prevents device coherent pages from being pinned
longterm (ie. with FOLL_LONGTERM), so it should always be able to evict
pages eventually.

> Let's assume the user uses this memory area for doing p2p with other
> CXL devices.
> In that case, I wouldn't want the driver/OS to migrate pages in and
> out of that memory...

The OS will not migrate pages in or out (although it may free them if no
longer required), but a driver might choose to. So at the moment it's
really up to the driver to implement what you want in this regards.

> So either I should let the user pin those pages, or prevent him from
> doing (accidently or not)
> oversubscription in this memory area.

As noted above pages can be pinned, but not long-term.

 - Alistair

> wdyt ?
>
>>
>> > Oded
>> >
>> >>
>> >> >
>> >> >> + * should be allowed to pin such memory so that it can always be evicted.
>> >> >> + *
>> >> >>    * MEMORY_DEVICE_FS_DAX:
>> >> >>    * Host memory that has similar access semantics as System RAM i.e. DMA
>> >> >>    * coherent and supports page pinning. In support of coordinating page
>> >> >> @@ -61,6 +68,7 @@ struct vmem_altmap {
>> >> >>   enum memory_type {
>> >> >>      /* 0 is reserved to catch uninitialized type fields */
>> >> >>      MEMORY_DEVICE_PRIVATE = 1,
>> >> >> +    MEMORY_DEVICE_COHERENT,
>> >> >>      MEMORY_DEVICE_FS_DAX,
>> >> >>      MEMORY_DEVICE_GENERIC,
>> >> >>      MEMORY_DEVICE_PCI_P2PDMA,
>> >> >> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio)
>> >> > In general, this LGTM, and it should be correct with PageAnonExclusive I think.
>> >> >
>> >> >
>> >> > However, where exactly is pinning forbidden?
>> >>
>> >> Long-term pinning is forbidden since it would interfere with the device
>> >> memory manager owning the
>> >> device-coherent pages (e.g. evictions in TTM). However, normal pinning
>> >> is allowed on this device type.
>> >>
>> >> Regards,
>> >> Alex Sierra
>> >>
>> >> >


More information about the amd-gfx mailing list