[PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

Alistair Popple apopple at nvidia.com
Wed Jan 12 11:06:03 UTC 2022

I have been looking at this in relation to the migration code and noticed we
have the following in try_to_migrate():

        if (is_zone_device_page(page) && !is_device_private_page(page))

Which, if I'm understanding correctly, means that migration of device coherent
pages will always fail. Given that, I do wonder how hmm-tests are passing, but
I assume you must always be hitting this fast path in

                 * Optimize for the common case where page is only mapped once
                 * in one process. If we can lock the page, then we can safely
                 * set up a special migration page table entry now.

Meaning that try_to_migrate() never gets called from migrate_vma_unmap(). So
you will also need some changes to try_to_migrate() and possibly
try_to_migrate_one() to make this reliable.
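The following is a simplified user-space model of how a page is unmapped during migrate_vma. It is not kernel code. The collection fast path replaces the mapping being walked if the page can be locked. Any mapping left over is handed to try_to_migrate(). The enum, struct and function names are hypothetical stand-ins, and the model ignores THP, refcounts and real locking. It shows why a device coherent page mapped once still migrates under the current check, while a shared or lock-contended one does not. It also shows what one possible relaxation of the check (also accepting device coherent pages) would change.

```c
#include <stdbool.h>

/* Hypothetical model types; these are not the kernel's definitions. */
enum page_kind {
	PAGE_SYSTEM,
	PAGE_DEVICE_PRIVATE,
	PAGE_DEVICE_COHERENT,
	PAGE_DEVICE_GENERIC,
};

struct model_page {
	enum page_kind kind;
	int mapcount;   /* CPU mappings that still point at the page */
	bool lockable;  /* would trylock_page() succeed during collection? */
};

/* Mirrors the existing test in try_to_migrate(): any ZONE_DEVICE page
 * other than a device private one returns early and stays mapped. */
static bool ttm_current(struct model_page *p)
{
	if (p->kind != PAGE_SYSTEM && p->kind != PAGE_DEVICE_PRIVATE)
		return false;
	p->mapcount = 0;        /* all remaining mappings replaced */
	return true;
}

/* One possible relaxation: also let device coherent pages through. */
static bool ttm_relaxed(struct model_page *p)
{
	if (p->kind != PAGE_SYSTEM && p->kind != PAGE_DEVICE_PRIVATE &&
	    p->kind != PAGE_DEVICE_COHERENT)
		return false;
	p->mapcount = 0;
	return true;
}

/* Simplified collect + unmap. The fast path installs a migration entry
 * for the mapping being walked if the page can be locked; anything still
 * mapped afterwards must be removed by try_to_migrate(), as called from
 * migrate_vma_unmap(). Returns true if the page ends up unmapped, i.e.
 * it can be migrated. */
static bool collect_and_unmap(struct model_page p,
			      bool (*try_to_migrate)(struct model_page *))
{
	if (p.lockable && p.mapcount > 0)
		p.mapcount--;           /* fast path */
	if (p.mapcount > 0)
		try_to_migrate(&p);     /* slow path */
	return p.mapcount == 0;
}
```

In this model, a device coherent page mapped twice stays mapped under ttm_current() but is fully unmapped under ttm_relaxed(). A page of another ZONE_DEVICE type (PAGE_DEVICE_GENERIC here) is still rejected.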

 - Alistair

On Tuesday, 11 January 2022 9:31:51 AM AEDT Alex Sierra wrote:
> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like [...]
>
> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
> cleanup patch into our hardware page migration patchset originally came
> from you, but it proved impractical to do things in that order because
> the refcount cleanup introduced a bug with wide-ranging structural
> implications. Instead, we amended Ralph’s patch so that it could be
> applied after merging the migration work. As we saw from the recent
> discussion, merging the refcount work is going to take some time and
> cooperation between multiple development groups, while the migration
> work is ready now and is needed now. So we propose to merge this
> patchset first and continue to work with Ralph and others to merge the
> refcount cleanup separately, when it is ready.
>
> This patch series is mostly self-contained except for a few places where
> it needs to update other subsystems to handle the new memory type.
> System stability and performance are not affected according to our
> ongoing testing, including xfstests.
>
> How it works: The system BIOS advertises the GPU device memory
> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
> map.
> The amdgpu driver registers the memory with devmap as
> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
> this hardware page migration capability is the Frontier supercomputer
> project. This functionality is not AMD-specific. We expect other GPU
> vendors to find this functionality useful, and possibly other hardware
> types in the future.
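On the firmware side, a range of this kind is an entry in the UEFI memory map with the EFI_MEMORY_SP (specific-purpose) attribute bit set. The sketch below is a simplified user-space model built on a hypothetical, trimmed-down descriptor struct. It only shows how such ranges can be picked out of a memory map. It does not model how the kernel reserves them, or how the driver then passes the range to devm_memremap_pages() with the MEMORY_DEVICE_COHERENT type.

```c
#include <stddef.h>
#include <stdint.h>

/* Specific-purpose attribute bit in the UEFI memory map
 * (EFI_MEMORY_SP in the Linux EFI headers). */
#define EFI_MEMORY_SP 0x0000000000040000ULL
#define EFI_PAGE_SIZE 4096ULL

/* Hypothetical, trimmed-down memory map descriptor. */
struct mmap_entry {
	uint64_t phys_start;
	uint64_t num_pages;   /* counted in 4 KiB EFI pages */
	uint64_t attribute;
};

/* Sum the bytes of every range tagged specific-purpose, i.e. the ranges
 * firmware asks the OS not to treat as general-purpose system RAM. */
static uint64_t spm_bytes(const struct mmap_entry *map, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (map[i].attribute & EFI_MEMORY_SP)
			total += map[i].num_pages * EFI_PAGE_SIZE;
	return total;
}
```

For example, four 64 GiB ranges tagged EFI_MEMORY_SP (one per GPU) add up to 256 GiB, and ordinary RAM entries without the bit are skipped.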
>
> Our test nodes in the lab are similar to the Frontier configuration,
> with 0.5 TB of system memory plus 256 GB of device memory split across
> 4 GPUs, all in a single coherent address space. Page migration is
> expected to improve application efficiency significantly. We will
> report empirical results as they become available.
>
> We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
> patch set builds on HMM and our SVM memory manager already merged in
> 5.15.
>
> v2:
> - test_hmm is now able to create private and coherent device mirror
> instances in the same driver probe. This makes the hmm test easier to
> use, since the kernel module no longer has to be removed for each device
> type test (private/coherent). This is done by passing the module
> parameters spm_addr_dev0 & spm_addr_dev1. In this case, it will create
> four instances of device_mirror. The first two correspond to private
> device type, the last two to coherent type. Then, they can be easily
> accessed from user space through /dev/hmm_mirror<num_device>. Usually
> num_device 0 and 1 are for private, and 2 and 3 for coherent types.
> - Coherent device type pages at gup are now migrated back to system
> memory if they have been long term pinned (FOLL_LONGTERM). The reason
> is these pages could eventually interfere with their own device memory
> manager. A new hmm_gup_test has been added to the hmm-test to test this
> functionality. It makes use of the gup_test module to long term pin
> user pages that have been migrated to device memory first.
> - Other patch corrections made by Felix, Alistair and Christoph.
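The instance numbering described in the first v2 item can be summarized with a small helper. This is an illustrative sketch only. mirror_type_of(), mirror_path() and the enum are hypothetical names, not test_hmm code. They assume spm_addr_dev0 and spm_addr_dev1 were both passed, so four instances exist: two private followed by two coherent.

```c
#include <stdio.h>

/* Hypothetical names encoding the numbering convention described above. */
enum mirror_type { MIRROR_PRIVATE, MIRROR_COHERENT };

#define NUM_PRIVATE_MIRRORS 2   /* instances 0 and 1 */

/* Instances 0 and 1 are device private; 2 and 3 are device coherent. */
static enum mirror_type mirror_type_of(int num_device)
{
	return num_device < NUM_PRIVATE_MIRRORS ? MIRROR_PRIVATE
						: MIRROR_COHERENT;
}

/* Build the character device path user space opens for an instance;
 * returns the length snprintf() reports. */
static int mirror_path(char *buf, size_t len, int num_device)
{
	return snprintf(buf, len, "/dev/hmm_mirror%d", num_device);
}
```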
>
> v3:
> - Based on Alistair's feedback on v2, we've decided to
> remove migration logic for FOLL_LONGTERM coherent device type pages at
> gup for now. Ideally, this should be done through the kernel mm,
> instead of calling the device driver to do it. Currently, there's no
> support for migrating device pages based on pfn, mainly because
> migrate_pages() relies on pages being LRU pages. Alistair mentioned that
> he has started to work on adding this migrate device pages logic. For
> now, get_user_pages() calls with FOLL_LONGTERM fail for DEVICE_COHERENT
> pages.
> - Also, hmm_gup_test has been removed from hmm-test. We plan to include
> it again after this migration work is ready.
> - Addressed Liam Howlett's feedback.
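The v3 behaviour amounts to a simple rule at pin time: a long-term pin of a device coherent page is refused. The reason given for v2 was that long-term pinned pages could interfere with the device's own memory manager. The model below is a user-space sketch. MODEL_FOLL_LONGTERM, the enum, and the choice of -EFAULT as the error are assumptions made for illustration. They do not reproduce mm/gup.c.

```c
#include <errno.h>

/* Hypothetical flag bit standing in for FOLL_LONGTERM; the kernel's
 * actual value is not reproduced here. */
#define MODEL_FOLL_LONGTERM (1u << 0)

enum page_kind { PAGE_SYSTEM, PAGE_DEVICE_PRIVATE, PAGE_DEVICE_COHERENT };

/* Returns 0 if the pin may proceed, or a negative errno if a long-term
 * pin of this page is refused. Short-term pins are not affected. */
static int model_check_pin(enum page_kind kind, unsigned int gup_flags)
{
	if ((gup_flags & MODEL_FOLL_LONGTERM) && kind == PAGE_DEVICE_COHERENT)
		return -EFAULT;
	return 0;
}
```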
>
> Alex Sierra (10):
>   mm: add zone device coherent type memory support
>   mm: add device coherent vma selection for memory migration
>   mm/gup: fail get_user_pages for LONGTERM dev coherent type
>   drm/amdkfd: add SPM support for SVM
>   drm/amdkfd: coherent type as sys mem on migration to ram
>   lib: test_hmm add ioctl to get zone device type
>   lib: test_hmm add module param for zone device type
>   lib: add support for device coherent type in test_hmm
>   tools: update hmm-test to support device coherent type
>   tools: update test_hmm script to support SP config
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
>  include/linux/memremap.h                 |   8 +
>  include/linux/migrate.h                  |   1 +
>  include/linux/mm.h                       |  16 ++
>  lib/test_hmm.c                           | 333 +++++++++++++++++------
>  lib/test_hmm_uapi.h                      |  22 +-
>  mm/gup.c                                 |   7 +
>  mm/memcontrol.c                          |   6 +-
>  mm/memory-failure.c                      |   8 +-
>  mm/memremap.c                            |   5 +-
>  mm/migrate.c                             |  30 +-
>  tools/testing/selftests/vm/hmm-tests.c   | 122 +++++++--
>  tools/testing/selftests/vm/test_hmm.sh   |  24 +-
>  13 files changed, 475 insertions(+), 141 deletions(-)
