[PATCH 00/44] Add HMM-based SVM memory manager to KFD v2

Felix Kuehling Felix.Kuehling at amd.com
Mon Mar 22 10:58:16 UTC 2021


Since the last patch series I sent on Jan 6, a lot has changed. Patches 1-33
are the cleaned-up version from about a week ago, rebased on
amd-staging-drm-next 5.11. The remaining 11 patches are current
work in progress with further cleanup and fixes.

MMU notifiers and CPU page faults can now split ranges and update our range
data structures without taking heavy locks by doing some of the critical
work in a deferred work handler. This includes updating MMU notifiers and
the SVM range interval tree. In the meantime, new ranges can live as
children of their parent ranges until the deferred work handler consolidates
them in the main interval tree.
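
As a rough illustration of the parent/child idea described above, here is a
minimal userspace sketch. It is not code from this series: the names
(struct range, range_split_deferred, range_set_consolidate, range_lookup)
are hypothetical, a singly linked list stands in for the interval tree, and
all locking and MMU notifier handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SVM range covering pages [start, last]. */
struct range {
	unsigned long start, last;
	struct range *next_child; /* ranges parked on this parent */
	struct range *next_main;  /* main list, stands in for the interval tree */
};

struct range_set {
	struct range *main_list;
	int work_pending;         /* stands in for scheduling deferred work */
};

/*
 * Fast path (think notifier or fault context): split 'parent' at 'split'
 * and park the tail as a child. The main list is not modified here.
 */
static struct range *range_split_deferred(struct range_set *set,
					  struct range *parent,
					  unsigned long split)
{
	struct range *child;

	assert(set && parent);
	if (split <= parent->start || split > parent->last)
		return NULL;

	child = calloc(1, sizeof(*child));
	if (!child)
		return NULL;

	child->start = split;
	child->last = parent->last;
	parent->last = split - 1;
	child->next_child = parent->next_child;
	parent->next_child = child;
	set->work_pending = 1;
	return child;
}

/*
 * Deferred handler: move every parked child into the main list, right
 * after its parent, keeping the order of the child list.
 * Returns the number of ranges moved.
 */
static int range_set_consolidate(struct range_set *set)
{
	struct range *r, *tail, *c;
	int moved = 0;

	for (r = set->main_list; r; r = r->next_main) {
		tail = r;
		while ((c = r->next_child) != NULL) {
			r->next_child = c->next_child;
			c->next_child = NULL;
			c->next_main = tail->next_main;
			tail->next_main = c;
			tail = c;
			moved++;
		}
	}
	set->work_pending = 0;
	return moved;
}

/* Lookup has to check parked children until consolidation has run. */
static struct range *range_lookup(struct range_set *set, unsigned long addr)
{
	struct range *r, *c;

	for (r = set->main_list; r; r = r->next_main) {
		if (addr >= r->start && addr <= r->last)
			return r;
		for (c = r->next_child; c; c = c->next_child)
			if (addr >= c->start && addr <= c->last)
				return c;
	}
	return NULL;
}
```

In this sketch a lookup finds the same range before and after
consolidation, which is the property that lets the expensive update of the
main structure be postponed to the work handler.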

We also added proper DMA mapping of system memory pages.

Current work in progress is cleaning up all the locking, simplifying our
code and data structures and resolving a few known bugs.

This series and the corresponding ROCm Thunk and KFDTest changes are also
available on GitHub:
  https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/fxkamd/hmm-wip
  https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/fxkamd/hmm-wip

An updated Thunk

Alex Sierra (10):
  drm/amdgpu: replace per_device_list by array
  drm/amdkfd: helper to convert gpu id and idx
  drm/amdkfd: add xnack enabled flag to kfd_process
  drm/amdkfd: add ioctl to configure and query xnack retries
  drm/amdgpu: enable 48-bit IH timestamp counter
  drm/amdkfd: SVM API call to restore page tables
  drm/amdkfd: add svm_bo reference for eviction fence
  drm/amdgpu: add param bit flag to create SVM BOs
  drm/amdgpu: svm bo enable_signal call condition
  drm/amdgpu: add svm_bo eviction to enable_signal cb

Felix Kuehling (22):
  drm/amdkfd: map svm range to GPUs
  drm/amdkfd: svm range eviction and restore
  drm/amdkfd: validate vram svm range from TTM
  drm/amdkfd: HMM migrate ram to vram
  drm/amdkfd: HMM migrate vram to ram
  drm/amdkfd: invalidate tables on page retry fault
  drm/amdkfd: page table restore through svm API
  drm/amdkfd: add svm_bo eviction mechanism support
  drm/amdkfd: refine migration policy with xnack on
  drm/amdkfd: add svm range validate timestamp
  drm/amdkfd: multiple gpu migrate vram to vram
  drm/amdkfd: Fix dma unmapping
  drm/amdkfd: Call mutex_destroy
  drm/amdkfd: Fix spurious restore failures
  drm/amdkfd: Fix svm_bo_list locking in eviction worker
  drm/amdkfd: Simplify split_by_granularity
  drm/amdkfd: Point out several race conditions
  drm/amdkfd: Return pdd from kfd_process_device_from_gduid
  drm/amdkfd: Remove broken deferred mapping
  drm/amdkfd: Allow invalid pages in migration.src
  drm/amdkfd: Correct locking during migration and mapping
  drm/amdkfd: Nested locking and invalidation of child ranges

Philip Yang (12):
  drm/amdkfd: add svm ioctl API
  drm/amdkfd: register svm range
  drm/amdkfd: add svm ioctl GET_ATTR op
  drm/amdgpu: add common HMM get pages function
  drm/amdkfd: validate svm range system memory
  drm/amdkfd: deregister svm range
  drm/amdgpu: export vm update mapping interface
  drm/amdkfd: register HMM device private zone
  drm/amdkfd: support xgmi same hive mapping
  drm/amdkfd: copy memory through gart table
  drm/amdgpu: reserve fence slot to update page table
  drm/amdkfd: Add SVM API support capability bits

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    4 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  |   16 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c        |   83 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h        |    7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   90 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   48 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   11 +
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c        |    1 +
 drivers/gpu/drm/amd/amdkfd/Kconfig            |    1 +
 drivers/gpu/drm/amd/amdkfd/Makefile           |    4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  173 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |    4 +
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c        |    8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c      |  922 ++++++
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h      |   59 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   54 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  191 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |    6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 2865 +++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h          |  175 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |    6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   10 +-
 include/uapi/linux/kfd_ioctl.h                |  171 +-
 26 files changed, 4681 insertions(+), 249 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_svm.h

-- 
2.31.0


