[PATCH v7 00/42] drm/msm: sparse / "VM_BIND" support
Antonino Maniscalco
antomani103 at gmail.com
Sat Jun 28 13:32:01 UTC 2025
On 6/25/25 8:46 PM, Rob Clark wrote:
> Conversion to DRM GPU VA Manager[1], and adding support for Vulkan Sparse
> Memory[2] in the form of:
>
> 1. A new VM_BIND submitqueue type for executing VM MSM_SUBMIT_BO_OP_MAP/
> MAP_NULL/UNMAP commands
>
> 2. A new VM_BIND ioctl to allow submitting batches of one or more
> MAP/MAP_NULL/UNMAP commands to a VM_BIND submitqueue
>
> I did not implement support for synchronous VM_BIND commands. Since
> userspace could just immediately wait for the `SUBMIT` to complete, I don't
> think we need this extra complexity in the kernel. Synchronous/immediate
> VM_BIND operations could be implemented with a 2nd VM_BIND submitqueue.
>
> The corresponding mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533
>
> Changes in v7:
> - Rebase on, and use, gpuvm locking helpers[4], included in this
> series.
> - Various small fixes
> - Link to v6: https://lore.kernel.org/all/20250605183111.163594-1-robin.clark@oss.qualcomm.com/
>
> Changes in v6:
> - Drop io-pgtable-arm patch as it has already been picked up in the
> iommu tree.
> - Rework to drop gpuvm changes. To mitigate the limitation of gpuvm
> when it comes to lazy unmap (and to avoid ~5ms of unmap per pageflip!)
> a vma_ref refcount is added. This refcount is incremented when a BO
> is pinned for scanout, and for userspace handles and dma-bufs. The
> VMA is torn down when this count drops to zero, breaking the reference
> loop between the VM_BO and BO. But as long as a pin or userspace
> handle is keeping a reference to the BO live, we allow the harmless
> reference loop to live. (This is only for kernel managed VMs, which
> includes the kms VM.) If no userspace process has some sort of
> handle to the BO, it is unlikely to be reused again. (The exception
> is GET_FB, but in that case the vma_ref >= 1 due to pin for scan-
> out.)
> - Drop gpu sched changes for throttling and move this into the driver.
> We can re-visit a more generic solution when some other driver
> realizes they need the same thing.
> - Link to v5: https://lore.kernel.org/all/20250519175348.11924-1-robdclark@gmail.com/
>
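To illustrate the vma_ref lifetime rule described in the v6 notes, here is a minimal userspace sketch in C. It is not code from the series: the names sketch_bo, sketch_vma, bo_vma_ref_get() and bo_vma_ref_put() are hypothetical, and the locking the real driver would need is left out. It only shows the counting rule: each scanout pin, userspace handle, or dma-buf takes a reference, and the VMA is torn down when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's VMA and BO objects. */
struct sketch_vma {
	bool mapped;		/* true while the mapping exists in the VM */
};

struct sketch_bo {
	int vma_ref;		/* scanout pins + userspace handles + dma-bufs */
	struct sketch_vma vma;
};

/* Take a reference for a new scanout pin, userspace handle, or dma-buf. */
static void bo_vma_ref_get(struct sketch_bo *bo)
{
	bo->vma_ref++;
}

/*
 * Drop a reference. When the last one goes away the VMA is torn down,
 * which is the point where the BO <-> VM_BO reference loop would be broken.
 */
static void bo_vma_ref_put(struct sketch_bo *bo)
{
	assert(bo->vma_ref > 0);
	if (--bo->vma_ref == 0)
		bo->vma.mapped = false;
}
```

In this sketch, a BO that holds both a userspace handle and a scanout pin keeps its mapping across an unpin, so a pageflip does not pay for an unmap; the mapping only goes away once the handle is dropped as well.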
> Changes in v5:
> - Improved drm/sched enqueue_credit comments, and better define the
> return from drm_sched_entity_push_job()
> - Improve DRM_GPUVM_VA_WEAK_REF comments, and additional WARN_ON()s to
> make it clear that some of the gpuvm functionality is not available
> in this mode.
> - Link to v4: https://lore.kernel.org/all/20250514175527.42488-1-robdclark@gmail.com/
>
> Changes in v4:
> - Various locking/etc fixes
> - Optimize the pgtable preallocation. If userspace sorts the VM_BIND ops
> then the kernel detects ops that fall into the same 2MB last level PTD
> to avoid duplicate page preallocation.
> - Add way to throttle pushing jobs to the scheduler, to cap the amount of
> potentially temporary prealloc'd pgtable pages.
> - Add vm_log to devcoredump for debugging. If the vm_log_shift module
> param is set, keep a log of the last 1<<vm_log_shift VM updates for
> easier debugging of faults/crashes.
> - Link to v3: https://lore.kernel.org/all/20250428205619.227835-1-robdclark@gmail.com/
>
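The pgtable preallocation optimization from the v4 notes can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the driver's implementation: struct bind_op and count_last_level_tables() are hypothetical names, and the 2MB span assumes 4K pages with 512 entries per last-level table. The sketch only skips a table when an op starts in the same 2MB region in which the previous op ended, so unsorted input yields a larger (still safe) count rather than a smaller one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one last-level table maps 2MB (512 entries x 4K pages). */
#define PTD_SPAN (2ull << 20)

/* Hypothetical op description, not the UAPI struct. */
struct bind_op {
	uint64_t iova;		/* start address of the mapping */
	uint64_t range;		/* length in bytes, must be non-zero */
};

/*
 * Count the last-level tables to preallocate for a batch of ops.
 * If an op begins in the 2MB region where the previous op ended,
 * that table was already counted and is not counted again.
 */
static size_t count_last_level_tables(const struct bind_op *ops, size_t nr)
{
	size_t count = 0;
	uint64_t prev_last = 0;
	int have_prev = 0;

	for (size_t i = 0; i < nr; i++) {
		assert(ops[i].range != 0);

		uint64_t first = ops[i].iova / PTD_SPAN;
		uint64_t last = (ops[i].iova + ops[i].range - 1) / PTD_SPAN;

		count += last - first + 1;
		if (have_prev && first == prev_last)
			count--;	/* shared with the previous op */

		prev_last = last;
		have_prev = 1;
	}
	return count;
}
```

With three 4K ops at 0x0, 0x1000 and 0x200000 passed in that order, the sketch counts two tables. The same ops passed in the order 0x0, 0x200000, 0x1000 count three, which shows why sorting in userspace reduces the preallocation.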
> Changes in v3:
> - Switched to separate VM_BIND ioctl. This makes the UABI a bit
> cleaner, but OTOH the userspace code was cleaner when the end result
> of either type of VkQueue led to the same ioctl. So I'm a bit on
> the fence.
> - Switched to doing the gpuvm bookkeeping synchronously, and only
> deferring the pgtable updates. This avoids needing to hold any resv
> locks in the fence signaling path, resolving the last shrinker related
> lockdep complaints. OTOH it means userspace can trigger invalid
> pgtable updates with multiple VM_BIND queues. In this case, we ensure
> that unmaps happen completely (to prevent userspace from using this to
> access free'd pages), mark the context as unusable, and move on with
> life.
> - Link to v2: https://lore.kernel.org/all/20250319145425.51935-1-robdclark@gmail.com/
>
> Changes in v2:
> - Dropped Bibek Kumar Patro's arm-smmu patches[3], which have since been
> merged.
> - Pre-allocate all the things, and drop HACK patch which disabled shrinker.
> This includes ensuring that vm_bo objects are allocated up front, pre-
> allocating VMA objects, and pre-allocating pages used for pgtable updates.
> The latter utilizes io_pgtable_cfg callbacks for pgtable alloc/free, that
> were initially added for panthor.
> - Add back support for BO dumping for devcoredump.
> - Link to v1 (RFC): https://lore.kernel.org/dri-devel/20241207161651.410556-1-robdclark@gmail.com/T/#t
>
> [1] https://www.kernel.org/doc/html/next/gpu/drm-mm.html#drm-gpuvm
> [2] https://docs.vulkan.org/spec/latest/chapters/sparsemem.html
> [3] https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=909700
> [4] https://lore.kernel.org/all/20250620154537.89514-1-robin.clark@oss.qualcomm.com/
>
> Rob Clark (42):
> drm/gpuvm: Fix doc comments
> drm/gpuvm: Add locking helpers
> drm/gem: Add ww_acquire_ctx support to drm_gem_lru_scan()
> drm/msm: Rename msm_file_private -> msm_context
> drm/msm: Improve msm_context comments
> drm/msm: Rename msm_gem_address_space -> msm_gem_vm
> drm/msm: Remove vram carveout support
> drm/msm: Collapse vma allocation and initialization
> drm/msm: Collapse vma close and delete
> drm/msm: Don't close VMAs on purge
> drm/msm: Stop passing vm to msm_framebuffer
> drm/msm: Refcount framebuffer pins
> drm/msm: drm_gpuvm conversion
> drm/msm: Convert vm locking
> drm/msm: Use drm_gpuvm types more
> drm/msm: Split out helper to get iommu prot flags
> drm/msm: Add mmu support for non-zero offset
> drm/msm: Add PRR support
> drm/msm: Rename msm_gem_vma_purge() -> _unmap()
> drm/msm: Drop queued submits on lastclose()
> drm/msm: Lazily create context VM
> drm/msm: Add opt-in for VM_BIND
> drm/msm: Mark VM as unusable on GPU hangs
> drm/msm: Add _NO_SHARE flag
> drm/msm: Crashdump prep for sparse mappings
> drm/msm: rd dumping prep for sparse mappings
> drm/msm: Crashdump support for sparse
> drm/msm: rd dumping support for sparse
> drm/msm: Extract out syncobj helpers
> drm/msm: Use DMA_RESV_USAGE_BOOKKEEP/KERNEL
> drm/msm: Add VM_BIND submitqueue
> drm/msm: Support IO_PGTABLE_QUIRK_NO_WARN_ON
> drm/msm: Support pgtable preallocation
> drm/msm: Split out map/unmap ops
> drm/msm: Add VM_BIND ioctl
> drm/msm: Add VM logging for VM_BIND updates
> drm/msm: Add VMA unmap reason
> drm/msm: Add mmu prealloc tracepoint
> drm/msm: use trylock for debugfs
> drm/msm: Bump UAPI version
> drm/msm: Defer VMA unmap for fb unpins
> drm/msm: Add VM_BIND throttling
>
> drivers/gpu/drm/drm_gem.c | 14 +-
> drivers/gpu/drm/drm_gpuvm.c | 132 +-
> drivers/gpu/drm/msm/Kconfig | 1 +
> drivers/gpu/drm/msm/Makefile | 1 +
> drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 25 +-
> drivers/gpu/drm/msm/adreno/a2xx_gpummu.c | 5 +-
> drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 17 +-
> drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 17 +-
> drivers/gpu/drm/msm/adreno/a5xx_debugfs.c | 4 +-
> drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 22 +-
> drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +-
> drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 10 +-
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 32 +-
> drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +-
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 49 +-
> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 6 +-
> drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 10 +-
> drivers/gpu/drm/msm/adreno/adreno_device.c | 4 -
> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 23 +-
> .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c | 11 +-
> drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c | 20 +-
> drivers/gpu/drm/msm/disp/dpu1/dpu_formats.h | 3 +-
> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 18 +-
> drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 22 +-
> drivers/gpu/drm/msm/disp/dpu1/dpu_plane.h | 2 -
> drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 6 +-
> drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 28 +-
> drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 18 +-
> drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 4 +-
> drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 19 +-
> drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 18 +-
> drivers/gpu/drm/msm/dsi/dsi_host.c | 14 +-
> drivers/gpu/drm/msm/msm_drv.c | 185 +-
> drivers/gpu/drm/msm/msm_drv.h | 30 +-
> drivers/gpu/drm/msm/msm_fb.c | 33 +-
> drivers/gpu/drm/msm/msm_fbdev.c | 2 +-
> drivers/gpu/drm/msm/msm_gem.c | 537 +++---
> drivers/gpu/drm/msm/msm_gem.h | 276 ++-
> drivers/gpu/drm/msm/msm_gem_prime.c | 66 +
> drivers/gpu/drm/msm/msm_gem_shrinker.c | 104 +-
> drivers/gpu/drm/msm/msm_gem_submit.c | 300 ++--
> drivers/gpu/drm/msm/msm_gem_vma.c | 1508 ++++++++++++++++-
> drivers/gpu/drm/msm/msm_gpu.c | 211 ++-
> drivers/gpu/drm/msm/msm_gpu.h | 147 +-
> drivers/gpu/drm/msm/msm_gpu_trace.h | 14 +
> drivers/gpu/drm/msm/msm_iommu.c | 302 +++-
> drivers/gpu/drm/msm/msm_kms.c | 18 +-
> drivers/gpu/drm/msm/msm_kms.h | 2 +-
> drivers/gpu/drm/msm/msm_mmu.h | 38 +-
> drivers/gpu/drm/msm/msm_rd.c | 62 +-
> drivers/gpu/drm/msm/msm_ringbuffer.c | 10 +-
> drivers/gpu/drm/msm/msm_submitqueue.c | 96 +-
> drivers/gpu/drm/msm/msm_syncobj.c | 172 ++
> drivers/gpu/drm/msm/msm_syncobj.h | 37 +
> include/drm/drm_gem.h | 10 +-
> include/drm/drm_gpuvm.h | 8 +
> include/uapi/drm/msm_drm.h | 149 +-
> 58 files changed, 3712 insertions(+), 1263 deletions(-)
> create mode 100644 drivers/gpu/drm/msm/msm_syncobj.c
> create mode 100644 drivers/gpu/drm/msm/msm_syncobj.h
>
I've been testing and helping debug this series:
Tested-by: Antonino Maniscalco <antomani103 at gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103 at gmail.com>
Best regards,
--
Antonino Maniscalco <antomani103 at gmail.com>