[v2 20/31] drm/xe: add xe lock document

Oak Zeng oak.zeng at intel.com
Tue Apr 9 20:17:31 UTC 2024


This is not intended to be a complete documentation of xe locks. It
only documents some key locks used in the xe driver and gives an
example to illustrate their usage.

This is just a start. We should eventually refine this document.

Signed-off-by: Oak Zeng <oak.zeng at intel.com>
---
 Documentation/gpu/xe/index.rst   |   1 +
 Documentation/gpu/xe/xe_lock.rst |   8 +++
 drivers/gpu/drm/xe/xe_lock_doc.h | 113 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h |   2 +-
 4 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/gpu/xe/xe_lock.rst
 create mode 100644 drivers/gpu/drm/xe/xe_lock_doc.h

diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
index 106b60aba1f0..6ae2c8e7bbb4 100644
--- a/Documentation/gpu/xe/index.rst
+++ b/Documentation/gpu/xe/index.rst
@@ -24,3 +24,4 @@ DG2, etc is provided to prototype the driver.
    xe_tile
    xe_debugging
    xe_svm
+   xe_lock
diff --git a/Documentation/gpu/xe/xe_lock.rst b/Documentation/gpu/xe/xe_lock.rst
new file mode 100644
index 000000000000..24e4c2e7c5d1
--- /dev/null
+++ b/Documentation/gpu/xe/xe_lock.rst
@@ -0,0 +1,8 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+==============
+xe lock design
+==============
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_lock_doc.h
+   :doc: XE lock design
diff --git a/drivers/gpu/drm/xe/xe_lock_doc.h b/drivers/gpu/drm/xe/xe_lock_doc.h
new file mode 100644
index 000000000000..0fab623ce056
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_lock_doc.h
@@ -0,0 +1,113 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef _XE_LOCK_DOC_H_
+#define _XE_LOCK_DOC_H_
+
+/**
+ * DOC: XE lock design
+ *
+ * Locking in xekmd is complicated. This document covers the very
+ * fundamentals: the key locks used, their purpose, and the order in
+ * which they must be taken when multiple locks are held.
+ *
+ * Locks used in xekmd
+ * ===================
+ * 1. xe_vm::lock
+ * xe_vm::lock mainly protects data in the xe_vm struct, more specifically
+ * the following (a usage sketch follows the list):
+ *
+ * 1) vm::rebind_list
+ * 2) vm::flags, only the XE_VM_FLAG_BANNED bit
+ * 3) vma::tile_present
+ * 4) userptr::repin_list
+ * 5) userptr::invalidated list
+ * 6) vm::preempt::exec_queue
+ * 7) drm_gpuvm::rb list and tree
+ * 8) vm::size
+ * 9) vm::q[]->last_fence, only if EXEC_QUEUE_FLAG_VM is set in q->flags,
+ *    see xe_exec_queue_last_fence_lockdep_assert()
+ * 10) the contested list during vm close, see xe_vm_close_and_put()
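+ *
+ * As a rough illustration, code that touches any of the items above is
+ * expected to hold vm->lock. A minimal sketch, using a hypothetical helper
+ * name rather than an existing xe function::
+ *
+ *   static void example_touch_rebind_list(struct xe_vm *vm)
+ *   {
+ *           // vm->lock (a rw_semaphore) protects vm->rebind_list
+ *           lockdep_assert_held(&vm->lock);
+ *
+ *           // ... safe to add to or walk vm->rebind_list here ...
+ *   }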
+ *
+ * 2. mm mmap_lock
+ * The mm's mmap_lock protects the process address space, such as the CPU
+ * page tables. Linux core mm holds this lock whenever it needs to change a
+ * process's memory mapping, for example during munmap.
+ *
+ * xe holds the mmap_lock when it needs to walk the CPU page tables, such
+ * as when it calls hmm_range_fault() to populate a range's pfns, as
+ * sketched below.
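+ *
+ * A minimal sketch of that pattern, reusing the vma->userptr.notifier field
+ * named in the pseudo code further below; start, end, pfns and mm are
+ * assumed to be set up by the caller, and error handling other than the
+ * retry is trimmed::
+ *
+ *   struct hmm_range range = {
+ *           .notifier = &vma->userptr.notifier,
+ *           .start = start,
+ *           .end = end,
+ *           .hmm_pfns = pfns,
+ *           .default_flags = HMM_PFN_REQ_FAULT,
+ *   };
+ *   int ret;
+ *
+ *   again:
+ *   range.notifier_seq = mmu_interval_read_begin(range.notifier);
+ *   mmap_read_lock(mm);                 // required by hmm_range_fault()
+ *   ret = hmm_range_fault(&range);
+ *   mmap_read_unlock(mm);
+ *   if (ret == -EBUSY)                  // the range was invalidated, retry
+ *           goto again;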
+ *
+ * 3. xe_vm's dma-resv
+ * xe_vm's dma reservation object is used to protect GPU page table updates.
+ * For BO type vmas, the dma-resv alone is enough to serialize page table
+ * updates. For userptr and hmmptr, besides the dma-resv, we need an extra
+ * notifier_lock to avoid page table updates colliding with userptr
+ * invalidation. See below.
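+ *
+ * A minimal sketch for the BO case, assuming the xe_vm_lock()/xe_vm_unlock()
+ * helpers referenced in the pseudo code below; the bool interruptible
+ * argument is an assumption about their current signature::
+ *
+ *   int err = xe_vm_lock(vm, false);    // locks xe_vm's dma-resv
+ *
+ *   if (!err) {
+ *           // ... write PTEs or queue the page table update job ...
+ *           xe_vm_unlock(vm);
+ *   }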
+ *
+ * 4. xe_vm::userptr::notifier_lock
+ * notifier_lock is used to protect userptr/hmmptr GPU page table updates
+ * from colliding with userptr invalidation, so it must also be taken in the
+ * userptr invalidate callback. notifier_lock plays the role of the
+ * "user_lock" described in the documentation of mmu_interval_read_begin().
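+ *
+ * A minimal sketch of such an invalidate callback, loosely modeled on xe's
+ * userptr invalidation and simplified; the function name is hypothetical
+ * and the vma->userptr layout follows the pseudo code below::
+ *
+ *   static bool example_userptr_invalidate(struct mmu_interval_notifier *mni,
+ *                                          const struct mmu_notifier_range *range,
+ *                                          unsigned long cur_seq)
+ *   {
+ *           struct xe_vma *vma = container_of(mni, struct xe_vma, userptr.notifier);
+ *           struct xe_vm *vm = xe_vma_vm(vma);
+ *
+ *           if (!mmu_notifier_range_blockable(range))
+ *                   return false;
+ *
+ *           // serialize against concurrent GPU page table updates
+ *           down_write(&vm->userptr.notifier_lock);
+ *           mmu_interval_set_seq(mni, cur_seq);
+ *           // ... invalidate GPU mappings, queue the vma for repin ...
+ *           up_write(&vm->userptr.notifier_lock);
+ *
+ *           return true;
+ *   }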
+ *
+ * Lock order
+ * ==========
+ * Always acquiring locks in the same order avoids deadlocks. The locking
+ * order of the above locks is:
+ *
+ * xe_vm::lock => mmap_lock => xe_vm::dma-resv => notifier_lock
+ *
+ *
+ * Use case, pseudo code
+ * =====================
+ *
+ * Below is pseudo code for hmmptr's GPU page fault handler::
+ *
+ *   // get the gpu vm from the page fault asid
+ *   down_write(&vm->lock);
+ *   // walk the vma tree, get the vma covering the fault address
+ *
+ *   again:
+ *   mmap_read_lock(mm);
+ *   // migrate pages for the vma if needed
+ *   vma->userptr.notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier);
+ *   // call hmm_range_fault() to retrieve the vma's pfns/pages
+ *   mmap_read_unlock(mm);
+ *
+ *   xe_vm_lock(vm);
+ *   down_read(&vm->userptr.notifier_lock);
+ *   if (mmu_interval_read_retry(&vma->userptr.notifier,
+ *                               vma->userptr.notifier_seq)) {
+ *           up_read(&vm->userptr.notifier_lock);
+ *           xe_vm_unlock(vm);
+ *           goto again;     // collided with a userptr invalidation, retry
+ *   }
+ *
+ *   // xe_vm_populate_pgtable() or submit a gpu job to update the page table
+ *   up_read(&vm->userptr.notifier_lock);
+ *   xe_vm_unlock(vm);
+ *   up_write(&vm->lock);
+ *
+ * In the above code, we first take vm->lock so we can walk the vm's vma
+ * tree and find the vma covering the fault address.
+ *
+ * Then we migrate pages if needed. Page migration is not needed for
+ * userptr but might be needed for hmmptr. After migration, we populate
+ * the pfns of the vma. Since this requires walking the CPU page tables,
+ * we hold the mmap_lock for this step.
+ *
+ * After that, the remaining work is to update the GPU page table with the
+ * pfns/pages populated above. Since the vm's dma-resv object protects GPU
+ * page table updates, we hold the vm's dma-resv for this step.
+ *
+ * Since we don't hold the mmap_lock during the GPU page table update,
+ * userspace might munmap the range concurrently, which triggers a userptr
+ * invalidation. If such a collision happens, mmu_interval_read_retry()
+ * reports it and we retry.
+ *
+ * The notifier_lock is held both in the mmu notifier invalidate callback
+ * (not listed above, see the sketch in section 4) and around the GPU page
+ * table update.
+ *
+ */
+#endif /* _XE_LOCK_DOC_H_ */
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 3b4debfecc9b..d1f5949d4a3b 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -271,7 +271,7 @@ struct xe_vm {
 
 	/**
 	 * @lock: outer most lock, protects objects of anything attached to this
-	 * VM
+	 * VM. See more details in xe_lock_doc.h
 	 */
 	struct rw_semaphore lock;
 	/**
-- 
2.26.3


