<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2022-07-25 10:34, Felix Kuehling
wrote:<br>
</div>
<blockquote type="cite" cite="mid:044e7f8e-9fa5-d78b-39f8-84bce155d4c6@amd.com">Am
2022-07-25 um 08:23 schrieb Philip Yang:
<br>
<blockquote type="cite">This will be used to split giant svm range
into smaller ranges, to
<br>
support VRAM overcommitment by giant range and improve GPU retry
fault
<br>
recover on giant range.
<br>
<br>
Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 ++
<br>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 15 +++++++++++++++
<br>
drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 3 +++
<br>
3 files changed, 20 insertions(+)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
<br>
index 9667015a6cbc..b1f87aa6138b 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
<br>
@@ -1019,6 +1019,8 @@ int svm_migrate_init(struct amdgpu_device
*adev)
<br>
amdgpu_amdkfd_reserve_system_mem(SVM_HMM_PAGE_STRUCT_SIZE(size));
<br>
+ svm_range_set_max_pages(adev);
<br>
+
<br>
pr_info("HMM registered %ldMB device memory\n", size
>> 20);
<br>
return 0;
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
index b592aee6d9d6..cf9565ddddf8 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
@@ -46,6 +46,11 @@
<br>
*/
<br>
#define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING (2UL *
NSEC_PER_MSEC)
<br>
+/* Giant svm ranges are split into smaller ranges based on this value,
computed as the
<br>
+ * minimum over all dGPUs/APUs of 1/32 of VRAM size, clamped between
2MB and 1GB, and aligned to 2MB.
<br>
+ */
<br>
+uint64_t max_svm_range_pages;
<br>
+
<br>
struct criu_svm_metadata {
<br>
struct list_head list;
<br>
struct kfd_criu_svm_range_priv_data data;
<br>
@@ -1869,6 +1874,16 @@ static struct svm_range
*svm_range_clone(struct svm_range *old)
<br>
return new;
<br>
}
<br>
+__init void svm_range_set_max_pages(struct amdgpu_device *adev)
<br>
</blockquote>
<br>
Why is this marked as __init? This can run much later than module
init.
<br>
<br>
</blockquote>
<p>Thanks, this was originally called from amdgpu_amdkfd_init at module
init time; I forgot to remove __init after moving the call to
svm_migrate_init, which is called from pci_probe.</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:044e7f8e-9fa5-d78b-39f8-84bce155d4c6@amd.com">
<br>
<blockquote type="cite">+{
<br>
+ uint64_t pages;
<br>
+
<br>
+ /* 1/32 VRAM size in pages */
<br>
+ pages = adev->gmc.real_vram_size >> 17;
<br>
+ pages = clamp(pages, 1ULL << 9, 1ULL << 18);
<br>
+ max_svm_range_pages = min_not_zero(max_svm_range_pages,
pages);
<br>
+ max_svm_range_pages = ALIGN(max_svm_range_pages, 1ULL
<< 9);
<br>
</blockquote>
<br>
I'd recommend updating max_svm_range_pages with a single
WRITE_ONCE to avoid race conditions with GPU hot-plug.
<br>
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite">+}
<br>
/**
<br>
* svm_range_add - add svm range and handle overlap
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
<br>
index eab7f6d3b13c..346a41bf8dbf 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
<br>
@@ -204,6 +204,9 @@ void
svm_range_list_lock_and_flush_work(struct svm_range_list *svms,
struct mm_s
<br>
#define KFD_IS_SVM_API_SUPPORTED(dev) ((dev)->pgmap.type !=
0)
<br>
void svm_range_bo_unref_async(struct svm_range_bo *svm_bo);
<br>
+
<br>
+__init void svm_range_set_max_pages(struct amdgpu_device
*adev);
<br>
+
<br>
#else
<br>
struct kfd_process;
<br>
</blockquote>
</blockquote>
</body>
</html>