<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2023-09-13 12:14, Felix Kuehling
wrote:<br>
</div>
<blockquote type="cite" cite="mid:afb950e9-47ac-5823-8ed2-4c1e01fb5f0d@amd.com">On
2023-09-13 11:16, Philip Yang wrote:
<br>
<blockquote type="cite">If new range is added to update list,
splited to multiple pranges with
<br>
max_svm_range_pages alignment, and svm validate and map returns
error
<br>
for the first prange, then the caller retry should add pranges
with
<br>
prange->is_error_flag or prange without
prange->mapped_to_gpu to the
<br>
update list, to update GPU mapping for the entire range.
<br>
</blockquote>
<br>
It looks like the only new thing here is to remove the "same
attribute" optimization for ranges that are not mapped on the GPU.
I don't fully understand the scenario you're describing here, but
it feels like this change has a bigger impact than it needs to
have. Your description specifically talks about ranges split at
max_svm_range_pages boundaries. But your patch affects all ranges
not mapped on the GPU, even it prange->is_error_flag is not
set.
<br>
<br>
Maybe that's OK, because the expensive thing is updating existing
mappings unnecessarily. If there is no existing mapping yet, it's
probably not a big deal. I just don't understand the scenario that
requires a retry without the prange->is_error_flag being set.
Maybe a better fix would be to ensure that
prange->is_error_flag gets set in your scenario.
<br>
</blockquote>
<p>Yes, this will not affect existing mapping.</p>
<p>For example set_attr 128MB range, split to 2 pranges at 64MB
boundary, added to update_list, then the first prange svm validate
and map returns error -EAGAIN because NUMA balancing, with
prange->is_error_flag set, the second prange->is_error_flag
is not set. Then Thunk retry set_attr,the first prange added to
update_list, but the second prange is not added to update_list,
with xnack off, the second prange is not mapped to GPUs, and
generate GPU page fault later.<br>
</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<br>
<blockquote type="cite" cite="mid:afb950e9-47ac-5823-8ed2-4c1e01fb5f0d@amd.com">
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite">
<br>
Fixes: c22b04407097 ("drm/amdkfd: flag added to handle errors
from svm validate and map")
<br>
Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
<br>
Tested-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:james.zhu@amd.com"><james.zhu@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--
<br>
1 file changed, 3 insertions(+), 2 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
index 61dd66bddc3c..8871329e9cbd 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
@@ -825,7 +825,7 @@ svm_range_is_same_attrs(struct kfd_process
*p, struct svm_range *prange,
<br>
}
<br>
}
<br>
- return !prange->is_error_flag;
<br>
+ return true;
<br>
}
<br>
/**
<br>
@@ -2228,7 +2228,8 @@ svm_range_add(struct kfd_process *p,
uint64_t start, uint64_t size,
<br>
next = interval_tree_iter_next(node, start, last);
<br>
next_start = min(node->last, last) + 1;
<br>
- if (svm_range_is_same_attrs(p, prange, nattr, attrs))
{
<br>
+ if (!prange->is_error_flag &&
prange->mapped_to_gpu &&
<br>
+ svm_range_is_same_attrs(p, prange, nattr, attrs)) {
<br>
/* nothing to do */
<br>
} else if (node->start < start || node->last
> last) {
<br>
/* node intersects the update range and its
attributes
<br>
</blockquote>
</blockquote>
</body>
</html>