<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="moz-cite-prefix">On 2021-12-22 7:37 p.m., Rajneesh
Bhardwaj wrote:<br>
</div>
<blockquote type="cite" cite="mid:20211223003711.13064-19-rajneesh.bhardwaj@amd.com">
<pre class="moz-quote-pre" wrap="">Recoverable page faults are represented by the xnack mode setting inside
a kfd process and are used to represent the device page faults. For CR,
we don't consider negative values which are typically used for querying
the current xnack mode without modifying it.
Signed-off-by: Rajneesh Bhardwaj <a class="moz-txt-link-rfc2396E" href="mailto:rajneesh.bhardwaj@amd.com">&lt;rajneesh.bhardwaj@amd.com&gt;</a>
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 15 +++++++++++++++
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 178b0ccfb286..446eb9310915 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1845,6 +1845,11 @@ static int criu_checkpoint_process(struct kfd_process *p,
memset(&process_priv, 0, sizeof(process_priv));
process_priv.version = KFD_CRIU_PRIV_VERSION;
+ /* For CR, we don't consider negative xnack mode which is used for
+ * querying without changing it, here 0 simply means disabled and 1
+ * means enabled so retry for finding a valid PTE.
+ */</pre>
</blockquote>
A negative value queries the xnack mode only through the
kfd_ioctl_set_xnack_mode user-space ioctl interface, which CRIU does
not use. I think this comment is misleading.<br>
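<p>To illustrate the distinction, here is a minimal user-space sketch of
the convention, not the kernel code. The struct and function names
(model_process, model_set_xnack_mode, model_checkpoint_xnack) are
hypothetical, and xnack_supported is only a stand-in for what
kfd_process_xnack_mode(p, true) would report. The ioctl-style path
treats a negative request as a query, while the checkpoint path only
ever records on or off.</p>

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-process state; not the kernel struct. */
struct model_process {
	bool xnack_enabled;   /* current mode */
	bool xnack_supported; /* stand-in for kfd_process_xnack_mode(p, true) */
};

/*
 * Model of the ioctl convention: a negative request only reads the
 * current mode back, a non-negative request tries to change it.
 */
int model_set_xnack_mode(struct model_process *p, int32_t *xnack_enabled)
{
	if (*xnack_enabled < 0) {
		/* Query only: report the mode, change nothing. */
		*xnack_enabled = p->xnack_enabled;
		return 0;
	}
	if (*xnack_enabled && !p->xnack_supported)
		return -EPERM;
	p->xnack_enabled = (*xnack_enabled != 0);
	return 0;
}

/* The checkpoint path stores a plain on/off value; no query case exists. */
bool model_checkpoint_xnack(const struct model_process *p)
{
	return p->xnack_enabled;
}
```

<p>Since the checkpoint side never sees a negative value, a bool field
is enough to carry the state.</p>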
<blockquote type="cite" cite="mid:20211223003711.13064-19-rajneesh.bhardwaj@amd.com">
<pre class="moz-quote-pre" wrap="">
+ process_priv.xnack_mode = p->xnack_enabled ? 1 : 0;</pre>
</blockquote>
Change this to process_priv.xnack_enabled.
<blockquote type="cite" cite="mid:20211223003711.13064-19-rajneesh.bhardwaj@amd.com">
<pre class="moz-quote-pre" wrap="">
ret = copy_to_user(user_priv_data + *priv_offset,
&process_priv, sizeof(process_priv));
@@ -2231,6 +2236,16 @@ static int criu_restore_process(struct kfd_process *p,
return -EINVAL;
}
+ pr_debug("Setting XNACK mode\n");
+ if (process_priv.xnack_mode && !kfd_process_xnack_mode(p, true)) {
+ pr_err("xnack mode cannot be set\n");
+ ret = -EPERM;
+ goto exit;
+ } else {</pre>
</blockquote>
<p>On GFXv9 GPUs other than Aldebaran, this means a process that was
checkpointed with xnack off can be restored and resumed on a GPU with
xnack on. The shader will continue to run, but the driver is not
guaranteed to keep the SVM ranges mapped on the GPU at all times, so
if a retry fault happens the shader will not recover. Maybe change to:<br>
</p>
<pre>if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 2)) {
	if (process_priv.xnack_enabled != kfd_process_xnack_mode(p, true)) {
		pr_err("xnack mode cannot be set\n");
		ret = -EPERM;
		goto exit;
	}
}

pr_debug("set xnack mode: %d\n", process_priv.xnack_enabled);
p->xnack_enabled = process_priv.xnack_enabled;</pre>
<br>
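<p>For reference, the behaviour I have in mind for the check above can
be modelled in user space as follows. This is only a sketch under
stated assumptions: MODEL_IP_VERSION, model_restore_target and
model_restore_xnack are hypothetical names, and xnack_supported stands
in for the result of kfd_process_xnack_mode(p, true).</p>

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version encoding for this model; the driver uses IP_VERSION(). */
#define MODEL_IP_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))
#define MODEL_GC_ALDEBARAN MODEL_IP_VERSION(9, 4, 2)

struct model_restore_target {
	uint32_t gc_version;  /* GPU the process is restored on */
	bool xnack_supported; /* stand-in for kfd_process_xnack_mode(p, true) */
	bool xnack_enabled;   /* mode the restored process ends up with */
};

/*
 * Apply the checkpointed xnack setting. On anything other than
 * Aldebaran the saved mode must match what the target reports,
 * otherwise the restore is refused with -EPERM.
 */
int model_restore_xnack(struct model_restore_target *t, bool saved_xnack_enabled)
{
	if (t->gc_version != MODEL_GC_ALDEBARAN &&
	    saved_xnack_enabled != t->xnack_supported)
		return -EPERM;

	t->xnack_enabled = saved_xnack_enabled;
	return 0;
}
```

<p>With this model, an xnack-off checkpoint is refused on a non-Aldebaran
target that reports xnack on, while on Aldebaran the saved mode is
applied without the comparison, as in the suggestion as written.</p>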
<blockquote type="cite" cite="mid:20211223003711.13064-19-rajneesh.bhardwaj@amd.com">
<pre class="moz-quote-pre" wrap="">+ pr_debug("set xnack mode: %d\n", process_priv.xnack_mode);
+ p->xnack_enabled = process_priv.xnack_mode;
+ }
+
exit:
return ret;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 855c162b85ea..d72dda84c18c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1057,6 +1057,7 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
struct kfd_criu_process_priv_data {
uint32_t version;
+ uint32_t xnack_mode;</pre>
</blockquote>
<p>bool xnack_enabled;</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:20211223003711.13064-19-rajneesh.bhardwaj@amd.com">
<pre class="moz-quote-pre" wrap="">
};
struct kfd_criu_device_priv_data {
</pre>
</blockquote>
</body>
</html>