<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2/17/2025 8:05 PM, Alex Deucher
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CADnq5_Pm=epLrLRd842rsVriOo0ttTtB81_PK+BWHXkLwPPJfQ@mail.gmail.com">
<pre wrap="" class="moz-quote-pre">On Mon, Feb 17, 2025 at 9:18 AM SRINIVASAN SHANMUGAM
<a class="moz-txt-link-rfc2396E" href="mailto:srinivasan.shanmugam@amd.com"><srinivasan.shanmugam@amd.com></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
On 2/17/2025 7:44 PM, Alex Deucher wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">On Sat, Feb 15, 2025 at 3:02 AM SRINIVASAN SHANMUGAM
<a class="moz-txt-link-rfc2396E" href="mailto:srinivasan.shanmugam@amd.com"><srinivasan.shanmugam@amd.com></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
On 2/14/2025 11:05 PM, Alex Deucher wrote:
Re-send the mes message on resume to make sure the
mes state is up to date.
Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES")
Signed-off-by: Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a>
Cc: Shaoyun Liu <a class="moz-txt-link-rfc2396E" href="mailto:shaoyun.liu@amd.com"><shaoyun.liu@amd.com></a>
Cc: Srinivasan Shanmugam <a class="moz-txt-link-rfc2396E" href="mailto:srinivasan.shanmugam@amd.com"><srinivasan.shanmugam@amd.com></a>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 ++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 20 +++++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +-
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 4 ++++
5 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index b9bd6654f3172..a194bf3347cbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1665,24 +1665,19 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev,
}
mutex_lock(&adev->enforce_isolation_mutex);
-
for (i = 0; i < num_partitions; i++) {
- if (adev->enforce_isolation[i] && !partition_values[i]) {
+ if (adev->enforce_isolation[i] && !partition_values[i])
/* Going from enabled to disabled */
amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
- if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
- amdgpu_mes_set_enforce_isolation(adev, i, false);
- } else if (!adev->enforce_isolation[i] && partition_values[i]) {
+ else if (!adev->enforce_isolation[i] && partition_values[i])
/* Going from disabled to enabled */
amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
- if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
- amdgpu_mes_set_enforce_isolation(adev, i, true);
- }
adev->enforce_isolation[i] = partition_values[i];
}
-
mutex_unlock(&adev->enforce_isolation_mutex);
+ amdgpu_mes_update_enforce_isolation(adev);
+
return count;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index cee38bb6cfaf2..ca076306adba4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1508,7 +1508,8 @@ bool amdgpu_mes_suspend_resume_all_supported(struct amdgpu_device *adev)
}
/* Fix me -- node_id is used to identify the correct MES instances in the future */
-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t node_id, bool enable)
+static int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev,
+ uint32_t node_id, bool enable)
{
struct mes_misc_op_input op_input = {0};
int r;
@@ -1530,6 +1531,23 @@ int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t node_i
return r;
}
+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev)
+{
+ int i, r = 0;
+
+ if (adev->enable_mes && adev->gfx.enable_cleaner_shader) {
+ mutex_lock(&adev->enforce_isolation_mutex);
+ for (i = 0; i < (adev->xcp_mgr ? adev->xcp_mgr->num_xcps : 1); i++) {
+ if (adev->enforce_isolation[i])
+ r |= amdgpu_mes_set_enforce_isolation(adev, i, true);
+ else
+ r |= amdgpu_mes_set_enforce_isolation(adev, i, false);
+ }
+ mutex_unlock(&adev->enforce_isolation_mutex);
+ }
+ return r;
+}
+
#if defined(CONFIG_DEBUG_FS)
static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 6a792ffc81e33..3a65c3788956d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -532,6 +532,6 @@ static inline void amdgpu_mes_unlock(struct amdgpu_mes *mes)
bool amdgpu_mes_suspend_resume_all_supported(struct amdgpu_device *adev);
-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t node_id, bool enable);
+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev);
#endif /* __AMDGPU_MES_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 530371e6a7aee..fc7b17463cb4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1660,6 +1660,10 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block *ip_block)
goto failure;
}
+ r = amdgpu_mes_update_enforce_isolation(adev);
+ if (r)
+ goto failure;
+
Hi Alex,
Should this also be moved to mes_v11_0_hw_init. Please let me know your thoughts?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">I'm not sure I follow. This is in hw_init.
Alex
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
Sorry, my mistake mes_v11_0_sw_init pls?
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">
There's no need to call it in sw_init, plus the hw is not set up in
sw_init so you can't call it there anyway. The whole point of this is
to update the firmware with the current sw state after a suspend or
reset.
</pre>
</blockquote>
<p>Based on this understanding, at resume time - it is just the
message that is needed to be sent only to MES, to make sure we
re-enable enforce_isolation workaround after suspend.</p>
<p><span style="color: rgb(51, 51, 51); font-family: "Source Code Pro", "Courier New", monospace; font-size: 13.3333px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: pre; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Acked-by: Srinivasan Shanmugam <a class="moz-txt-link-rfc2396E" href="mailto:srinivasan.shanmugam@amd.com"><srinivasan.shanmugam@amd.com></a></span></p>
<blockquote type="cite" cite="mid:CADnq5_Pm=epLrLRd842rsVriOo0ttTtB81_PK+BWHXkLwPPJfQ@mail.gmail.com">
<pre wrap="" class="moz-quote-pre">
Alex
</pre>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">
Thanks!
Srini
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre"> out:
/*
* Disable KIQ ring usage from the driver once MES is enabled.
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 6db88584dd529..ec91c78468f30 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -1773,6 +1773,10 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block *ip_block)
goto failure;
}
+ r = amdgpu_mes_update_enforce_isolation(adev);
+ if (r)
+ goto failure;
+
And Similarly here also?
Thanks!
Srini
out:
/*
* Disable KIQ ring usage from the driver once MES is enabled.
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>