<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Arial;font-size:10pt;color:#0000FF;margin:5pt;" align="Left">
[AMD Official Use Only]<br>
</p>
<br>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Poison mode is currently a global setting. Will we set it per IP block in the future?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
For example, set poison mode for GFX but fatal error mode for SDMA?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
dgpu_mode is disabled when connected_to_cpu is 1; it is unrelated to the IP block.<br>
</div>
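<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
To make the point concrete, here is a minimal standalone sketch of how the two init flags are derived before and after the psp_ras_initialize() hunk quoted below. The struct and function names (ras_init_flags, flags_before, flags_after) are made up for illustration and are not driver code; only the branch logic is taken from the patch.</div>
<pre>
```c
#include <stdbool.h>

/* Hypothetical stand-in for the init flags passed to the RAS TA. */
struct ras_init_flags {
	bool poison_mode_en;
	bool dgpu_mode;
};

/* Before the patch: both flags depend on connected_to_cpu alone. */
struct ras_init_flags flags_before(bool connected_to_cpu)
{
	struct ras_init_flags f = { false, false };

	if (connected_to_cpu)
		f.poison_mode_en = true;
	else
		f.dgpu_mode = true;
	return f;
}

/* After the patch: the two flags are set independently of each other. */
struct ras_init_flags flags_after(bool poison_enabled, bool connected_to_cpu)
{
	struct ras_init_flags f = { false, false };

	if (poison_enabled)
		f.poison_mode_en = true;
	if (!connected_to_cpu)
		f.dgpu_mode = true;
	return f;
}
```
</pre>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
In this model, a device with poison enabled that is not connected to the CPU gets both poison_mode_en and dgpu_mode set, which the old if/else could not express.</div>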
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Regards,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Tao</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Zhang, Hawking <Hawking.Zhang@amd.com><br>
<b>Sent:</b> Saturday, September 18, 2021 4:59 PM<br>
<b>To:</b> Zhou1, Tao <Tao.Zhou1@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Clements, John <John.Clements@amd.com>; Yang, Stanley <Stanley.Yang@amd.com><br>
<b>Subject:</b> RE: [PATCH 2/3] drm/amdgpu: set poison mode for RAS</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">
+ if (amdgpu_ras_is_poison_enabled(adev))<br>
ras_cmd->ras_in_message.init_flags.poison_mode_en = 1;<br>
- else<br>
+ if (!adev->gmc.xgmi.connected_to_cpu)<br>
ras_cmd->ras_in_message.init_flags.dgpu_mode = 1;<br>
<br>
I'd expect these flags to be set in the enable_feature command per IP block if needed, instead of as a global setting at the firmware/TA initialization phase. Thoughts?<br>
<br>
Regards,<br>
Hawking<br>
<br>
-----Original Message-----<br>
From: Zhou1, Tao <Tao.Zhou1@amd.com> <br>
Sent: Saturday, September 18, 2021 16:08<br>
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang@amd.com>; Clements, John <John.Clements@amd.com>; Yang, Stanley <Stanley.Yang@amd.com><br>
Cc: Zhou1, Tao <Tao.Zhou1@amd.com><br>
Subject: [PATCH 2/3] drm/amdgpu: set poison mode for RAS<br>
<br>
Add a RAS poison mode flag and pass it to the PSP RAS TA.<br>
<br>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com><br>
---<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ++--<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 28 +++++++++++++++++++++++++<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 5 +++++<br>
3 files changed, 35 insertions(+), 2 deletions(-)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c<br>
index 7d09b28889af..140b94da2f5a 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c<br>
@@ -1442,9 +1442,9 @@ static int psp_ras_initialize(struct psp_context *psp)<br>
ras_cmd = (struct ta_ras_shared_memory *)psp->ras_context.context.mem_context.shared_buf;<br>
memset(ras_cmd, 0, sizeof(struct ta_ras_shared_memory));<br>
<br>
- if (psp->adev->gmc.xgmi.connected_to_cpu)<br>
+ if (amdgpu_ras_is_poison_enabled(adev))<br>
ras_cmd->ras_in_message.init_flags.poison_mode_en = 1;<br>
- else<br>
+ if (!adev->gmc.xgmi.connected_to_cpu)<br>
ras_cmd->ras_in_message.init_flags.dgpu_mode = 1;<br>
<br>
ret = psp_ras_load(psp);<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c<br>
index b5332db4d287..7b7e54fdd785 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c<br>
@@ -2180,6 +2180,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)<br>
{<br>
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);<br>
int r;<br>
+ bool df_poison, umc_poison;<br>
<br>
if (con)<br>
return 0;<br>
@@ -2249,6 +2250,23 @@ int amdgpu_ras_init(struct amdgpu_device *adev)<br>
goto release_con;<br>
}<br>
<br>
+ /* Init poison mode, the default value is false */<br>
+ if (adev->df.funcs &&<br>
+ adev->df.funcs->query_ras_poison_mode &&<br>
+ adev->umc.ras_funcs &&<br>
+ adev->umc.ras_funcs->query_ras_poison_mode) {<br>
+ df_poison =<br>
+ adev->df.funcs->query_ras_poison_mode(adev);<br>
+ umc_poison =<br>
+ adev->umc.ras_funcs->query_ras_poison_mode(adev);<br>
+ /* Enable poison mode only if it is set in both DF and UMC */<br>
+ if (df_poison && umc_poison)<br>
+ con->poison_mode_en = true;<br>
+ else if (df_poison != umc_poison)<br>
+ dev_warn(adev->dev, "Poison setting is inconsistent in DF/UMC(%d:%d)!\n",<br>
+ df_poison, umc_poison);<br>
+ }<br>
+<br>
if (amdgpu_ras_fs_init(adev)) {<br>
r = -EINVAL;<br>
goto release_con;<br>
@@ -2292,6 +2310,16 @@ static int amdgpu_persistent_edc_harvesting(struct amdgpu_device *adev,<br>
return 0;<br>
}<br>
<br>
+bool amdgpu_ras_is_poison_enabled(struct amdgpu_device *adev)<br>
+{<br>
+ struct amdgpu_ras *con = amdgpu_ras_get_context(adev);<br>
+<br>
+ if (!con)<br>
+ return false;<br>
+<br>
+ return con->poison_mode_en;<br>
+}<br>
+<br>
/* helper function to handle common stuff in ip late init phase */<br>
int amdgpu_ras_late_init(struct amdgpu_device *adev,<br>
struct ras_common_if *ras_block,<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h<br>
index 1670467c2054..044bd19b7cce 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h<br>
@@ -345,6 +345,9 @@ struct amdgpu_ras {<br>
/* disable ras error count harvest in recovery */<br>
bool disable_ras_err_cnt_harvest;<br>
<br>
+ /* whether poison mode is enabled */<br>
+ bool poison_mode_en;<br>
+<br>
/* RAS count errors delayed work */<br>
struct delayed_work ras_counte_delay_work;<br>
atomic_t ras_ue_count;<br>
@@ -640,4 +643,6 @@ void amdgpu_release_ras_context(struct amdgpu_device *adev);<br>
<br>
int amdgpu_persistent_edc_harvesting_supported(struct amdgpu_device *adev);<br>
<br>
+bool amdgpu_ras_is_poison_enabled(struct amdgpu_device *adev);<br>
+<br>
#endif<br>
--<br>
2.17.1<br>
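<br>
For reference, the DF/UMC check added to amdgpu_ras_init() above can be modeled as a small pure function. This is only a sketch: the enum and the name decide_poison_mode are hypothetical, and it assumes both query_ras_poison_mode callbacks are present (when either is missing, the patch leaves poison mode at its default of false).<br>
<pre>
```c
#include <stdbool.h>

/* Hypothetical result type; the patch itself only sets a bool and may warn. */
enum poison_decision {
	POISON_OFF,      /* neither DF nor UMC reports poison mode */
	POISON_ON,       /* both report it, so poison mode is enabled */
	POISON_MISMATCH  /* they disagree: the patch warns and leaves it disabled */
};

/* Mirrors the branch structure of the amdgpu_ras_init() hunk. */
enum poison_decision decide_poison_mode(bool df_poison, bool umc_poison)
{
	if (df_poison && umc_poison)
		return POISON_ON;
	if (df_poison != umc_poison)
		return POISON_MISMATCH;
	return POISON_OFF;
}
```
</pre>
Only the POISON_ON case corresponds to con->poison_mode_en being set to true; the mismatch case maps to the dev_warn() call.<br>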
</div>
</span></font></div>
</div>
</body>
</html>