<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com></div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of amd-gfx-request@lists.freedesktop.org <amd-gfx-request@lists.freedesktop.org><br>
<b>Sent:</b> Wednesday, July 17, 2024 4:40 PM<br>
<b>To:</b> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org><br>
<b>Subject:</b> amd-gfx Digest, Vol 98, Issue 217</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Send amd-gfx mailing list submissions to<br>
        amd-gfx@lists.freedesktop.org<br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
        <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
or, via email, send a message with subject or body 'help' to<br>
        amd-gfx-request@lists.freedesktop.org<br>
<br>
You can reach the person managing the list at<br>
        amd-gfx-owner@lists.freedesktop.org<br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of amd-gfx digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
   1. [PATCH 1/6] drm/amdgpu/gfx: add bad opcode interrupt<br>
      (Alex Deucher)<br>
   2. [PATCH 5/6] drm/amdgpu/gfx9: Enable bad opcode interrupt<br>
      (Alex Deucher)<br>
   3. [PATCH 3/6] drm/amdgpu/gfx10: Enable bad opcode interrupt<br>
      (Alex Deucher)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Wed, 17 Jul 2024 16:40:06 -0400<br>
From: Alex Deucher <alexander.deucher@amd.com><br>
To: <amd-gfx@lists.freedesktop.org><br>
Cc: Alex Deucher <alexander.deucher@amd.com><br>
Subject: [PATCH 1/6] drm/amdgpu/gfx: add bad opcode interrupt<br>
Message-ID: <20240717204011.15342-1-alexander.deucher@amd.com><br>
Content-Type: text/plain<br>
<br>
Add the irq source for bad opcodes.<br>
<br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com><br>
---<br>
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 +<br>
 1 file changed, 1 insertion(+)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h<br>
index ddda94e49db4..86d3fa7eef90 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h<br>
@@ -391,6 +391,7 @@ struct amdgpu_gfx {<br>
         struct amdgpu_irq_src           eop_irq;<br>
         struct amdgpu_irq_src           priv_reg_irq;<br>
         struct amdgpu_irq_src           priv_inst_irq;<br>
+       struct amdgpu_irq_src           bad_op_irq;<br>
         struct amdgpu_irq_src           cp_ecc_error_irq;<br>
         struct amdgpu_irq_src           sq_irq;<br>
         struct amdgpu_irq_src           rlc_gc_fed_irq;<br>
-- <br>
2.45.2<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Wed, 17 Jul 2024 16:40:10 -0400<br>
From: Alex Deucher <alexander.deucher@amd.com><br>
To: <amd-gfx@lists.freedesktop.org><br>
Cc: Alex Deucher <alexander.deucher@amd.com><br>
Subject: [PATCH 5/6] drm/amdgpu/gfx9: Enable bad opcode interrupt<br>
Message-ID: <20240717204011.15342-5-alexander.deucher@amd.com><br>
Content-Type: text/plain<br>
<br>
A bad opcode in the command stream can hang the CP/ME.<br>
The firmware prevents the ME side from hanging by raising a bad opcode interrupt,<br>
and the driver needs to perform a vmid reset when it receives the interrupt.<br>
<br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com><br>
---<br>
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 65 +++++++++++++++++++++++++++<br>
 1 file changed, 65 insertions(+)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c<br>
index 97476fb2ca40..675a1a8e2515 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c<br>
@@ -2182,6 +2182,13 @@ static int gfx_v9_0_sw_init(void *handle)<br>
         if (r)<br>
                 return r;<br>
 <br>
+       /* Bad opcode Event */<br>
+       r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP,<br>
+                             GFX_9_0__SRCID__CP_BAD_OPCODE_ERROR,<br>
+                             &adev->gfx.bad_op_irq);<br>
+       if (r)<br>
+               return r;<br>
+<br>
         /* Privileged reg */<br>
         r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP, GFX_9_0__SRCID__CP_PRIV_REG_FAULT,<br>
                               &adev->gfx.priv_reg_irq);<br>
@@ -3937,6 +3944,7 @@ static int gfx_v9_0_hw_fini(void *handle)<br>
                 amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);<br>
         amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);<br>
         amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);<br>
+       amdgpu_irq_put(adev, &adev->gfx.bad_op_irq, 0);<br>
 <br>
         /* DF freeze and kcq disable will fail */<br>
         if (!amdgpu_ras_intr_triggered())<br>
@@ -4747,6 +4755,10 @@ static int gfx_v9_0_late_init(void *handle)<br>
         if (r)<br>
                 return r;<br>
 <br>
+       r = amdgpu_irq_get(adev, &adev->gfx.bad_op_irq, 0);<br>
+       if (r)<br>
+               return r;<br>
+<br>
         r = gfx_v9_0_ecc_late_init(handle);<br>
         if (r)<br>
                 return r;<br>
@@ -5990,6 +6002,42 @@ static int gfx_v9_0_set_priv_reg_fault_state(struct amdgpu_device *adev,<br>
         return 0;<br>
 }<br>
 <br>
+static int gfx_v9_0_set_bad_op_fault_state(struct amdgpu_device *adev,<br>
+                                          struct amdgpu_irq_src *source,<br>
+                                          unsigned type,<br>
+                                          enum amdgpu_interrupt_state state)<br>
+{<br>
+       u32 cp_int_cntl_reg, cp_int_cntl;<br>
+       int i, j;<br>
+<br>
+       switch (state) {<br>
+       case AMDGPU_IRQ_STATE_DISABLE:<br>
+       case AMDGPU_IRQ_STATE_ENABLE:<br>
+               WREG32_FIELD15(GC, 0, CP_INT_CNTL_RING0,<br>
+                              OPCODE_ERROR_INT_ENABLE,<br>
+                              state == AMDGPU_IRQ_STATE_ENABLE ? 1 : 0);<br>
+               for (i = 0; i < adev->gfx.mec.num_mec; i++) {<br>
+                       for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {<br>
+                               /* MECs start at 1 */<br>
+                               cp_int_cntl_reg = gfx_v9_0_get_cpc_int_cntl(adev, i + 1, j);<br>
+<br>
+                               if (cp_int_cntl_reg) {<br>
+                                       cp_int_cntl = RREG32_SOC15_IP(GC, cp_int_cntl_reg);<br>
+                                       cp_int_cntl = REG_SET_FIELD(cp_int_cntl, CP_ME1_PIPE0_INT_CNTL,<br>
+                                                                   OPCODE_ERROR_INT_ENABLE,<br>
+                                                                   state == AMDGPU_IRQ_STATE_ENABLE ? 1 : 0);<br>
+                                       WREG32_SOC15_IP(GC, cp_int_cntl_reg, cp_int_cntl);<br>
+                               }<br>
+                       }<br>
+               }<br>
+               break;<br>
+       default:<br>
+               break;<br>
+       }<br>
+<br>
+       return 0;<br>
+}<br>
+<br>
 static int gfx_v9_0_set_priv_inst_fault_state(struct amdgpu_device *adev,<br>
                                               struct amdgpu_irq_src *source,<br>
                                               unsigned type,<br>
@@ -6163,6 +6211,15 @@ static int gfx_v9_0_priv_reg_irq(struct amdgpu_device *adev,<br>
         return 0;<br>
 }<br>
 <br>
+static int gfx_v9_0_bad_op_irq(struct amdgpu_device *adev,<br>
+                              struct amdgpu_irq_src *source,<br>
+                              struct amdgpu_iv_entry *entry)<br>
+{<br>
+       DRM_ERROR("Illegal opcode in command stream\n");<br>
+       gfx_v9_0_fault(adev, entry);<br>
+       return 0;<br>
+}<br>
+<br>
 static int gfx_v9_0_priv_inst_irq(struct amdgpu_device *adev,<br>
                                   struct amdgpu_irq_src *source,<br>
                                   struct amdgpu_iv_entry *entry)<br>
@@ -7346,6 +7403,11 @@ static const struct amdgpu_irq_src_funcs gfx_v9_0_priv_reg_irq_funcs = {<br>
         .process = gfx_v9_0_priv_reg_irq,<br>
 };<br>
 <br>
+static const struct amdgpu_irq_src_funcs gfx_v9_0_bad_op_irq_funcs = {<br>
+       .set = gfx_v9_0_set_bad_op_fault_state,<br>
+       .process = gfx_v9_0_bad_op_irq,<br>
+};<br>
+<br>
 static const struct amdgpu_irq_src_funcs gfx_v9_0_priv_inst_irq_funcs = {<br>
         .set = gfx_v9_0_set_priv_inst_fault_state,<br>
         .process = gfx_v9_0_priv_inst_irq,<br>
@@ -7365,6 +7427,9 @@ static void gfx_v9_0_set_irq_funcs(struct amdgpu_device *adev)<br>
         adev->gfx.priv_reg_irq.num_types = 1;<br>
         adev->gfx.priv_reg_irq.funcs = &gfx_v9_0_priv_reg_irq_funcs;<br>
 <br>
+       adev->gfx.bad_op_irq.num_types = 1;<br>
+       adev->gfx.bad_op_irq.funcs = &gfx_v9_0_bad_op_irq_funcs;<br>
+<br>
         adev->gfx.priv_inst_irq.num_types = 1;<br>
         adev->gfx.priv_inst_irq.funcs = &gfx_v9_0_priv_inst_irq_funcs;<br>
 <br>
-- <br>
2.45.2<br>
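<br>
Editor's note on the enable path in the patch above: the new .set callback does a<br>
read-modify-write of a single enable bit in each per-pipe interrupt control register.<br>
The following is a minimal user-space sketch of that pattern, not driver code.<br>
Everything in it is an assumption made for illustration: the fake_regs array stands in<br>
for MMIO, and NUM_MEC, NUM_PIPE_PER_MEC and the bit position of the enable field are<br>
hypothetical values, not taken from the hardware headers.<br>
<pre>
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology and field layout, chosen only for this sketch. */
#define NUM_MEC                        2
#define NUM_PIPE_PER_MEC               4
#define OPCODE_ERROR_INT_ENABLE_SHIFT  24
#define OPCODE_ERROR_INT_ENABLE_MASK   (1u << OPCODE_ERROR_INT_ENABLE_SHIFT)

/* Stand-in for the per-pipe interrupt control registers (no real MMIO). */
static uint32_t fake_regs[NUM_MEC][NUM_PIPE_PER_MEC];

/* Look up one register; MECs are numbered from 1, as in the patch. */
static uint32_t *pipe_int_cntl(unsigned int mec, unsigned int pipe)
{
	assert(mec >= 1 && mec <= NUM_MEC);
	assert(pipe < NUM_PIPE_PER_MEC);
	return &fake_regs[mec - 1][pipe];
}

/* Replace one field of a register value and leave all other bits alone. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
			  uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}

/* Enable or disable the bad-opcode interrupt bit on every pipe. */
static void set_bad_op_enable(int enable)
{
	unsigned int i, j;

	for (i = 0; i < NUM_MEC; i++) {
		for (j = 0; j < NUM_PIPE_PER_MEC; j++) {
			uint32_t *reg = pipe_int_cntl(i + 1, j);

			*reg = set_field(*reg, OPCODE_ERROR_INT_ENABLE_MASK,
					 OPCODE_ERROR_INT_ENABLE_SHIFT,
					 enable ? 1 : 0);
		}
	}
}
```
</pre>
Calling set_bad_op_enable(1) and then set_bad_op_enable(0) should leave every other bit<br>
of each register as it was, which is the point of reading the register back before<br>
writing it.<br>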
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Wed, 17 Jul 2024 16:40:08 -0400<br>
From: Alex Deucher <alexander.deucher@amd.com><br>
To: <amd-gfx@lists.freedesktop.org><br>
Cc: Jesse Zhang <jesse.zhang@amd.com>, Alex Deucher<br>
        <alexander.deucher@amd.com><br>
Subject: [PATCH 3/6] drm/amdgpu/gfx10: Enable bad opcode interrupt<br>
Message-ID: <20240717204011.15342-3-alexander.deucher@amd.com><br>
Content-Type: text/plain<br>
<br>
From: Jesse Zhang <jesse.zhang@amd.com><br>
<br>
A bad opcode in the command stream can hang the CP/ME.<br>
The firmware prevents the ME side from hanging by raising a bad opcode interrupt,<br>
and the driver needs to perform a vmid reset when it receives the interrupt.<br>
<br>
v2: update irq naming (drop priv) (Alex)<br>
<br>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com><br>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com><br>
---<br>
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 74 ++++++++++++++++++++++++++<br>
 1 file changed, 74 insertions(+)<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
index 66d80f3dc661..4ce13a4f7a20 100644<br>
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c<br>
@@ -4740,6 +4740,13 @@ static int gfx_v10_0_sw_init(void *handle)<br>
         if (r)<br>
                 return r;<br>
 <br>
+       /* Bad opcode Event */<br>
+       r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP,<br>
+                             GFX_10_1__SRCID__CP_BAD_OPCODE_ERROR,<br>
+                             &adev->gfx.bad_op_irq);<br>
+       if (r)<br>
+               return r;<br>
+<br>
         /* Privileged reg */<br>
         r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_GRBM_CP, GFX_10_1__SRCID__CP_PRIV_REG_FAULT,<br>
                               &adev->gfx.priv_reg_irq);<br>
@@ -7416,6 +7423,7 @@ static int gfx_v10_0_hw_fini(void *handle)<br>
 <br>
         amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);<br>
         amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);<br>
+       amdgpu_irq_put(adev, &adev->gfx.bad_op_irq, 0);<br>
 <br>
         /* WA added for Vangogh asic fixing the SMU suspend failure<br>
          * It needs to set power gating again during gfxoff control<br>
@@ -7726,6 +7734,10 @@ static int gfx_v10_0_late_init(void *handle)<br>
         if (r)<br>
                 return r;<br>
 <br>
+       r = amdgpu_irq_get(adev, &adev->gfx.bad_op_irq, 0);<br>
+       if (r)<br>
+               return r;<br>
+<br>
         return 0;<br>
 }<br>
 <br>
@@ -9162,6 +9174,51 @@ static int gfx_v10_0_set_priv_reg_fault_state(struct amdgpu_device *adev,<br>
         return 0;<br>
 }<br>
 <br>
+static int gfx_v10_0_set_bad_op_fault_state(struct amdgpu_device *adev,<br>
+                                           struct amdgpu_irq_src *source,<br>
+                                           unsigned int type,<br>
+                                           enum amdgpu_interrupt_state state)<br>
+{<br>
+       u32 cp_int_cntl_reg, cp_int_cntl;<br>
+       int i, j;<br>
+<br>
+       switch (state) {<br>
+       case AMDGPU_IRQ_STATE_DISABLE:<br>
+       case AMDGPU_IRQ_STATE_ENABLE:<br>
+               for (i = 0; i < adev->gfx.me.num_me; i++) {<br>
+                       for (j = 0; j < adev->gfx.me.num_pipe_per_me; j++) {<br>
+                               cp_int_cntl_reg = gfx_v10_0_get_cpg_int_cntl(adev, i, j);<br>
+<br>
+                               if (cp_int_cntl_reg) {<br>
+                                       cp_int_cntl = RREG32_SOC15_IP(GC, cp_int_cntl_reg);<br>
+                                       cp_int_cntl = REG_SET_FIELD(cp_int_cntl, CP_INT_CNTL_RING0,<br>
+                                                                   OPCODE_ERROR_INT_ENABLE,<br>
+                                                                   state == AMDGPU_IRQ_STATE_ENABLE ? 1 : 0);<br>
+                                       WREG32_SOC15_IP(GC, cp_int_cntl_reg, cp_int_cntl);<br>
+                               }<br>
+                       }<br>
+               }<br>
+               for (i = 0; i < adev->gfx.mec.num_mec; i++) {<br>
+                       for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++) {<br>
+                               /* MECs start at 1 */<br>
+                               cp_int_cntl_reg = gfx_v10_0_get_cpc_int_cntl(adev, i + 1, j);<br>
+<br>
+                               if (cp_int_cntl_reg) {<br>
+                                       cp_int_cntl = RREG32_SOC15_IP(GC, cp_int_cntl_reg);<br>
+                                       cp_int_cntl = REG_SET_FIELD(cp_int_cntl, CP_ME1_PIPE0_INT_CNTL,<br>
+                                                                   OPCODE_ERROR_INT_ENABLE,<br>
+                                                                   state == AMDGPU_IRQ_STATE_ENABLE ? 1 : 0);<br>
+                                       WREG32_SOC15_IP(GC, cp_int_cntl_reg, cp_int_cntl);<br>
+                               }<br>
+                       }<br>
+               }<br>
+               break;<br>
+       default:<br>
+               break;<br>
+       }<br>
+       return 0;<br>
+}<br>
+<br>
 static int gfx_v10_0_set_priv_inst_fault_state(struct amdgpu_device *adev,<br>
                                                struct amdgpu_irq_src *source,<br>
                                                unsigned int type,<br>
@@ -9237,6 +9294,15 @@ static int gfx_v10_0_priv_reg_irq(struct amdgpu_device *adev,<br>
         return 0;<br>
 }<br>
 <br>
+static int gfx_v10_0_bad_op_irq(struct amdgpu_device *adev,<br>
+                               struct amdgpu_irq_src *source,<br>
+                               struct amdgpu_iv_entry *entry)<br>
+{<br>
+       DRM_ERROR("Illegal opcode in command stream\n");<br>
+       gfx_v10_0_handle_priv_fault(adev, entry);<br>
+       return 0;<br>
+}<br>
+<br>
 static int gfx_v10_0_priv_inst_irq(struct amdgpu_device *adev,<br>
                                    struct amdgpu_irq_src *source,<br>
                                    struct amdgpu_iv_entry *entry)<br>
@@ -9624,6 +9690,11 @@ static const struct amdgpu_irq_src_funcs gfx_v10_0_priv_reg_irq_funcs = {<br>
         .process = gfx_v10_0_priv_reg_irq,<br>
 };<br>
 <br>
+static const struct amdgpu_irq_src_funcs gfx_v10_0_bad_op_irq_funcs = {<br>
+       .set = gfx_v10_0_set_bad_op_fault_state,<br>
+       .process = gfx_v10_0_bad_op_irq,<br>
+};<br>
+<br>
 static const struct amdgpu_irq_src_funcs gfx_v10_0_priv_inst_irq_funcs = {<br>
         .set = gfx_v10_0_set_priv_inst_fault_state,<br>
         .process = gfx_v10_0_priv_inst_irq,<br>
@@ -9645,6 +9716,9 @@ static void gfx_v10_0_set_irq_funcs(struct amdgpu_device *adev)<br>
         adev->gfx.priv_reg_irq.num_types = 1;<br>
         adev->gfx.priv_reg_irq.funcs = &gfx_v10_0_priv_reg_irq_funcs;<br>
 <br>
+       adev->gfx.bad_op_irq.num_types = 1;<br>
+       adev->gfx.bad_op_irq.funcs = &gfx_v10_0_bad_op_irq_funcs;<br>
+<br>
         adev->gfx.priv_inst_irq.num_types = 1;<br>
         adev->gfx.priv_inst_irq.funcs = &gfx_v10_0_priv_inst_irq_funcs;<br>
 }<br>
-- <br>
2.45.2<br>
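<br>
Editor's note on how the pieces in the patch above fit together: an interrupt source<br>
carries a table with a .set callback (program the enable bit) and a .process callback<br>
(handle a delivered interrupt), and late_init/hw_fini take and drop a reference on it.<br>
The sketch below is a simplified user-space model of that arrangement, assuming that the<br>
first reference enables the source and the last one disables it. All names here<br>
(irq_src, irq_get, irq_put, irq_dispatch, bad_op_funcs) are hypothetical and are not the<br>
amdgpu API.<br>
<pre>
```c
#include <assert.h>
#include <stdio.h>

enum irq_state { IRQ_STATE_DISABLE, IRQ_STATE_ENABLE };

struct irq_src;

/* Callback table, modeled loosely on the .set/.process pair in the patch. */
struct irq_src_funcs {
	int (*set)(struct irq_src *src, enum irq_state state);
	int (*process)(struct irq_src *src, unsigned int src_id);
};

struct irq_src {
	const struct irq_src_funcs *funcs;
	int refcount;   /* number of outstanding irq_get() calls */
	int hw_enabled; /* stand-in for the hardware enable bit */
	int handled;    /* how many interrupts were processed */
};

static int bad_op_set(struct irq_src *src, enum irq_state state)
{
	src->hw_enabled = (state == IRQ_STATE_ENABLE);
	return 0;
}

static int bad_op_process(struct irq_src *src, unsigned int src_id)
{
	fprintf(stderr, "Illegal opcode in command stream (src_id %u)\n", src_id);
	src->handled++;
	return 0;
}

static const struct irq_src_funcs bad_op_funcs = {
	.set = bad_op_set,
	.process = bad_op_process,
};

/* First reference enables the source. */
static int irq_get(struct irq_src *src)
{
	assert(src->funcs && src->funcs->set);
	if (src->refcount++ == 0)
		return src->funcs->set(src, IRQ_STATE_ENABLE);
	return 0;
}

/* Last reference disables the source. */
static int irq_put(struct irq_src *src)
{
	assert(src->refcount > 0);
	if (--src->refcount == 0)
		return src->funcs->set(src, IRQ_STATE_DISABLE);
	return 0;
}

/* Deliver an interrupt only while the source is enabled. */
static int irq_dispatch(struct irq_src *src, unsigned int src_id)
{
	if (!src->hw_enabled)
		return -1;
	return src->funcs->process(src, src_id);
}
```
</pre>
In this model an interrupt dispatched before irq_get() is dropped, one dispatched between<br>
irq_get() and irq_put() reaches the .process callback, and the get/put calls have to stay<br>
balanced, which is why the patch adds both together.<br>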
<br>
<br>
<br>
------------------------------<br>
<br>
End of amd-gfx Digest, Vol 98, Issue 217<br>
****************************************<br>
</div>
</span></font></div>
</div>
</body>
</html>