<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta name="Generator" content="Microsoft Word 15 (filtered medium)"> <style></style> </head> <body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word"> [AMD Official Use Only - AMD Internal Distribution Only] <div> <div class="WordSection1"> <pre><o:p> </o:p></pre> <pre>Hi Sunil,<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>I have moved your two comments here since the email format has changed.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre>>+ amdgpu_job_core_dump(adev, job);<o:p></o:p></pre> <pre>>Let's call this after adev->job_hang = true;<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Sure, I will modify it in the new patch.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Thanks,<o:p></o:p></pre> <pre>Trigger<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>>regards<o:p></o:p></pre> <pre>>Sunil<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre>>I am totally in for dumping the IP registers as early as possible, and it's a good idea to dump before soft reset.<o:p></o:p></pre> <pre>>I am a little worried about whether we need to stop scheduling any further jobs and process the ones already scheduled. 
If that's not a problem it looks good to me.<o:p></o:p></pre> <pre>As mentioned by Alex, it shouldn't affect the HW state.<o:p></o:p></pre> <pre>Since the default value of amdgpu_sched_hw_submission is 2, there can be 2 jobs in the ring buffer when the GPU hangs, so when dumping the ring buffer we may get some PM4 packets that do not belong to the hung job. But CP should not issue a new job to the graphics pipeline if the current job is not finished, so the HW state is still there for the core dump. <o:p></o:p></pre> <pre>Probably we can set amdgpu_sched_hw_submission=1 for a more precise dump when debugging a specific hang issue.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Thanks,<o:p></o:p></pre> <pre>Trigger<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre>>With the one minor change above, the patch looks fine to me. Just make sure you try it and see the time it takes between "Dumping IP State" and "Dumping IP State Completed", i.e. reading registers.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>I tested several times on a Renoir APU platform; it takes less than 1 ms to dump the state of gfx_v9_0 + sdma_v4_0 + vcn_v2_0.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>[ 189.186995] amdgpu 0000:02:00.0: amdgpu: Dumping all IP State took 946695 nanoseconds<o:p></o:p></pre> <pre>[ 227.226338] amdgpu 0000:02:00.0: amdgpu: Dumping all IP State took 927643 nanoseconds<o:p></o:p></pre> <pre>[ 274.512442] amdgpu 0000:02:00.0: amdgpu: Dumping all IP State took 957721 nanoseconds<o:p></o:p></pre> <pre>[ 291.186766] amdgpu 0000:02:00.0: amdgpu: Dumping all IP State took 999311 nanoseconds<o:p></o:p></pre> <pre>[ 313.559718] amdgpu 0000:02:00.0: amdgpu: Dumping all IP State took 922235 nanoseconds<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Thanks,<o:p></o:p></pre> <pre>Trigger<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>>Acked-by: Sunil Khatri <a href="mailto:sunil.khatri@amd.com">sunil.khatri@amd.com</a><o:p></o:p></pre> <pre>>Regards<o:p></o:p></pre> <pre>>Sunil K<o:p></o:p></pre> <pre><o:p> </o:p></pre> <o:p> </o:p> <div 
style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt"> <div> <div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in"> From: Khatri, Sunil <Sunil.Khatri@amd.com> Sent: Wednesday, August 21, 2024 12:54 AM To: Alex Deucher <alexdeucher@gmail.com> Cc: Huang, Trigger <Trigger.Huang@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com> Subject: Re: [PATCH 2/2] drm/amdgpu: Do core dump immediately when job tmo<o:p></o:p> </div> </div> <o:p> </o:p> <o:p> </o:p> <div> On 8/20/2024 9:31 PM, Alex Deucher wrote:<o:p></o:p> </div> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>On Tue, Aug 20, 2024 at 11:31 AM Khatri, Sunil <a href="mailto:sunil.khatri@amd.com"><sunil.khatri@amd.com></a> wrote:<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre>On 8/20/2024 1:00 PM, Huang, Trigger wrote:<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>[AMD Official Use Only - AMD Internal Distribution Only]<o:p></o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>-----Original Message-----<o:p></o:p></pre> <pre>From: Khatri, Sunil <a href="mailto:Sunil.Khatri@amd.com"><Sunil.Khatri@amd.com></a><o:p></o:p></pre> <pre>Sent: Monday, August 19, 2024 6:31 PM<o:p></o:p></pre> <pre>To: Huang, Trigger <a href="mailto:Trigger.Huang@amd.com"><Trigger.Huang@amd.com></a>; <a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><o:p></o:p></pre> <pre>Cc: Deucher, Alexander <a href="mailto:Alexander.Deucher@amd.com"><Alexander.Deucher@amd.com></a><o:p></o:p></pre> <pre>Subject: Re: [PATCH 2/2] drm/amdgpu: Do core dump immediately when job<o:p></o:p></pre> <pre>tmo<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <pre>On 8/19/2024 3:23 PM, <a href="mailto:Trigger.Huang@amd.com">Trigger.Huang@amd.com</a> 
wrote:<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>From: Trigger Huang <a href="mailto:Trigger.Huang@amd.com"><Trigger.Huang@amd.com></a><o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Do the coredump immediately after a job timeout to get a closer<o:p></o:p></pre> <pre>representation of GPU's error status.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>V2: This will skip printing vram_lost as the GPU reset is not happened<o:p></o:p></pre> <pre>yet (Alex)<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>V3: Unconditionally call the core dump as we care about all the reset<o:p></o:p></pre> <pre>functions(soft-recovery and queue reset and full adapter reset, Alex)<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Signed-off-by: Trigger Huang <a href="mailto:Trigger.Huang@amd.com"><Trigger.Huang@amd.com></a><o:p></o:p></pre> <pre>---<o:p></o:p></pre> <pre> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 62<o:p></o:p></pre> </blockquote> <pre>+++++++++++++++++++++++++<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> 1 file changed, 62 insertions(+)<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<o:p></o:p></pre> <pre>b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<o:p></o:p></pre> <pre>index c6a1783fc9ef..ebbb1434073e 100644<o:p></o:p></pre> <pre>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<o:p></o:p></pre> <pre>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c<o:p></o:p></pre> <pre>@@ -30,6 +30,61 @@<o:p></o:p></pre> <pre> #include "amdgpu.h"<o:p></o:p></pre> <pre> #include "amdgpu_trace.h"<o:p></o:p></pre> <pre> #include "amdgpu_reset.h"<o:p></o:p></pre> <pre>+#include "amdgpu_dev_coredump.h"<o:p></o:p></pre> <pre>+#include "amdgpu_xgmi.h"<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+static void amdgpu_job_do_core_dump(struct amdgpu_device *adev,<o:p></o:p></pre> <pre>+ struct amdgpu_job *job)<o:p></o:p></pre> <pre>+{<o:p></o:p></pre> <pre>+ int i;<o:p></o:p></pre> 
<pre>+<o:p></o:p></pre> <pre>+ dev_info(adev->dev, "Dumping IP State\n");<o:p></o:p></pre> <pre>+ for (i = 0; i < adev->num_ip_blocks; i++) {<o:p></o:p></pre> <pre>+ if (adev->ip_blocks[i].version->funcs->dump_ip_state)<o:p></o:p></pre> <pre>+ adev->ip_blocks[i].version->funcs<o:p></o:p></pre> <pre>+ ->dump_ip_state((void *)adev);<o:p></o:p></pre> <pre>+ dev_info(adev->dev, "Dumping IP State Completed\n");<o:p></o:p></pre> <pre>+ }<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+ amdgpu_coredump(adev, true, false, job); }<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+static void amdgpu_job_core_dump(struct amdgpu_device *adev,<o:p></o:p></pre> <pre>+ struct amdgpu_job *job)<o:p></o:p></pre> <pre>+{<o:p></o:p></pre> <pre>+ struct list_head device_list, *device_list_handle = NULL;<o:p></o:p></pre> <pre>+ struct amdgpu_device *tmp_adev = NULL;<o:p></o:p></pre> <pre>+ struct amdgpu_hive_info *hive = NULL;<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+ if (!amdgpu_sriov_vf(adev))<o:p></o:p></pre> <pre>+ hive = amdgpu_get_xgmi_hive(adev);<o:p></o:p></pre> <pre>+ if (hive)<o:p></o:p></pre> <pre>+ mutex_lock(&hive->hive_lock);<o:p></o:p></pre> <pre>+ /*<o:p></o:p></pre> <pre>+ * Reuse the logic in amdgpu_device_gpu_recover() to build list of<o:p></o:p></pre> <pre>+ * devices for code dump<o:p></o:p></pre> <pre>+ */<o:p></o:p></pre> <pre>+ INIT_LIST_HEAD(&device_list);<o:p></o:p></pre> <pre>+ if (!amdgpu_sriov_vf(adev) && (adev-<o:p></o:p></pre> <pre>gmc.xgmi.num_physical_nodes > 1) && hive) {<o:p></o:p></pre> <pre>+ list_for_each_entry(tmp_adev, &hive->device_list,<o:p></o:p></pre> </blockquote> <pre>gmc.xgmi.head)<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>+ list_add_tail(&tmp_adev->reset_list, &device_list);<o:p></o:p></pre> <pre>+ if (!list_is_first(&adev->reset_list, &device_list))<o:p></o:p></pre> <pre>+ list_rotate_to_front(&adev->reset_list, &device_list);<o:p></o:p></pre> <pre>+ device_list_handle = 
&device_list;<o:p></o:p></pre> <pre>+ } else {<o:p></o:p></pre> <pre>+ list_add_tail(&adev->reset_list, &device_list);<o:p></o:p></pre> <pre>+ device_list_handle = &device_list;<o:p></o:p></pre> <pre>+ }<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+ /* Do the coredump for each device */<o:p></o:p></pre> <pre>+ list_for_each_entry(tmp_adev, device_list_handle, reset_list)<o:p></o:p></pre> <pre>+ amdgpu_job_do_core_dump(tmp_adev, job);<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre>+ if (hive) {<o:p></o:p></pre> <pre>+ mutex_unlock(&hive->hive_lock);<o:p></o:p></pre> <pre>+ amdgpu_put_xgmi_hive(hive);<o:p></o:p></pre> <pre>+ }<o:p></o:p></pre> <pre>+}<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre> static enum drm_gpu_sched_stat amdgpu_job_timedout(struct<o:p></o:p></pre> </blockquote> <pre>drm_sched_job *s_job)<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> {<o:p></o:p></pre> <pre>@@ -48,6 +103,7 @@ static enum drm_gpu_sched_stat<o:p></o:p></pre> </blockquote> <pre>amdgpu_job_timedout(struct drm_sched_job *s_job)<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> return DRM_GPU_SCHED_STAT_ENODEV;<o:p></o:p></pre> <pre> }<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>+ amdgpu_job_core_dump(adev, job);<o:p></o:p></pre> </blockquote> </blockquote> </blockquote> </blockquote> </blockquote> Let's call this after adev->job_hang = true;<o:p></o:p> regards Sunil<o:p></o:p> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre><o:p> </o:p></pre> </blockquote> <pre>The philosophy is hang and recovery is to let the HW and software try to<o:p></o:p></pre> <pre>recover. 
Here we try to do a soft recovery first and I think we should wait for<o:p></o:p></pre> <pre>soft recovery, and if it fails then we do the dump, and that's exactly what we are doing here.<o:p></o:p></pre> </blockquote> <pre>Hi Sunil,<o:p></o:p></pre> <pre>Thanks for the suggestion, and that's reasonable. But my concern is that after soft recovery has happened, the GPU's status may have changed (take gfx 9 for example: it will try to kill the currently hung wave).<o:p></o:p></pre> <pre> Actually, in most cases, a real shader hang cannot be resolved through soft recovery, and at that moment, we need to get a very close dump/snapshot/representation of the GPU's current error status.<o:p></o:p></pre> <pre>Just like the scandump: when we are trying to do a scandump for a shader hang, we will disable gpu_recovery, and no soft recovery/per-queue reset/HW reset will happen before the scandump, right?<o:p></o:p></pre> <pre>On most products, there are no scandump interfaces, so the core dump is even more important for debugging GPU hang issues.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Regards,<o:p></o:p></pre> <pre>Trigger<o:p></o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>Also we need to make sure that the tasks which are already in the queue are put<o:p></o:p></pre> <pre>on hold and their sync points are signalled before we dump.<o:p></o:p></pre> <pre>Check once what steps are taken before we dump in the current<o:p></o:p></pre> <pre>implementation.<o:p></o:p></pre> </blockquote> <pre>Do you mean something like:<o:p></o:p></pre> <pre> drm_sched_wqueue_stop(&ring->sched);<o:p></o:p></pre> <pre> amdgpu_fence_driver_force_completion(ring); // Since no GPU reset has happened, is it reasonable to call it here?<o:p></o:p></pre> <pre> amdgpu_job_core_dump(adev, job);<o:p></o:p></pre> </blockquote> <pre>Before hard reset we do all those things. 
But I guess we may not need<o:p></o:p></pre> <pre>that in case we are capturing before soft reset itself, but you need to test<o:p></o:p></pre> <pre>it, to see whether the dump values are true or not.<o:p></o:p></pre> <pre>Also, apart from hardware state we dump a lot of other information like<o:p></o:p></pre> <pre>ring buffers, and in case jobs are still submitting we might not be able<o:p></o:p></pre> <pre>to get the right data, as the ring might be in use and being<o:p></o:p></pre> <pre>consumed/filled up at that time, and that's why scheduler stop helps. But in<o:p></o:p></pre> <pre>case soft reset is successful we do not want to do that.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>So here is what I think, but Alex please suggest if it makes sense.<o:p></o:p></pre> <pre>If recovery is disabled: capture the IP dump before soft reset. (Gives a close<o:p></o:p></pre> <pre>register state, but the ring buffer needs to be looked at, as it is in use while the<o:p></o:p></pre> <pre>scheduler is running)<o:p></o:p></pre> <pre>If recovery is enabled: capture the IP dump (the current implementation makes<o:p></o:p></pre> <pre>sure to disable drm sched and fence timeout)<o:p></o:p></pre> </blockquote> <pre><o:p> </o:p></pre> <pre>I think Trigger's proposed logic makes sense. I don't see a reason<o:p></o:p></pre> <pre>not to dump earlier if we can. I don't really see what it buys us to<o:p></o:p></pre> <pre>have different behavior depending on whether or not recovery is<o:p></o:p></pre> <pre>enabled. If per queue resets are successful, then we'd miss the dump<o:p></o:p></pre> <pre>in that case. If we only dump in job_timedout, then we'd miss the<o:p></o:p></pre> <pre>dump when there is a reset due to something the kernel driver or KFD<o:p></o:p></pre> <pre>has done.<o:p></o:p></pre> </blockquote> I am totally in for dumping the IP registers as early as possible, and it's a good idea to dump before soft reset. I am a little worried about whether we need to stop scheduling any further jobs and process the ones already scheduled. 
If that's not a problem it looks good to me. <o:p></o:p> <o:p> </o:p> With the one minor change above, the patch looks fine to me. Just make sure you try it and see the time it takes between "Dumping IP State" and "Dumping IP State Completed", i.e. reading registers.<o:p></o:p> Acked-by: Sunil Khatri <a href="mailto:sunil.khatri@amd.com"> <sunil.khatri@amd.com></a><o:p></o:p> Regards Sunil K<o:p></o:p> <o:p></o:p> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>I think what we want is for job_timedout or the similar logic on the<o:p></o:p></pre> <pre>KFD side to call the coredump code when we detect a stuck queue.<o:p></o:p></pre> <pre>We'll need to keep the coredump code in gpu_recover to cover the cases<o:p></o:p></pre> <pre>where we need to reset due to something outside of the user submission<o:p></o:p></pre> <pre>paths.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Alex<o:p></o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre><o:p> </o:p></pre> <pre>function ptr print ip state could be called to capture dump when it's<o:p></o:p></pre> <pre>needed in both above cases. 
Right now print is called when the dump is<o:p></o:p></pre> <pre>actually emitted, which is when the data file generated by<o:p></o:p></pre> <pre>devcoredump is read.<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Regards<o:p></o:p></pre> <pre>Sunil Khatri<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>Regards,<o:p></o:p></pre> <pre>Trigger<o:p></o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>Regards<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>Sunil Khatri<o:p></o:p></pre> <pre><o:p> </o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> adev->job_hang = true;<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>@@ -101,6 +157,12 @@ static enum drm_gpu_sched_stat<o:p></o:p></pre> </blockquote> <pre>amdgpu_job_timedout(struct drm_sched_job *s_job)<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> reset_context.src = AMDGPU_RESET_SRC_JOB;<o:p></o:p></pre> <pre> clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);<o:p></o:p></pre> <pre><o:p> </o:p></pre> <pre>+ /*<o:p></o:p></pre> <pre>+ * To avoid an unnecessary extra coredump, as we have<o:p></o:p></pre> </blockquote> <pre>already<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre>+ * got the very close representation of GPU's error status<o:p></o:p></pre> <pre>+ */<o:p></o:p></pre> <pre>+ set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);<o:p></o:p></pre> <pre>+<o:p></o:p></pre> <pre> r = amdgpu_device_gpu_recover(ring->adev, job,<o:p></o:p></pre> </blockquote> <pre>&reset_context);<o:p></o:p></pre> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <pre> if (r)<o:p></o:p></pre> <pre> dev_err(adev->dev, "GPU Recovery Failed: %d\n", r);<o:p></o:p></pre> </blockquote> </blockquote> </blockquote> </blockquote> </blockquote> </div> </div> </div> </body> </html>