[PATCH v4 0/2] Improve the dev coredump for gfx job timeout scenario
Trigger.Huang at amd.com
Wed Aug 21 08:38:39 UTC 2024
From: Trigger Huang <Trigger.Huang at amd.com>
The current dev coredump implementation sometimes cannot fully satisfy customers' requirements because:
1. dev coredump is called from the GPU reset function, so if GPU reset is disabled, dev coredump is disabled as well.
2. When a job timeout happens, the GPU status is dumped only after many other operations, such as soft_reset, have already run. The concern is that the dumped status may no longer be close to the GPU's real error status.
The new solution unconditionally calls dev coredump immediately after a job timeout, so the dump is a closer representation of the GPU's error status.
Trigger Huang (2):
drm/amdgpu: skip printing vram_lost if needed
drm/amdgpu: Do core dump immediately when job tmo
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 20 +++---
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 68 ++++++++++++++++++-
4 files changed, 82 insertions(+), 15 deletions(-)
--
2.34.1
More information about the amd-gfx mailing list