[PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload
Lazar, Lijo
lijo.lazar at amd.com
Wed Apr 17 04:31:07 UTC 2024
On 4/17/2024 9:43 AM, Ahmad Rehman wrote:
> In a passthrough environment, the driver triggers a mode-1 reset on
> reload. The reset triggers core dump collection, which is a delayed task
> and prevents the driver from unloading until it completes. Since we do
> not need to collect data in the "reset on reload" case, we can skip core
> dump collection.
>
> v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well.
>
> Signed-off-by: Ahmad Rehman <Ahmad.Rehman at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 +
> 3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1b2e177bc2d6..c718982cffa8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5357,7 +5357,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
> /* Try reset handler method first */
> tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
> reset_list);
> - amdgpu_reset_reg_dumps(tmp_adev);
> +
> + if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))
> + amdgpu_reset_reg_dumps(tmp_adev);
The result of this test_bit() check could be saved in a bool and reused
for the amdgpu_coredump() check below. Anyway,
Reviewed-by: Lijo Lazar <lijo.lazar at amd.com>
Thanks,
Lijo
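
For illustration, the "save it to a bool" idea could look roughly like the
sketch below. This is a userspace model, not the actual patch:
demo_test_bit(), struct demo_reset_context and demo_do_asic_reset() are
hypothetical stand-ins for the kernel's test_bit(), struct
amdgpu_reset_context and amdgpu_do_asic_reset(), and the two dump steps are
reduced to counters. It assumes the flag does not change between the two
call sites.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit numbers mirror the enum in the patch; everything else is a stand-in. */
enum demo_reset_flags {
	DEMO_NEED_FULL_RESET = 0,
	DEMO_SKIP_HW_RESET = 1,
	DEMO_SKIP_COREDUMP = 2,
};

struct demo_reset_context {
	unsigned long flags;
	int reg_dumps;	/* counts runs of the register-dump step */
	int coredumps;	/* counts runs of the coredump step */
};

/* Non-atomic userspace stand-in for the kernel's test_bit(). */
static bool demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	assert(nr < sizeof(*addr) * CHAR_BIT);
	return (*addr >> nr) & 1UL;
}

/* Evaluate the flag once and reuse the result at both dump sites. */
static void demo_do_asic_reset(struct demo_reset_context *ctx)
{
	bool skip_coredump = demo_test_bit(DEMO_SKIP_COREDUMP, &ctx->flags);

	if (!skip_coredump)
		ctx->reg_dumps++;	/* where amdgpu_reset_reg_dumps() is called */

	/* ... reset handling elided ... */

	if (!skip_coredump)
		ctx->coredumps++;	/* where amdgpu_coredump() is called */
}
```

With the flag clear, both steps run; with it set, both are skipped, and the
flag is read only once per reset.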
>
> reset_context->reset_device_list = device_list_handle;
> r = amdgpu_reset_perform_reset(tmp_adev, reset_context);
> @@ -5430,7 +5432,8 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>
> vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
>
> - amdgpu_coredump(tmp_adev, vram_lost, reset_context);
> + if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))
> + amdgpu_coredump(tmp_adev, vram_lost, reset_context);
>
> if (vram_lost) {
> DRM_INFO("VRAM is lost due to GPU reset!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6ea893ad9a36..c512f70b8272 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2481,6 +2481,7 @@ static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
>
> /* Use a common context, just need to make sure full reset is done */
> set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
> + set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);
> r = amdgpu_do_asic_reset(&device_list, &reset_context);
>
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> index 66125d43cf21..b11d190ece53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> @@ -32,6 +32,7 @@ enum AMDGPU_RESET_FLAGS {
>
> AMDGPU_NEED_FULL_RESET = 0,
> AMDGPU_SKIP_HW_RESET = 1,
> + AMDGPU_SKIP_COREDUMP = 2,
> };
>
> struct amdgpu_reset_context {
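
A side note on the enum hunk above: the values are bit numbers handed to
set_bit()/test_bit(), not masks, so the new entry takes the next free
number (2) rather than the next power of two. A minimal userspace sketch,
assuming a plain unsigned long flags word; demo_set_bit() and
demo_reload_reset_flags() are hypothetical names, and the stand-in is not
atomic the way the kernel's set_bit() is:

```c
#include <assert.h>
#include <limits.h>

/* Bit numbers mirror the enum in the patch. */
enum demo_reset_flags {
	DEMO_NEED_FULL_RESET = 0,
	DEMO_SKIP_HW_RESET = 1,
	DEMO_SKIP_COREDUMP = 2,
};

/* Non-atomic userspace stand-in for the kernel's set_bit(). */
static void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < sizeof(*addr) * CHAR_BIT);
	*addr |= 1UL << nr;
}

/* Models the two set_bit() calls in the delayed reset work handler. */
static unsigned long demo_reload_reset_flags(void)
{
	unsigned long flags = 0;

	demo_set_bit(DEMO_SKIP_HW_RESET, &flags);
	demo_set_bit(DEMO_SKIP_COREDUMP, &flags);
	return flags;
}
```

In this model the resulting word has bits 1 and 2 set and bit 0 clear.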