[PATCH] drm/amdgpu: Skip the coredump collection on reset during driver reload

Lazar, Lijo lijo.lazar at amd.com
Wed Apr 17 03:37:11 UTC 2024



On 4/17/2024 12:05 AM, Ahmad Rehman wrote:
> In a passthrough environment, the driver triggers a mode-1 reset on
> reload. The reset triggers coredump collection, which runs as a delayed
> task and prevents the driver from unloading until it completes. Since
> this data is not needed in the "reset on reload" case, skip coredump
> collection.
> 
> Signed-off-by: Ahmad Rehman <Ahmad.Rehman at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1b2e177bc2d6..b4a41f075512 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5430,7 +5430,8 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>  
>  				vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
>  
> -				amdgpu_coredump(tmp_adev, vram_lost, reset_context);
> +				if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags))

In addition, check this flag earlier in the reset path and skip the
call to "amdgpu_reset_reg_dumps" when it is set.
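
As a rough illustration of the suggestion, here is a user-space sketch
(not kernel code). It assumes the register dump and the coredump are two
separate steps of one reset path, and gates both on the same flag. Only
the flag bit numbers come from the patch; struct reset_ctx, struct
dump_stats, sketch_set_bit(), sketch_test_bit() and sketch_reset() are
hypothetical stand-ins for the kernel's reset context, set_bit(),
test_bit() and the real dump calls.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Flag bit numbers, mirroring the amdgpu_reset.h hunk in the patch. */
enum reset_flags {
	NEED_FULL_RESET = 0,
	SKIP_HW_RESET = 1,
	SKIP_COREDUMP = 2,
};

/* Hypothetical stand-in for struct amdgpu_reset_context. */
struct reset_ctx {
	unsigned long flags;
};

/* Counters replace the real dump work so the effect is observable. */
struct dump_stats {
	int reg_dumps;
	int coredumps;
};

/* Non-atomic user-space stand-in for the kernel's set_bit(). */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < sizeof(*addr) * CHAR_BIT);
	*addr |= 1UL << nr;
}

/* User-space stand-in for the kernel's test_bit(). */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	assert(nr < sizeof(*addr) * CHAR_BIT);
	return (*addr >> nr) & 1UL;
}

/* Read the flag once and let it gate both dump steps. */
static void sketch_reset(struct reset_ctx *ctx, struct dump_stats *stats)
{
	bool skip = sketch_test_bit(SKIP_COREDUMP, &ctx->flags);

	if (!skip)
		stats->reg_dumps++;	/* stands in for amdgpu_reset_reg_dumps() */

	/* ... the reset itself would happen here ... */

	if (!skip)
		stats->coredumps++;	/* stands in for amdgpu_coredump() */
}
```

With SKIP_COREDUMP set, as the reload path in the patch does, neither
counter moves; with the flag clear, both steps run once.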

Thanks,
Lijo

> +					amdgpu_coredump(tmp_adev, vram_lost, reset_context);
>  
>  				if (vram_lost) {
>  					DRM_INFO("VRAM is lost due to GPU reset!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6ea893ad9a36..c512f70b8272 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2481,6 +2481,7 @@ static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work)
>  
>  	/* Use a common context, just need to make sure full reset is done */
>  	set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
> +	set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);
>  	r = amdgpu_do_asic_reset(&device_list, &reset_context);
>  
>  	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> index 66125d43cf21..b11d190ece53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
> @@ -32,6 +32,7 @@ enum AMDGPU_RESET_FLAGS {
>  
>  	AMDGPU_NEED_FULL_RESET = 0,
>  	AMDGPU_SKIP_HW_RESET = 1,
> +	AMDGPU_SKIP_COREDUMP = 2,
>  };
>  
>  struct amdgpu_reset_context {

