Re: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload drvier
Yang, Stanley
Stanley.Yang at amd.com
Fri Nov 26 13:07:45 UTC 2021
[AMD Official Use Only]
That sequence is not necessary: all ras features are already disabled before hw fini, so con->features is zero by the time amdgpu_ras_fini runs.
Regards,
Stanley
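
To illustrate the ordering argument, here is a minimal toy model in C. It is a sketch, not driver code: every toy_* name is hypothetical, and the only assumption it encodes is the one from the patch description, namely that a disable command can only be delivered while the ras ta is still loaded.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in amdgpu. */
struct toy_ras {
    unsigned int features; /* enabled-feature bitmask, stands in for con->features */
    bool ta_loaded;        /* true while the ras ta can still accept commands */
};

/* Try to disable every feature; fails once the ta has been unloaded. */
static int toy_disable_all(struct toy_ras *ras)
{
    if (!ras->ta_loaded)
        return -1;         /* the command can no longer be delivered */
    ras->features = 0;
    return 0;
}

/* Stands in for the pre-fini step: disable features if any are set. */
static int toy_pre_fini(struct toy_ras *ras)
{
    return ras->features ? toy_disable_all(ras) : 0;
}

/* Stands in for hw fini: after this the ta is gone. */
static void toy_hw_fini(struct toy_ras *ras)
{
    ras->ta_loaded = false;
}

/* Stands in for sw fini: reports whether the feature mask is clean. */
static bool toy_sw_fini(const struct toy_ras *ras)
{
    return ras->features == 0;
}

/* Order after the patch: disable first, then tear down hardware. */
static bool toy_unload_patched(struct toy_ras *ras)
{
    toy_pre_fini(ras);
    toy_hw_fini(ras);
    return toy_sw_fini(ras);
}

/* Order before the patch: hardware goes away first, so disable fails. */
static bool toy_unload_old(struct toy_ras *ras)
{
    toy_hw_fini(ras);
    toy_pre_fini(ras);     /* returns -1, mask stays set */
    return toy_sw_fini(ras);
}
```

In this model, toy_unload_patched leaves the mask at zero, so a second disable attempt in the final step would have nothing to do, while toy_unload_old reaches the final step with the mask still set.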
> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang at amd.com>
> Sent: Friday, November 26, 2021 8:57 PM
> To: Yang, Stanley <Stanley.Yang at amd.com>;
> amd-gfx at lists.freedesktop.org; Clements, John <John.Clements at amd.com>;
> Zhou1, Tao <Tao.Zhou1 at amd.com>; Li, Candice <Candice.Li at amd.com>;
> Chai, Thomas <YiPeng.Chai at amd.com>
> Cc: Yang, Stanley <Stanley.Yang at amd.com>
> Subject: RE: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> when unload drvier
>
> [AMD Official Use Only]
>
> Good catch. We still need to release ras object in the end. Any reason the
> sequence was removed?
>
> @@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
>
> WARN(con->features, "Feature mask is not cleared");
>
> - if (con->features)
> - amdgpu_ras_disable_all_features(adev, 1);
> -
> cancel_delayed_work_sync(&con->ras_counte_delay_work);
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: Stanley.Yang <Stanley.Yang at amd.com>
> Sent: Friday, November 26, 2021 17:48
> To: amd-gfx at lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang at amd.com>; Clements, John <John.Clements at amd.com>;
> Zhou1, Tao <Tao.Zhou1 at amd.com>; Li, Candice <Candice.Li at amd.com>;
> Chai, Thomas <YiPeng.Chai at amd.com>
> Cc: Yang, Stanley <Stanley.Yang at amd.com>
> Subject: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed
> when unload drvier
>
> Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw,
> so the ras ta is unloaded before the ras disable command is sent. The ras
> disable operation must therefore happen before hw fini.
>
> Signed-off-by: Stanley.Yang <Stanley.Yang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ----
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 73ec46140d68..d5e642e90010 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2838,8 +2838,6 @@ static int amdgpu_device_ip_fini(struct
> amdgpu_device *adev)
> if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
> amdgpu_virt_release_ras_err_handler_data(adev);
>
> - amdgpu_ras_pre_fini(adev);
> -
> if (adev->gmc.xgmi.num_physical_nodes > 1)
> amdgpu_xgmi_remove_device(adev);
>
> @@ -3959,6 +3957,9 @@ void amdgpu_device_fini_hw(struct
> amdgpu_device *adev)
>
> amdgpu_fbdev_fini(adev);
>
> + /* ras features must be disabled before hw fini */
> + amdgpu_ras_pre_fini(adev);
> +
> amdgpu_device_ip_fini_early(adev);
>
> amdgpu_irq_fini_hw(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 39dfd4d59881..65102d2a0a98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2484,7 +2484,6 @@ void amdgpu_ras_late_fini(struct amdgpu_device
> *adev,
> amdgpu_ras_sysfs_remove(adev, ras_block);
> if (ih_info->cb)
> amdgpu_ras_interrupt_remove_handler(adev, ih_info);
> - amdgpu_ras_feature_enable(adev, ras_block, 0);
> }
>
> /* do some init work after IP late init as dependence.
> @@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
>
> WARN(con->features, "Feature mask is not cleared");
>
> - if (con->features)
> - amdgpu_ras_disable_all_features(adev, 1);
> -
> cancel_delayed_work_sync(&con->ras_counte_delay_work);
>
> amdgpu_ras_set_context(adev, NULL);
> --
> 2.17.1