[PATCH 39/40] drm/amdgpu: Fix koops when accessing RAS EEPROM

Alex Deucher alexdeucher at gmail.com
Thu Jun 10 21:23:40 UTC 2021


On Tue, Jun 8, 2021 at 5:41 PM Luben Tuikov <luben.tuikov at amd.com> wrote:
>
> Debugfs RAS EEPROM files are available when
> the ASIC supports RAS and debugfs is
> enabled, even when the "ras_enable" module
> parameter is set to 0. However, in this case,
> we get a kernel oops when accessing some of
> the "ras_..." controls in debugfs. The reason
> for this is that struct amdgpu_ras::adev is
> unset. This commit sets it, thus enabling access
> to those facilities. Note that this facilitates
> EEPROM access and not necessarily RAS features or
> functionality.
>
> Cc: Alexander Deucher <Alexander.Deucher at amd.com>
> Cc: John Clements <john.clements at amd.com>
> Cc: Hawking Zhang <Hawking.Zhang at amd.com>
> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>

Acked-by: Alex Deucher <alexander.deucher at amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d791a360a92366..772d87701ad4a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1947,11 +1947,20 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
>         bool exc_err_limit = false;
>         int ret;
>
> -       if (adev->ras_enabled && con)
> -               data = &con->eh_data;
> -       else
> +       if (!con)
> +               return 0;
> +
> +       /* Allow access to RAS EEPROM via debugfs, when the ASIC
> +        * supports RAS and debugfs is enabled, but when
> +        * adev->ras_enabled is unset, i.e. when "ras_enable"
> +        * module parameter is set to 0.
> +        */
> +       con->adev = adev;
> +
> +       if (!adev->ras_enabled)
>                 return 0;
>
> +       data = &con->eh_data;
>         *data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO);
>         if (!*data) {
>                 ret = -ENOMEM;
> @@ -1961,7 +1970,6 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
>         mutex_init(&con->recovery_lock);
>         INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery);
>         atomic_set(&con->in_recovery, 0);
> -       con->adev = adev;
>
>         max_eeprom_records_count = amdgpu_ras_eeprom_max_record_count();
>         amdgpu_ras_validate_threshold(adev, max_eeprom_records_count);
> --
> 2.32.0
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
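
For readers following the ordering change in the diff above, here is a
minimal userspace sketch of the same pattern. It is an illustration only,
not the driver code: struct device, struct ras_context, recovery_init_old(),
recovery_init_new() and eeprom_debugfs_read() are hypothetical stand-ins for
struct amdgpu_device, struct amdgpu_ras, amdgpu_ras_recovery_init() and a
debugfs handler, and a -1 return models the point where the kernel would
dereference a NULL con->adev and oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device. */
struct device {
	bool ras_enabled;     /* mirrors adev->ras_enabled */
	int eeprom_records;   /* dummy value a debugfs read would return */
};

/* Hypothetical stand-in for struct amdgpu_ras. */
struct ras_context {
	struct device *adev;  /* back-pointer; NULL until init sets it */
	bool recovery_ready;  /* stands in for eh_data, lock and work item */
};

/* Old ordering: the back-pointer is set only on the fully enabled path. */
static int recovery_init_old(struct device *adev, struct ras_context *con)
{
	assert(adev != NULL);

	if (!(adev->ras_enabled && con))
		return 0;             /* early return leaves con->adev unset */

	con->recovery_ready = true;
	con->adev = adev;
	return 0;
}

/* Patched ordering: set the back-pointer before checking ras_enabled. */
static int recovery_init_new(struct device *adev, struct ras_context *con)
{
	assert(adev != NULL);

	if (!con)
		return 0;

	con->adev = adev;             /* debugfs EEPROM access can rely on this */

	if (!adev->ras_enabled)
		return 0;             /* recovery state stays uninitialized */

	con->recovery_ready = true;
	return 0;
}

/* Models a debugfs handler that reaches the device through con->adev.
 * Returns -1 where the real kernel would dereference NULL and oops. */
static int eeprom_debugfs_read(const struct ras_context *con)
{
	if (con->adev == NULL)
		return -1;

	return con->adev->eeprom_records;
}
```

With ras_enabled set to false, the old ordering leaves con->adev NULL and the
modeled read fails. The new ordering lets the read succeed, while the
recovery state is still left uninitialized, which matches the note in the
commit message that this enables EEPROM access and not necessarily RAS
functionality.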

