[PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list

Alex Deucher alexdeucher at gmail.com
Thu Jan 13 17:54:50 UTC 2022


On Wed, Jan 12, 2022 at 3:36 AM Zhou1, Tao <Tao.Zhou1 at amd.com> wrote:
>
> [AMD Official Use Only]
>
>
>
> > -----Original Message-----
> > From: Chai, Thomas <YiPeng.Chai at amd.com>
> > Sent: Wednesday, January 12, 2022 3:48 PM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Chai, Thomas <YiPeng.Chai at amd.com>; Zhang, Hawking
> > <Hawking.Zhang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Clements,
> > John <John.Clements at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
> > Subject: [PATCH 2/2] drm/amdgpu: No longer insert ras blocks into ras_list if it
> > already exists in ras_list
> >
> > No longer insert ras blocks into ras_list if it already exists in ras_list.
> >
> > Signed-off-by: yipechai <YiPeng.Chai at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index 62be0b4909b3..e6d3bb4b56e4 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -2754,9 +2754,17 @@ int amdgpu_ras_reset_gpu(struct amdgpu_device *adev)
> >  int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
> >               struct amdgpu_ras_block_object* ras_block_obj)
> >  {
> > +     struct amdgpu_ras_block_object *obj, *tmp;
> >       if (!adev || !amdgpu_ras_asic_supported(adev) || !ras_block_obj)
> >               return -EINVAL;
> >
> > +     /* If the ras object had been in ras_list, doesn't add it to ras_list again */
> [Tao] How about "If the ras object is in ras_list, don't add it again"
>
> > +     list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> > +             if (obj == ras_block_obj) {
> > +                     return 0;
> > +             }
> > +     }
>
> [Tao] The patch is OK with me for now, but I think the root cause is that we initialize adev->gmc.xgmi.ras in gmc_ras_late_init. That initialization should run only at the modprobe stage, and we could create a general gmc_early_init for it.

Yes, please fix the root cause.  We should only be adding the blocks
once.  This is just papering over the actual problem.

Alex
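
For illustration only, here is a minimal userspace sketch of why registering
from a hook that runs more than once is a problem, and what "register once"
looks like. It is not the amdgpu code: struct list_node, struct device,
struct ras_block and every function name below are hypothetical stand-ins,
and it assumes (as the thread implies) that the late-init hook can run again
after the initial load. The list helpers only mimic the behaviour of the
kernel's INIT_LIST_HEAD() and list_add_tail().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

/* Mimics INIT_LIST_HEAD(): the node points at itself. */
static void list_init(struct list_node *n) { n->prev = n; n->next = n; }

/* Mimics list_add_tail(): link n just before head. */
static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Count entries reachable by walking forward from head. */
static size_t list_len(const struct list_node *head)
{
	size_t len = 0;
	for (const struct list_node *p = head->next; p != head; p = p->next)
		len++;
	return len;
}

/* Hypothetical stand-ins for the device and a RAS block object. */
struct ras_block { struct list_node node; };
struct device { struct list_node ras_list; };

/* Unconditional registration, like the function before the patch. */
static void register_block(struct device *dev, struct ras_block *blk)
{
	assert(dev && blk);
	list_init(&blk->node);
	list_append(&blk->node, &dev->ras_list);
}

/* Problem pattern: a hook that may run repeatedly also registers. */
static void late_init_registering(struct device *dev, struct ras_block *xgmi)
{
	register_block(dev, xgmi);
}

/* Suggested pattern: register in a hook that runs once per load. */
static void early_init(struct device *dev, struct ras_block *xgmi)
{
	register_block(dev, xgmi);
}

/* The repeated hook no longer touches the list. */
static void late_init(struct device *dev) { (void)dev; }

/* Second registration re-links xgmi and cuts "other" out of the walk. */
size_t demo_register_in_late_init(void)
{
	struct device dev;
	struct ras_block xgmi, other;

	list_init(&dev.ras_list);
	late_init_registering(&dev, &xgmi);	/* initial bring-up */
	register_block(&dev, &other);
	late_init_registering(&dev, &xgmi);	/* hook runs again */
	return list_len(&dev.ras_list);
}

/* Each block is added exactly once, so the list stays intact. */
size_t demo_register_once(void)
{
	struct device dev;
	struct ras_block xgmi, other;

	list_init(&dev.ras_list);
	early_init(&dev, &xgmi);
	register_block(&dev, &other);
	late_init(&dev);
	late_init(&dev);
	return list_len(&dev.ras_list);
}
```

In this sketch demo_register_in_late_init() walks only one entry even though
two blocks were registered, because re-initializing a node that is still
linked leaves its old neighbours pointing at it. demo_register_once() walks
both entries. A duplicate check in the register function hides the first
symptom, while moving the registration to a run-once hook removes the cause.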


>
> > +
> >       INIT_LIST_HEAD(&ras_block_obj->node);
> >       list_add_tail(&ras_block_obj->node, &adev->ras_list);
> >
> > --
> > 2.25.1

