[PATCH] drm/amdgpu: reset asic after system-wide suspend aborted

Alex Deucher alexdeucher at gmail.com
Mon Nov 22 15:47:51 UTC 2021


On Mon, Nov 22, 2021 at 9:23 AM Liang, Prike <Prike.Liang at amd.com> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher at gmail.com>
> > Sent: Friday, November 19, 2021 12:18 AM
> > To: Lazar, Lijo <Lijo.Lazar at amd.com>
> > Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Christian König
> > <ckoenig.leichtzumerken at gmail.com>; Liang, Prike <Prike.Liang at amd.com>;
> > Huang, Ray <Ray.Huang at amd.com>; amd-gfx at lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> > aborted
> >
> > On Thu, Nov 18, 2021 at 10:01 AM Lazar, Lijo <Lijo.Lazar at amd.com> wrote:
> > >
> > > [Public]
> > >
> > >
> > > BTW, I'm not sure if 'reset always' on resume is a good idea for GPUs in a
> > > hive (assuming those systems also get suspended and get hiccups). At this
> > > point the hive isn't reinitialized.
> >
> > Yeah, we should probably not reset if we are part of a hive.
> >
> > Alex
> >
> Does the GPU hive need special treatment in this suspend-abort case
> because a hive has to handle the reset dependencies between its nodes
> and reset them synchronously? If so, can we skip the reset for the hive
> case and only do the GPU reset when
> adev->gmc.xgmi.num_physical_nodes == 0?

Yes, exactly.  For the aborted suspend reset, we can check the value
before doing a reset.  I think you want to check if
adev->gmc.xgmi.num_physical_nodes <= 1.
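
A rough sketch of that check, not the actual patch. Only the field path
adev->gmc.xgmi.num_physical_nodes comes from this thread; the struct
definitions, the suspend_aborted flag and the helper name are
hypothetical stand-ins so the snippet compiles on its own:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structures. Only the field
 * path gmc.xgmi.num_physical_nodes is taken from the discussion. */
struct xgmi_sketch {
	unsigned int num_physical_nodes;
};

struct gmc_sketch {
	struct xgmi_sketch xgmi;
};

struct device_sketch {
	struct gmc_sketch gmc;
	bool suspend_aborted;	/* assumed flag: the suspend did not complete */
};

/* Hypothetical helper: reset only after an aborted suspend, and only
 * when the device is not part of a multi-node hive. */
static bool should_reset_after_aborted_suspend(const struct device_sketch *adev)
{
	if (!adev->suspend_aborted)
		return false;

	/* <= 1 rather than == 0, so a count of one node is also allowed. */
	return adev->gmc.xgmi.num_physical_nodes <= 1;
}
```

With this shape the hive case simply falls through without a reset, so
no cross-node synchronization is needed in the resume path.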

Alex

>
> > >
> > > Thanks,
> > > Lijo

