[PATCH] drm/amdgpu: only save and restore GPU device config at GPU error
Liang, Prike
Prike.Liang at amd.com
Fri Jul 28 02:21:33 UTC 2023
[Public]
Hi, Lijo
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Thursday, July 27, 2023 6:49 PM
> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: only save and restore GPU device config
> at GPU error
>
>
>
> On 7/27/2023 3:20 PM, Prike Liang wrote:
> > There's need a check on the GPU error state before save and restore
> > GPU device config space.
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 358dcc1070c5..5ef3c5c49bee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3946,7 +3946,8 @@ int amdgpu_device_init(struct amdgpu_device
> *adev,
> > dev_err(adev->dev, "amdgpu_pmu_init failed\n");
> >
> > /* Have stored pci confspace at hand for restore in sudden PCI error
> */
> > - if (amdgpu_device_cache_pci_state(adev->pdev))
> > + if (adev->pdev->error_state != pci_channel_io_normal &&
> > + amdgpu_device_cache_pci_state(adev->pdev))
>
> We need the clean state to be cached, not the state when there is an error.
> This state is later used to restore later, say when a mode-2 reset happens.
>
> Thanks,
> Lijo
>
But the original code in the amdgpu_device_init () is trying to restore the GPU config space immediately after the GPU config space saved, so
that looks this GPU save state restored unconditionally and that's not only for the GPU error case? Meanwhile, as to the mode-2 or some other reset case the reset
function will do the save and restore GPU state separately and not use the GPU state saved in amdgpu_device_init() process. In the amdgpu_device_init(), how
about restore the GPU state only for the GPU error case?
> > pci_restore_state(pdev);
> >
> > /* if we have > 1 VGA cards, then disable the amdgpu VGA resources
> > */
More information about the amd-gfx
mailing list