amdgpu vs kexec
Alex Deucher
alexdeucher at gmail.com
Wed Jun 18 13:46:27 UTC 2025
On Wed, Jun 18, 2025 at 9:41 AM Mario Limonciello <superm1 at kernel.org> wrote:
>
> On 6/18/2025 4:05 AM, Christian König wrote:
> > On 6/18/25 10:51, Peter Zijlstra wrote:
> >> On Tue, Jun 17, 2025 at 09:12:12PM -0500, Mario Limonciello wrote:
> >>
> >>> How about if we reset before the kexec? There is a symbol for drivers to
> >>> use to know they're about to go through kexec to do $THINGS.
> >>>
> >>> Something like this:
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index 0fc0eeedc6461..2b1216b14d618 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -34,6 +34,7 @@
> >>>
> >>> #include <linux/cc_platform.h>
> >>> #include <linux/dynamic_debug.h>
> >>> +#include <linux/kexec.h>
> >>> #include <linux/module.h>
> >>> #include <linux/mmu_notifier.h>
> >>> #include <linux/pm_runtime.h>
> >>> @@ -2544,6 +2545,9 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
> >>>  	adev->mp1_state = PP_MP1_STATE_UNLOAD;
> >>>  	amdgpu_device_ip_suspend(adev);
> >>>  	adev->mp1_state = PP_MP1_STATE_NONE;
> >>> +
> >>> +	if (kexec_in_progress)
> >>> +		amdgpu_asic_reset(adev);
> >>> }
> >>>
> >>> static int amdgpu_pmops_prepare(struct device *dev)
> >>
> >> I will throw this in the dev kernel... I'll let you know.
> >
> > Mhm if the drivers are informed about the kexec
>
> It looks like PeterZ found the symbol isn't exported; but that's not to
> say it "can't be" if it fixes this issue.
>
> > then we could also send the unload/reset packet only to the PSP IIRC.
> >
> > That might have a better chance of succeeding than a full ASIC reset.
> >
> > Lijo should know more about that.
> >
> > Regards,
> > Christian.
>
> Another idea is to do a FLR on the way down.

I think you want something like:

    r = amdgpu_dpm_set_mp1_state(adev, PP_MP1_STATE_UNLOAD);

Alex