amdgpu vs kexec
Peter Zijlstra
peterz at infradead.org
Wed Jun 18 09:26:25 UTC 2025
On Wed, Jun 18, 2025 at 11:12:32AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 18, 2025 at 10:51:23AM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 17, 2025 at 09:12:12PM -0500, Mario Limonciello wrote:
> >
> > > How about if we reset before the kexec? There is a symbol for drivers to
> > > use to know they're about to go through kexec to do $THINGS.
> > >
> > > Something like this:
> > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 0fc0eeedc6461..2b1216b14d618 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -34,6 +34,7 @@
> > >
> > > #include <linux/cc_platform.h>
> > > #include <linux/dynamic_debug.h>
> > > +#include <linux/kexec.h>
> > > #include <linux/module.h>
> > > #include <linux/mmu_notifier.h>
> > > #include <linux/pm_runtime.h>
> > > @@ -2544,6 +2545,9 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
> > > >  	adev->mp1_state = PP_MP1_STATE_UNLOAD;
> > > >  	amdgpu_device_ip_suspend(adev);
> > > >  	adev->mp1_state = PP_MP1_STATE_NONE;
> > > > +
> > > > +	if (kexec_in_progress)
> > > > +		amdgpu_asic_reset(adev);
> > > }
> > >
> > > static int amdgpu_pmops_prepare(struct device *dev)
> >
> > I will throw this in the dev kernel... I'll let you know.
>
> First hurdle appears to be that this symbol is not exported. I fixed
> that, but perhaps the kexec folks don't like drivers to use this?

Bah, so first kexec after a fresh reboot into a kernel carrying this has
the thing failing.