amdgpu vs kexec
Mario Limonciello
superm1 at kernel.org
Wed Jun 18 02:12:12 UTC 2025
On 6/16/2025 9:54 AM, Peter Zijlstra wrote:
> On Mon, Jun 16, 2025 at 01:51:21PM +0200, Christian König wrote:
>> Hi Peter,
>>
>> On 6/16/25 11:39, Peter Zijlstra wrote:
>>> Hi guys,
>>>
>>> My (Intel Sapphire Rapids) workstation has a RX 7800 XT and when I kexec
>>> a bunch of times, the amdgpu driver gets upset and barfs on boot.
>>
>> yeah, that is an "intentional" HW feature and yes you're certainly not
>> the first one to complain about it :(
>>
>> The PSP (platform security processor IIRC) is designed in such a way
>> that you can initialize it only once after a power cycle / hard reset
>> for security reasons (e.g. to not leak crypto keys used for digital
>> rights management etc..).
>>
>> On dGPUs we work around that manually by power cycling the ASIC when
>> that situation is detected during amdgpu load, but that unfortunately
>> doesn't work 100% reliably.
>
> Right.. hence the splats.

How about if we reset before the kexec? Drivers can check the
kexec_in_progress flag (declared in <linux/kexec.h>) to know they're
about to go through kexec and act on it. Something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0fc0eeedc6461..2b1216b14d618 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -34,6 +34,7 @@
 
 #include <linux/cc_platform.h>
 #include <linux/dynamic_debug.h>
+#include <linux/kexec.h>
 #include <linux/module.h>
 #include <linux/mmu_notifier.h>
 #include <linux/pm_runtime.h>
@@ -2544,6 +2545,9 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
 	adev->mp1_state = PP_MP1_STATE_UNLOAD;
 	amdgpu_device_ip_suspend(adev);
 	adev->mp1_state = PP_MP1_STATE_NONE;
+
+	if (kexec_in_progress)
+		amdgpu_asic_reset(adev);
 }
 
 static int amdgpu_pmops_prepare(struct device *dev)
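
For anyone who wants to poke at the control flow outside the kernel, here
is a rough userspace model of what the hunk above is meant to do. It is
only a sketch: struct fake_adev, fake_ip_suspend(), fake_asic_reset(),
fake_pci_shutdown() and resets_on_shutdown() are made-up stand-ins, and
the kexec_in_progress variable here is a local copy rather than the
kernel's flag. It assumes the only behavioural change is "reset the ASIC
at shutdown when a kexec is pending".

```c
#include <stdbool.h>

/* Local stand-in for the kernel's kexec_in_progress flag (assumption). */
static bool kexec_in_progress;

/* Hypothetical, minimal model of the device state touched at shutdown. */
struct fake_adev {
	bool suspended;   /* set once the IP blocks were suspended */
	int reset_count;  /* number of ASIC resets issued */
};

/* Stand-in for amdgpu_device_ip_suspend(): just records the suspend. */
static void fake_ip_suspend(struct fake_adev *adev)
{
	adev->suspended = true;
}

/* Stand-in for amdgpu_asic_reset(): here it only counts calls. */
static void fake_asic_reset(struct fake_adev *adev)
{
	adev->reset_count++;
}

/* Models the patched shutdown hook: always suspend, reset only before kexec. */
static void fake_pci_shutdown(struct fake_adev *adev)
{
	fake_ip_suspend(adev);

	if (kexec_in_progress)
		fake_asic_reset(adev);
}

/* Runs the shutdown hook once and returns how many resets it issued. */
int resets_on_shutdown(bool kexec)
{
	struct fake_adev adev = { 0 };

	kexec_in_progress = kexec;
	fake_pci_shutdown(&adev);
	return adev.reset_count;
}
```

In this model a normal shutdown or reboot leaves the ASIC alone, and only
the kexec path pays for the extra reset.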
>
>> On APUs the situation is even worse because the PSP is shared between
>> the GPU and the CPU.
>>
>> We have forwarded such complaints internally for years, but there is
>> not much else Alex and I can do about it.
>
> Oh well. Thanks for the info!
>