[PATCH v2] drm/amd: Add pre-zen AMD hardware to PCIe dynamic switching exclusions
Mario Limonciello
superm1 at kernel.org
Thu Apr 3 19:13:24 UTC 2025
On 4/3/2025 10:48 AM, Alex Deucher wrote:
> On Wed, Apr 2, 2025 at 11:12 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>
>> From: Mario Limonciello <mario.limonciello at amd.com>
>>
>> An AMD RX580 paired with an AMD Phenom II CPU has problems with overheating. This is due to
>
> I don't think this is entirely accurate. I think the GPU gets hot
> because the device hangs due to a problem with changing the PCIe
> clocks.
>
>> changes to PCIe dynamic switching introduced by commit 466a7d115326e
>> ("drm/amd: Use the first non-dGPU PCI device for BW limits").
>>
>> To avoid the risk of other issues with old hardware, require at least Zen
>> hardware on the AMD side to enable PCIe dynamic switching.
>
> I'm pretty sure PCIe reclocking worked on pre-Zen hardware. We've
> supported this on our GPUs going back 15 or more years. I
> suspect the actual problem is that some links may not reliably train
> at the full bandwidth on some motherboards. Forcing a higher link
> speed may cause problems.
It seems odd to me that it would advertise a higher link speed than it
could train at.
> Maybe it would be better to limit the max
> PCIe link rate to whatever the link is currently trained to. IIRC,
> PCIe links will train at the fastest link possible by default. The
> previous behavior was to limit the max clock to the slowest link in
> the topology to save power, but then we changed it to use the fastest
> link possible based on the PCIe link caps. Perhaps limiting it to the
> fastest currently trained link rate would be better.
That's essentially what happens when
amdgpu_device_pcie_dynamic_switching_supported() returns false.
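A minimal user-space sketch of the difference between a caps-based limit and the trained-rate limit Alex proposes, assuming a single link between the dGPU and its upstream port, with speeds expressed as PCIe generations. This is not amdgpu code; `struct pcie_link_model`, `max_gen_from_caps()`, `max_gen_from_trained()` and `link_trained_below_caps()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the link between the dGPU and its
 * upstream port (not amdgpu code). Speeds are PCIe generations:
 * 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s. */
struct pcie_link_model {
	int gpu_cap_gen;  /* highest generation the GPU advertises */
	int port_cap_gen; /* highest generation the upstream port advertises */
	int trained_gen;  /* generation the link actually trained to */
};

static int min_gen(int a, int b)
{
	return a < b ? a : b;
}

/* Caps-based limit: the fastest rate both ends of the link advertise,
 * regardless of how the link actually trained. */
int max_gen_from_caps(const struct pcie_link_model *l)
{
	return min_gen(l->gpu_cap_gen, l->port_cap_gen);
}

/* Proposed limit: never request more than the link trained to, and
 * never more than the caps allow. */
int max_gen_from_trained(const struct pcie_link_model *l)
{
	return min_gen(l->trained_gen, max_gen_from_caps(l));
}

/* True when the link trained below what the caps advertise, i.e. the
 * case where a caps-based limit could request a rate the link never
 * reached during training. */
bool link_trained_below_caps(const struct pcie_link_model *l)
{
	return l->trained_gen < max_gen_from_caps(l);
}
```

For example, a link whose ends advertise Gen3 and Gen2 but which trained at Gen1 gets Gen2 from the caps-based limit and Gen1 from the trained-rate limit.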
If your theory is right, maybe what we really need is a pile of DMI
quirks for the motherboards that have this problem.
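In the kernel, such quirks would normally be a `struct dmi_system_id` table checked with `dmi_check_system()`. The sketch below is only a user-space stand-in for that idea: `struct board_quirk`, `board_has_pcie_switching_quirk()` and the board strings are hypothetical placeholders, not known affected boards. It compares strings exactly, which corresponds to `DMI_EXACT_MATCH` rather than the substring matching done by `DMI_MATCH`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel DMI quirk entry: match on board
 * vendor and board name. */
struct board_quirk {
	const char *board_vendor;
	const char *board_name;
};

/* Placeholder entries only; these are NOT real affected boards. */
static const struct board_quirk pcie_switching_quirks[] = {
	{ "Example Vendor", "EXAMPLE-BOARD-1" },
	{ "Example Vendor", "EXAMPLE-BOARD-2" },
	{ NULL, NULL } /* terminator, as kernel DMI tables use */
};

/* Return true if the given board matches a quirk entry exactly. */
bool board_has_pcie_switching_quirk(const char *vendor, const char *name)
{
	for (const struct board_quirk *q = pcie_switching_quirks;
	     q->board_vendor != NULL; q++) {
		if (strcmp(q->board_vendor, vendor) == 0 &&
		    strcmp(q->board_name, name) == 0)
			return true;
	}
	return false;
}
```

In user space, the two strings can be read from `/sys/class/dmi/id/board_vendor` and `/sys/class/dmi/id/board_name`.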
>
> Alex
>
>>
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4098
>> Fixes: 466a7d115326e ("drm/amd: Use the first non-dGPU PCI device for BW limits")
>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>> ---
>> v2:
>> * Cover more hardware
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a30111d2c3ea0..caa44ee788c8f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1854,6 +1854,9 @@ bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev)
>> *
>> * https://edc.intel.com/content/www/us/en/design/products/platforms/details/raptor-lake-s/13th-generation-core-processors-datasheet-volume-1-of-2/005/pci-express-support/
>> * https://gitlab.freedesktop.org/drm/amd/-/issues/2663
>> + *
>> + * AMD Phenom II X6 1090T has a similar issue
>> + * https://gitlab.freedesktop.org/drm/amd/-/issues/4098
>> */
>> static bool amdgpu_device_pcie_dynamic_switching_supported(struct amdgpu_device *adev)
>> {
>> @@ -1866,6 +1869,8 @@ static bool amdgpu_device_pcie_dynamic_switching_supported(struct amdgpu_device
>>
>> if (c->x86_vendor == X86_VENDOR_INTEL)
>> return false;
>> + if (c->x86_vendor == X86_VENDOR_AMD && !cpu_feature_enabled(X86_FEATURE_ZEN))
>> + return false;
>> #endif
>> return true;
>> }
>> --
>> 2.43.0
>>
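The effect of the two vendor checks visible in the hunk can be summarized with a small user-space model. It models only the checks shown in the diff (the rest of the function is not visible here); `enum cpu_vendor`, `struct cpu_model` and `dynamic_switching_supported()` are hypothetical stand-ins for `c->x86_vendor`, `cpu_feature_enabled(X86_FEATURE_ZEN)` and the real function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel definitions. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_model {
	enum cpu_vendor vendor; /* models c->x86_vendor */
	bool has_zen;           /* models cpu_feature_enabled(X86_FEATURE_ZEN) */
};

/* Mirrors only the vendor checks shown in the patched
 * amdgpu_device_pcie_dynamic_switching_supported(). */
bool dynamic_switching_supported(const struct cpu_model *c)
{
	if (c->vendor == VENDOR_INTEL)
		return false;
	/* New in this patch: pre-Zen AMD, e.g. Phenom II, is excluded. */
	if (c->vendor == VENDOR_AMD && !c->has_zen)
		return false;
	return true;
}
```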