[PATCH v2] drm/amd: Add pre-zen AMD hardware to PCIe dynamic switching exclusions

Mario Limonciello superm1 at kernel.org
Thu Apr 3 19:13:24 UTC 2025


On 4/3/2025 10:48 AM, Alex Deucher wrote:
> On Wed, Apr 2, 2025 at 11:12 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>
>> From: Mario Limonciello <mario.limonciello at amd.com>
>>
>> An AMD RX580 installed in a system with an AMD Phenom II has problems with overheating. This is due to
> 
> I don't think this is entirely accurate.  I think the GPU gets hot
> because the device hangs due to a problem with changing the PCIe
> clocks.
> 
>> changes to PCIe dynamic switching introduced by commit 466a7d115326e
>> ("drm/amd: Use the first non-dGPU PCI device for BW limits").
>>
>> To avoid the risk of other issues with old hardware, require at least Zen
>> hardware on the AMD side to enable PCIe dynamic switching.
> 
> I'm pretty sure PCIe reclocking worked on pre-Zen hardware.  We've
> supported this on our GPUs going back 15 or more years.  I
> suspect the actual problem is that some links may not reliably train
> at the full bandwidth on some motherboards.  Forcing a higher link
> speed may cause problems.

It seems odd to me that it would advertise a higher link speed than it
could train at.
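One way to test Alex's theory on an affected board is to compare the speed each end of the link advertises with the speed the link actually trained at. The sketch below is a user-space C model, assuming the raw 32-bit Link Capabilities (LNKCAP) and 16-bit Link Status (LNKSTA) values have already been read from PCIe config space; the helper names are hypothetical. It also assumes the common encoding in which a speed field value of N means PCIe Gen N. Because a link can legitimately train below one end's capability when the partner is slower, the check compares against the lower of the two ends' advertised speeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Speed fields are bits 3:0 of each register; these masks match the
 * kernel's PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKSTA_CLS definitions. */
#define LNKCAP_MAX_SPEED_MASK 0x0000000fu
#define LNKSTA_CUR_SPEED_MASK 0x000fu

/* Assumption: a field value of N means PCIe Gen N
 * (1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s, ...). */
static unsigned int lnkcap_max_gen(uint32_t lnkcap)
{
	return lnkcap & LNKCAP_MAX_SPEED_MASK;
}

static unsigned int lnksta_cur_gen(uint16_t lnksta)
{
	return lnksta & LNKSTA_CUR_SPEED_MASK;
}

/* Hypothetical helper: true when the link trained below the fastest
 * speed that both ends advertise, i.e. the case Alex suspects. */
static bool link_trained_below_common_cap(uint32_t up_lnkcap,
					  uint32_t down_lnkcap,
					  uint16_t lnksta)
{
	unsigned int up = lnkcap_max_gen(up_lnkcap);
	unsigned int down = lnkcap_max_gen(down_lnkcap);
	unsigned int common = up < down ? up : down;

	/* 0 is not a valid speed encoding; catch unread registers. */
	assert(up != 0 && down != 0);
	return lnksta_cur_gen(lnksta) < common;
}
```

For instance, a Gen 3 GPU behind a Gen 2 root port that trained at Gen 2 is not flagged, because Gen 2 is the best the pair can do; a Gen 3 pair that trained at Gen 1 is flagged.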

> Maybe it would be better to limit the max
> PCIe link rate to whatever the link is currently trained to.  IIRC,
> PCIe links will train at the fastest link possible by default.  The
> previous behavior was to limit the max clock to the slowest link in
> the topology to save power, but then we changed it to use the fastest
> link possible based on the PCIe link caps.  Perhaps limiting it to the
> fastest currently trained link rate would be better.

That's essentially what happens when
amdgpu_device_pcie_dynamic_switching_supported() returns false.
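The two behaviors being compared here can be modeled in a few lines. This is a hypothetical, simplified sketch of the policy as described in this thread, not the amdgpu implementation: speeds are plain PCIe generation numbers, and `struct pcie_link_info` and `max_allowed_pcie_gen()` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not amdgpu code. Speeds are PCIe generations. */
struct pcie_link_info {
	unsigned int cap_gen;     /* fastest speed both ends advertise */
	unsigned int trained_gen; /* speed the link trained at */
};

static unsigned int max_allowed_pcie_gen(const struct pcie_link_info *link,
					 bool dynamic_switching_supported)
{
	assert(link->trained_gen <= link->cap_gen);

	/* With dynamic switching, allow any speed up to the advertised caps. */
	if (dynamic_switching_supported)
		return link->cap_gen;

	/* Otherwise stay at the speed the link is already trained to,
	 * which is the limit Alex proposes. */
	return link->trained_gen;
}
```

With a link capable of Gen 3 but trained at Gen 2, the model allows Gen 3 when switching is supported and stays at Gen 2 otherwise.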

If your theory is right, maybe what we really need is a pile of DMI
quirks for the motherboards that have this problem.
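A DMI quirk list of the kind suggested here could look roughly like the user-space model below. In the kernel this would more likely be a `struct dmi_system_id` table checked with `dmi_check_system()`; here the table, its placeholder vendor and board strings, and `board_has_pcie_switching_quirk()` are all hypothetical, and no real affected boards are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* User-space model of a DMI quirk table. Entries are placeholders,
 * not real affected motherboards. */
struct board_quirk {
	const char *board_vendor;
	const char *board_name;
};

static const struct board_quirk pcie_switching_quirks[] = {
	{ "EXAMPLE VENDOR", "EXAMPLE-BOARD-1" }, /* hypothetical */
	{ "EXAMPLE VENDOR", "EXAMPLE-BOARD-2" }, /* hypothetical */
};

/* Substring match on both fields, mirroring the semantics of the
 * kernel's DMI_MATCH(). */
static bool board_has_pcie_switching_quirk(const char *vendor,
					   const char *name)
{
	size_t i;
	size_t n = sizeof(pcie_switching_quirks) /
		   sizeof(pcie_switching_quirks[0]);

	assert(vendor != NULL && name != NULL);
	for (i = 0; i < n; i++) {
		const struct board_quirk *q = &pcie_switching_quirks[i];

		if (strstr(vendor, q->board_vendor) &&
		    strstr(name, q->board_name))
			return true;
	}
	return false;
}
```

Because the model uses substring matching like `DMI_MATCH()`, a placeholder such as "EXAMPLE-BOARD-1" would also match "EXAMPLE-BOARD-10"; the kernel's `DMI_EXACT_MATCH()` exists for cases where that matters.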

> 
> Alex
> 
>>
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4098
>> Fixes: 466a7d115326e ("drm/amd: Use the first non-dGPU PCI device for BW limits")
>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>> ---
>> v2:
>>   * Cover more hardware
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a30111d2c3ea0..caa44ee788c8f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1854,6 +1854,9 @@ bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev)
>>    *
>>    * https://edc.intel.com/content/www/us/en/design/products/platforms/details/raptor-lake-s/13th-generation-core-processors-datasheet-volume-1-of-2/005/pci-express-support/
>>    * https://gitlab.freedesktop.org/drm/amd/-/issues/2663
>> + *
>> + * AMD Phenom II X6 1090T has a similar issue
>> + * https://gitlab.freedesktop.org/drm/amd/-/issues/4098
>>    */
>>   static bool amdgpu_device_pcie_dynamic_switching_supported(struct amdgpu_device *adev)
>>   {
>> @@ -1866,6 +1869,8 @@ static bool amdgpu_device_pcie_dynamic_switching_supported(struct amdgpu_device
>>
>>          if (c->x86_vendor == X86_VENDOR_INTEL)
>>                  return false;
>> +       if (c->x86_vendor == X86_VENDOR_AMD && !cpu_feature_enabled(X86_FEATURE_ZEN))
>> +               return false;
>>   #endif
>>          return true;
>>   }
>> --
>> 2.43.0
>>



More information about the amd-gfx mailing list