[PATCH] drm/amdgpu: Don't enable LTR if not supported

Bjorn Helgaas helgaas at kernel.org
Thu Sep 8 17:57:13 UTC 2022


On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
> I am not sure if ASPM settings can be generalized by the PCIe core.
> Performance vs power savings when ASPM is enabled will require some
> additional tuning, and that will be device specific.

Can you elaborate on this?  In the universe of drivers, very few do
their own ASPM configuration, and it's usually to work around hardware
defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
work on some iwlwifi devices, etc.

The core does know how to configure all the ASPM features defined in
the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.

> In some of the other ASICs, this programming is done in VBIOS/SBIOS
> firmware. Having it in driver provides the advantage of additional
> tuning without forcing a VBIOS upgrade.

I think it's clearly the intent of the PCIe spec that ASPM
configuration be done by generic code.  Here are some things that
require a system-level view, not just an individual device view:

  - L0s, L1, and L1 Substates cannot be enabled unless both ends of
    the link support them (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).

  - Devices advertise the "Acceptable Latency" they can accept for
    transitions from L0s or L1 to L0, and the actual latency depends
    on the "Exit Latencies" of all the devices in the path to the Root
    Port (sec 5.4.1.3.2).

  - LTR (required by L1.2) cannot be enabled unless it is already
    enabled in all upstream devices (sec 6.18).  This patch depends on
    "ltr_path", which works today but assumes the PCI core never
    reconfigures the upstream path.

There might be amdgpu-specific features the driver needs to set up,
but if drivers fiddle with architected features like LTR behind the
PCI core's back, things are likely to break.

Alex Deucher <alexdeucher at gmail.com> wrote:
> On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas at kernel.org> wrote:

> > Do you know why the driver configures ASPM itself?  If the PCI core is
> > doing something wrong (and I'm sure it is, ASPM support is kind of a
> > mess), I'd much prefer to fix up the core where *all* drivers can
> > benefit from it.
> 
> This is the programming sequence we get from our hardware team, and it
> is used on both Windows and Linux.  As far as I understand it, Windows
> doesn't handle this in the core; it's up to the individual drivers to
> enable it.  I'm not familiar with how this should be enabled
> generically, but at least for our hardware, it seems to have some
> variation compared to what is done in the PCI core due to stability,
> etc.  It seems to me that this may need ASIC-specific implementations
> for a lot of hardware, depending on the required programming sequences.
> E.g., various ASICs may need hardware workarounds for bugs or platform
> issues, etc.  I can ask for more details from our hardware team.

If the PCI core has stability issues, I want to fix them.  This
hardware may have its own stability issues, and I would ideally like
drivers to use interfaces like pci_disable_link_state() to declare the
specific states that are broken.  Maybe we need new interfaces for
more subtle kinds of breakage.

Bjorn

