[PATCH] drm/amdgpu: Don't enable LTR if not supported
Alex Deucher
alexdeucher at gmail.com
Thu Sep 8 18:43:39 UTC 2022
On Thu, Sep 8, 2022 at 1:57 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
> > I am not sure if ASPM settings can be generalized by PCIE core.
> > Performance vs Power savings when ASPM is enabled will require some
> > additional tuning and that will be device specific.
>
> Can you elaborate on this? In the universe of drivers, very few do
> their own ASPM configuration, and it's usually to work around hardware
> defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
> work on some iwlwifi devices, etc.
>
> The core does know how to configure all the ASPM features defined in
> the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
>
> > In some of the other ASICs, this programming is done in VBIOS/SBIOS
> > firmware. Having it in driver provides the advantage of additional
> > tuning without forcing a VBIOS upgrade.
>
> I think it's clearly the intent of the PCIe spec that ASPM
> configuration be done by generic code. Here are some things that
> require a system-level view, not just an individual device view:
>
> - L0s, L1, and L1 Substates cannot be enabled unless both ends
> support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
>
> - Devices advertise the "Acceptable Latency" they can accept for
> transitions from L0s or L1 to L0, and the actual latency depends
> on the "Exit Latencies" of all the devices in the path to the Root
> Port (sec 5.4.1.3.2).
>
> - LTR (required by L1.2) cannot be enabled unless it is already
> enabled in all upstream devices (sec 6.18). This patch relies on
> "ltr_path", which works now but relies on the PCI core never
> reconfiguring the upstream path.
>
> There might be amdgpu-specific features the driver needs to set up,
> but if drivers fiddle with architected features like LTR behind the
> PCI core's back, things are likely to break.
>
> > From: Alex Deucher <alexdeucher at gmail.com>
> > On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> > > Do you know why the driver configures ASPM itself? If the PCI core is
> > > doing something wrong (and I'm sure it is, ASPM support is kind of a
> > > mess), I'd much prefer to fix up the core where *all* drivers can
> > > benefit from it.
> >
> > This is the programming sequence we get from our hardware team and it
> > is used on both windows and Linux. As far as I understand it windows
> > doesn't handle this in the core, it's up to the individual drivers to
> > enable it. I'm not familiar with how this should be enabled
> > generically, but at least for our hardware, it seems to have some
> > variation compared to what is done in the PCI core due to stability,
> > etc. It seems to me that this may need asic specific implementations
> > for a lot of hardware depending on the required programming sequences.
> > E.g., various asics may need hardware workaround for bugs or platform
> > issues, etc. I can ask for more details from our hardware team.
>
> If the PCI core has stability issues, I want to fix them. This
> hardware may have its own stability issues, and I would ideally like
> to have drivers use interfaces like pci_disable_link_state() to avoid
> broken things. Maybe we need new interfaces for more subtle kinds of
> breakage.
I'm not sure what, if anything is wrong with the current generic PCIe
ASPM code in Linux. I was speaking more from a hardware validation
standpoint. E.g., our silicon validation and hardware teams run a lot
of tests on a bunch of platforms and tune the programming sequence for
speed/power/stability. Then they hand the programming sequence off to
the software teams as sort of a golden config or rule set for ASPM
enablement in the OS for each device. I'm not exactly sure how far
these sequences stray from what the core PCI code does. Will try and
find out more.
Alex
More information about the amd-gfx
mailing list