[PATCH] drm/amdgpu: Don't enable LTR if not supported
Bjorn Helgaas
helgaas at kernel.org
Fri Sep 9 19:55:43 UTC 2022
On Fri, Sep 09, 2022 at 01:11:54PM +0530, Lazar, Lijo wrote:
>
>
> On 9/8/2022 11:27 PM, Bjorn Helgaas wrote:
> > On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
> > > I am not sure if ASPM settings can be generalized by the PCIe core.
> > > Balancing performance vs. power savings when ASPM is enabled will
> > > require some additional tuning, and that will be device specific.
> >
> > Can you elaborate on this? In the universe of drivers, very few do
> > their own ASPM configuration, and it's usually to work around hardware
> > defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
> > work on some iwlwifi devices, etc.
> >
> > The core does know how to configure all the ASPM features defined in
> > the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
> >
> > > In some of the other ASICs, this programming is done in VBIOS/SBIOS
> > > firmware. Having it in the driver provides the advantage of additional
> > > tuning without forcing a VBIOS upgrade.
> >
> > I think it's clearly the intent of the PCIe spec that ASPM
> > configuration be done by generic code. Here are some things that
> > require a system-level view, not just an individual device view:
> >
> > - L0s, L1, and L1 Substates cannot be enabled unless both ends
> >   support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
> >
> > - Devices advertise the "Acceptable Latency" they can accept for
> >   transitions from L0s or L1 to L0, and the actual latency depends
> >   on the "Exit Latencies" of all the devices in the path to the Root
> >   Port (sec 5.4.1.3.2).
> >
> > - LTR (required by L1.2) cannot be enabled unless it is already
> >   enabled in all upstream devices (sec 6.18). This patch relies on
> >   "ltr_path", which works now but relies on the PCI core never
> >   reconfiguring the upstream path.
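> >
> >   Concretely, that last dependency looks roughly like this (just a
> >   sketch; the helper name is made up, and the core tracks the same
> >   state as "ltr_path" instead of re-walking the hierarchy each time):
> >
> >         static bool upstream_ltr_enabled(struct pci_dev *dev)
> >         {
> >                 struct pci_dev *bridge;
> >
> >                 /* LTR must already be enabled at every port between
> >                  * the device and the Root Port (sec 6.18).
> >                  */
> >                 for (bridge = pci_upstream_bridge(dev); bridge;
> >                      bridge = pci_upstream_bridge(bridge)) {
> >                         u16 ctl2;
> >
> >                         pcie_capability_read_word(bridge, PCI_EXP_DEVCTL2,
> >                                                   &ctl2);
> >                         if (!(ctl2 & PCI_EXP_DEVCTL2_LTR_EN))
> >                                 return false;
> >                 }
> >                 return true;
> >         }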
> >
> > There might be amdgpu-specific features the driver needs to set up,
> > but if drivers fiddle with architected features like LTR behind the
> > PCI core's back, things are likely to break.
> >
>
> The programming is mostly related to entry conditions, and the spec leaves
> that to the implementation.
>
> From r4.0 spec -
> "
> This specification does not dictate when a component with an Upstream Port
> must initiate a transition to the L1 state. The interoperable mechanisms for
> transitioning into and out of L1 are defined within this specification;
> however, the specific ASPM policy governing when to transition into L1 is
> left to the implementer.
> ...
> Another approach would be for the Downstream device to initiate a transition
> to the L1 state once the Link has been idle in L0 for a set amount of time.
> "
>
> Some of the programming, like the snippet below, relates to entry timings.
>
>         def = data = RREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3);
>         data |= 0x5DE0 << RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT;
>         data |= 0x0010 << RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT;
>         if (def != data)
>                 WREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3, data);
>
> Similarly for LTR: since it provides a dynamic mechanism to report latency
> tolerance while in L1 substates, the tolerance timings can be tuned through
> registers, though there is a threshold.
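>
> For reference, the LTR encoding itself is generic: latency in ns =
> value * 2^(5 * scale). Reading back, say, the architected Max Snoop
> Latency register would look roughly like this (only a sketch; the
> helper name is made up):
>
>         static u64 ltr_max_snoop_ns(struct pci_dev *dev)
>         {
>                 int pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_LTR);
>                 u16 lat;
>
>                 if (!pos)
>                         return 0;
>
>                 pci_read_config_word(dev, pos + PCI_LTR_MAX_SNOOP_LAT, &lat);
>
>                 /* latency in ns = value * 2^(5 * scale) */
>                 return (u64)(lat & PCI_LTR_VALUE_MASK) <<
>                        (5 * ((lat & PCI_LTR_SCALE_MASK) >>
>                              PCI_LTR_SCALE_SHIFT));
>         }
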
I don't object to the driver programming device-specific things,
although there might be issues if it does that after the core has
already configured and enabled ASPM -- the driver might need to
temporarily disable ASPM while it updates parameters, then re-enable
it.
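
Very roughly, the sequence I have in mind is something like this (only
a sketch; the re-enable half is the missing piece):

        /* pdev is the GPU's struct pci_dev */

        /* Keep the link in L0 while the entry timers are updated;
         * disabling L1 also rules out the L1 substates.
         */
        pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);

        /* ... device-specific programming, e.g. the RCC_BIF_STRAP3
         * writes quoted above, goes here ...
         */

        /* As far as I know there's no driver-facing undo for
         * pci_disable_link_state(), so the re-enable step would need a
         * new core interface.
         */
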
I *do* object to the driver programming PCIe-generic things that the
PCI core thinks it owns. It's especially annoying if the driver uses
device-specific #defines and access methods for generic PCIe things
because then we can't even find potential conflicts.
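
For comparison, enabling LTR with the generic accessors is tiny and
trivially greppable (a sketch; it also ignores the upstream-path
condition above, which is exactly why the core should own this):

        u32 cap;

        pcie_capability_read_dword(pdev, PCI_EXP_DEVCAP2, &cap);
        if (cap & PCI_EXP_DEVCAP2_LTR)
                pcie_capability_set_word(pdev, PCI_EXP_DEVCTL2,
                                         PCI_EXP_DEVCTL2_LTR_EN);

Grepping for PCI_EXP_DEVCTL2_LTR_EN finds every user of that bit; a
write hidden behind device-specific #defines never shows up in such a
search.
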
> > > From: Alex Deucher <alexdeucher at gmail.com>
> > > On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> >
> > > > Do you know why the driver configures ASPM itself? If the PCI core is
> > > > doing something wrong (and I'm sure it is, ASPM support is kind of a
> > > > mess), I'd much prefer to fix up the core where *all* drivers can
> > > > benefit from it.
> > >
> > > This is the programming sequence we get from our hardware team, and it
> > > is used on both Windows and Linux. As far as I understand it, Windows
> > > doesn't handle this in the core; it's up to the individual drivers to
> > > enable it. I'm not familiar with how this should be enabled
> > > generically, but at least for our hardware, it seems to have some
> > > variation compared to what is done in the PCI core due to stability,
> > > etc. It seems to me that this may need ASIC-specific implementations
> > > for a lot of hardware depending on the required programming sequences.
> > > E.g., various ASICs may need hardware workarounds for bugs or platform
> > > issues, etc. I can ask for more details from our hardware team.
> >
> > If the PCI core has stability issues, I want to fix them. This
> > hardware may have its own stability issues, and I would ideally like
> > to have drivers use interfaces like pci_disable_link_state() to avoid
> > broken things. Maybe we need new interfaces for more subtle kinds of
> > breakage.
> >
> > Bjorn
> >