[PATCH] drm/amdgpu: Don't enable LTR if not supported
Lazar, Lijo
lijo.lazar at amd.com
Fri Sep 9 07:41:54 UTC 2022
On 9/8/2022 11:27 PM, Bjorn Helgaas wrote:
> On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
>> I am not sure if ASPM settings can be generalized by PCIE core.
>> Performance vs Power savings when ASPM is enabled will require some
>> additional tuning and that will be device specific.
>
> Can you elaborate on this? In the universe of drivers, very few do
> their own ASPM configuration, and it's usually to work around hardware
> defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
> work on some iwlwifi devices, etc.
>
> The core does know how to configure all the ASPM features defined in
> the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
>
>> In some of the other ASICs, this programming is done in VBIOS/SBIOS
>> firmware. Having it in driver provides the advantage of additional
>> tuning without forcing a VBIOS upgrade.
>
> I think it's clearly the intent of the PCIe spec that ASPM
> configuration be done by generic code. Here are some things that
> require a system-level view, not just an individual device view:
>
> - L0s, L1, and L1 Substates cannot be enabled unless both ends
> support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
>
> - Devices advertise the "Acceptable Latency" they can accept for
> transitions from L0s or L1 to L0, and the actual latency depends
> on the "Exit Latencies" of all the devices in the path to the Root
> Port (sec 5.4.1.3.2).
>
> - LTR (required by L1.2) cannot be enabled unless it is already
> enabled in all upstream devices (sec 6.18). This patch relies on
> "ltr_path", which works now but relies on the PCI core never
> reconfiguring the upstream path.
>
> There might be amdgpu-specific features the driver needs to set up,
> but if drivers fiddle with architected features like LTR behind the
> PCI core's back, things are likely to break.
>
The programming is mostly related to L1 entry conditions, and the spec
leaves those to the implementer.
From the r4.0 spec:
"
This specification does not dictate when a component with an Upstream
Port must initiate a transition to the L1 state. The interoperable
mechanisms for transitioning into and out of L1 are defined within this
specification; however, the specific ASPM policy governing when to
transition into L1 is left to the implementer.
...
Another approach would be for the Downstream device to initiate a
transition to the L1 state once the Link has been idle in L0 for a set
amount of time.
"
Some of the programming, like the snippet below, relates to the timing
of L1 entry.
	def = data = RREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3);
	data |= 0x5DE0 << RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT;
	data |= 0x0010 << RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT;
	if (def != data)
		WREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3, data);
Similarly for LTR: since it provides a dynamic mechanism for reporting
latency tolerance while in L1 substates, the tolerance timings can be
tuned through registers, subject to a threshold.
Regardless, Alex is already checking with the hardware design team on
possible improvements.
Thanks,
Lijo
>> From: Alex Deucher <alexdeucher at gmail.com>
>> On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
>>> Do you know why the driver configures ASPM itself? If the PCI core is
>>> doing something wrong (and I'm sure it is, ASPM support is kind of a
>>> mess), I'd much prefer to fix up the core where *all* drivers can
>>> benefit from it.
>>
>> This is the programming sequence we get from our hardware team, and it
>> is used on both Windows and Linux. As far as I understand it, Windows
>> doesn't handle this in the core; it's up to the individual drivers to
>> enable it. I'm not familiar with how this should be enabled
>> generically, but at least for our hardware, it seems to have some
>> variation compared to what is done in the PCI core due to stability,
>> etc. It seems to me that this may need ASIC-specific implementations
>> for a lot of hardware depending on the required programming sequences.
>> E.g., various ASICs may need hardware workarounds for bugs or platform
>> issues, etc. I can ask for more details from our hardware team.
>
> If the PCI core has stability issues, I want to fix them. This
> hardware may have its own stability issues, and I would ideally like
> to have drivers use interfaces like pci_disable_link_state() to avoid
> broken things. Maybe we need new interfaces for more subtle kinds of
> breakage.
>
> Bjorn
>