[PATCH] drm/amdgpu: Don't enable LTR if not supported

Lazar, Lijo lijo.lazar at amd.com
Fri Sep 9 07:41:54 UTC 2022



On 9/8/2022 11:27 PM, Bjorn Helgaas wrote:
> On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
>> I am not sure if ASPM settings can be generalized by the PCIe core.
>> Performance vs. power savings when ASPM is enabled will require some
>> additional tuning, and that will be device specific.
> 
> Can you elaborate on this?  In the universe of drivers, very few do
> their own ASPM configuration, and it's usually to work around hardware
> defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
> work on some iwlwifi devices, etc.
> 
> The core does know how to configure all the ASPM features defined in
> the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
> 
>> In some of the other ASICs, this programming is done in VBIOS/SBIOS
>> firmware. Having it in driver provides the advantage of additional
>> tuning without forcing a VBIOS upgrade.
> 
> I think it's clearly the intent of the PCIe spec that ASPM
> configuration be done by generic code.  Here are some things that
> require a system-level view, not just an individual device view:
> 
>    - L0s, L1, and L1 Substates cannot be enabled unless both ends
>      support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
> 
>    - Devices advertise the "Acceptable Latency" they can accept for
>      transitions from L0s or L1 to L0, and the actual latency depends
>      on the "Exit Latencies" of all the devices in the path to the Root
>      Port (sec 5.4.1.3.2).
> 
>    - LTR (required by L1.2) cannot be enabled unless it is already
>      enabled in all upstream devices (sec 6.18).  This patch relies on
>      "ltr_path", which works now but relies on the PCI core never
>      reconfiguring the upstream path.
> 
> There might be amdgpu-specific features the driver needs to set up,
> but if drivers fiddle with architected features like LTR behind the
> PCI core's back, things are likely to break.
> 

The programming is mostly related to ASPM entry conditions, which the 
spec leaves to the implementation.

From the r4.0 spec:
"
This specification does not dictate when a component with an Upstream 
Port must initiate a transition to the L1 state. The interoperable 
mechanisms for transitioning into and out of L1 are defined within this 
specification; however, the specific ASPM policy governing when to 
transition into L1 is left to the implementer.
...
Another approach would be for the Downstream device to initiate a 
transition to the L1 state once the Link has been idle in L0 for a set 
amount of time.
"

Some of this programming, like the snippet below, sets the entry 
timings (the ASPM idle timer and the PM L1 entry timer):

        /* Read-modify-write a BIF strap register: set the ASPM idle
         * timer and the PM L1 entry timer, writing back only if the
         * value actually changed. */
        def = data = RREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3);
        data |= 0x5DE0 << RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT;
        data |= 0x0010 << RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT;
        if (def != data)
                WREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3, data);

Similarly for LTR: since it provides a dynamic mechanism to report 
latency tolerance while in L1 substates, the tolerance timings can be 
tuned through registers, subject to a threshold.
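
As a rough sketch of gating that on what the PCI core has set up (the 
function name here is hypothetical and the body is only an illustration 
of the check, not the actual driver sequence), the device-side LTR 
programming could be made conditional on the standard capability and on 
the core having enabled LTR along the upstream path:

        #include <linux/pci.h>

        /* Sketch only: skip LTR programming unless the device exposes
         * the LTR extended capability and the PCI core has enabled LTR
         * on every upstream device (see PCIe r6.0, sec 6.18). */
        static void amdgpu_program_ltr(struct pci_dev *pdev)
        {
                if (!pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR))
                        return; /* no LTR capability */
                if (!pdev->ltr_path)
                        return; /* LTR not enabled in upstream path */

                /* device-specific latency tolerance tuning goes here */
        }

The tuned tolerance values themselves would still come from the 
hardware team.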

Regardless, Alex is already checking with the hardware design team on 
possible improvements.

Thanks,
Lijo

>> From: Alex Deucher <alexdeucher at gmail.com>
>> On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> 
>>> Do you know why the driver configures ASPM itself?  If the PCI core is
>>> doing something wrong (and I'm sure it is, ASPM support is kind of a
>>> mess), I'd much prefer to fix up the core where *all* drivers can
>>> benefit from it.
>>
>> This is the programming sequence we get from our hardware team, and
>> it is used on both Windows and Linux.  As far as I understand it,
>> Windows doesn't handle this in the core; it's up to the individual
>> drivers to enable it.  I'm not familiar with how this should be
>> enabled generically, but at least for our hardware it seems to
>> deviate from what is done in the PCI core, due to stability concerns,
>> etc.  It seems to me that this may need ASIC-specific implementations
>> for a lot of hardware, depending on the required programming
>> sequences.  E.g., various ASICs may need hardware workarounds for
>> bugs or platform issues.  I can ask our hardware team for more details.
> 
> If the PCI core has stability issues, I want to fix them.  This
> hardware may have its own stability issues, and I would ideally like
> to have drivers use interfaces like pci_disable_link_state() to avoid
> broken things.  Maybe we need new interfaces for more subtle kinds of
> breakage.
> 
> Bjorn
> 
