[PATCH] pci: fix incorrect value returned from pcie_get_speed_cap

Alex Deucher alexdeucher at gmail.com
Tue Nov 20 14:17:52 UTC 2018


On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote:
> > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask
> > the register and compare it against them.
> >
> > This patch fixes errors "amdgpu: [powerplay] failed to send message 261
> > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because
> > the slot is being incorrectly reported as PCIe-v3 capable.
> >
> > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
> > Cc: stable at vger.kernel.org    # v4.17+
> >
> > ---
> >  drivers/pci/pci.c |    8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > Index: linux-4.19/drivers/pci/pci.c
> > ===================================================================
> > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100
> > +++ linux-4.19/drivers/pci/pci.c      2018-10-30 16:58:58.000000000 +0100
> > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st
> >
> >       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> >       if (lnkcap) {
> > -             if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> > +             if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB)
> >                       return PCIE_SPEED_16_0GT;
> > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB)
> >                       return PCIE_SPEED_8_0GT;
> > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB)
> >                       return PCIE_SPEED_5_0GT;
> > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
> >                       return PCIE_SPEED_2_5GT;
> >       }
> >
>
> I'd like to apply this as below, where I removed the 8_0GB and 16_0GB
> cases as recommended by the spec.  I can't test it myself, and the
> bugzillas don't contain enough information for me to confirm that the
> patch below is enough (the "lspci -vv" output of the root port and GPU
> is what I would need).
>
> I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but
> v4.18 works fine according to both bugzillas.

This issue affects AMD GPUs because we switched from using an open
coded check for pcie link speeds in the driver to using the common
pcie variants in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f

The patch below would regress performance, at least on AMD GPUs, since
we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause
the driver to limit the speed to gen2 even if gen3 or 4 are available.

>
> I also don't have a good feel for whether this is urgent enough to be
> a v4.20 fix or whether it can wait for v4.21.  Evidence either way
> would help.

I'd like it to land for 4.19 and 4.20 at least.  Alternatively, we
could revert all of the drm patches to and bring back all the open
coded implementations, but it's a fairly large number of patches to
revert.

Alex

>
> We also need similar fixes in pci_set_bus_speed(), pcie_speeds()
> (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(),
> qla24xx_pci_info_str(), and maybe a couple other places.
>
> Bjorn
>
>
> commit 871f73abf4b8e6aee8a206775f944ede7c7d7250
> Author: Mikulas Patocka <mpatocka at redhat.com>
> Date:   Tue Oct 30 12:36:08 2018 -0400
>
>     PCI: Fix incorrect value returned from pcie_get_speed_cap()
>
>     The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
>     the register and compare it against them.
>
>     This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret
>     is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the
>     slot is being incorrectly reported as PCIe-v3 capable.
>
>     Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
>     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
>     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
>     Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
>     [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
>     PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2]
>     Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>     Acked-by: Alex Deucher <alexander.deucher at amd.com>
>     Cc: stable at vger.kernel.org      # v4.17+
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d068f11d08a7..8563d1d9b102 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
>         u32 lnkcap2, lnkcap;
>
>         /*
> -        * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link
> -        * Speeds Vector in Link Capabilities 2 when supported, falling
> -        * back to Max Link Speed in Link Capabilities otherwise.
> +        * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18.  The
> +        * implementation note there recommends using the Supported Link
> +        * Speeds Vector in Link Capabilities 2 when supported.
> +        *
> +        * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software
> +        * should use the Supported Link Speeds field in Link Capabilities,
> +        * where only 2.5 GT/s and 5.0 GT/s speeds were defined.
>          */
>         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
>         if (lnkcap2) { /* PCIe r3.0-compliant */
> @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
>
>         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
>         if (lnkcap) {
> -               if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> -                       return PCIE_SPEED_16_0GT;
> -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> -                       return PCIE_SPEED_8_0GT;
> -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> +               if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
>                         return PCIE_SPEED_5_0GT;
> -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> +               else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
>                         return PCIE_SPEED_2_5GT;
>         }
>


More information about the amd-gfx mailing list