[PATCH] pci: fix incorrect value returned from pcie_get_speed_cap

Bjorn Helgaas helgaas at kernel.org
Tue Nov 20 15:13:03 UTC 2018


On Tue, Nov 20, 2018 at 09:17:52AM -0500, Alex Deucher wrote:
> On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote:
> > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask
> > > the register and compare it against them.
> > >
> > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261
> > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because
> > > the slot is being incorrectly reported as PCIe-v3 capable.
> > >
> > > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
> > > Cc: stable at vger.kernel.org    # v4.17+
> > >
> > > ---
> > >  drivers/pci/pci.c |    8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > Index: linux-4.19/drivers/pci/pci.c
> > > ===================================================================
> > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100
> > > +++ linux-4.19/drivers/pci/pci.c      2018-10-30 16:58:58.000000000 +0100
> > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st
> > >
> > >       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > >       if (lnkcap) {
> > > -             if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> > > +             if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB)
> > >                       return PCIE_SPEED_16_0GT;
> > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB)
> > >                       return PCIE_SPEED_8_0GT;
> > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB)
> > >                       return PCIE_SPEED_5_0GT;
> > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
> > >                       return PCIE_SPEED_2_5GT;
> > >       }
> > >
> >
> > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB
> > cases as recommended by the spec.  I can't test it myself, and the
> > bugzillas don't contain enough information for me to confirm that the
> > patch below is enough (the "lspci -vv" output of the root port and GPU
> > is what I would need).
> >
> > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but
> > v4.18 works fine according to both bugzillas.
> 
> This issue affects AMD GPUs because we switched from using an open
> coded check for pcie link speeds in the driver to using the common
> pcie variants in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f

OK, thanks.  I added that SHA1 and a note to explain the connection.

> The patch below would regress performance, at least on AMD GPUs, since
> we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause
> the driver to limit the speed to gen2 even if gen3 or 4 are available.

I guess this means these parts are broken with respect to the spec,
since they support gen3 speeds but don't implement LnkCap2?

Gen2, i.e., PCIe r2.1, only defined 2.5 GT/s and 5 GT/s.  Devices
capable of the higher speeds added by PCIe r3.0 are supposed to
implement LnkCap2, but if we're even getting to this code, it means
LnkCap2 was zero.

If we confirm that this is a device defect, the question is what the
best way to work around it is.  Probably the original patch is easier
than some sort of quirk, but we'd need to expand the comment a little
bit to explain why we're not following the spec recommendation.

It looks like lspci probably also needs some updating here -- it
currently doesn't do anything at all with LnkCap2.

> > I also don't have a good feel for whether this is urgent enough to be
> > a v4.20 fix or whether it can wait for v4.21.  Evidence either way
> > would help.
> 
> I'd like it to land for 4.19 and 4.20 at least.  Alternatively, we
> could revert all of the drm patches to and bring back all the open
> coded implementations, but it's a fairly large number of patches to
> revert.

OK, sounds like it makes sense to do this for v4.20 and backport it at
least to v4.19 stable.  I do want to get the places below fixed also.
They may not be as urgent, but we might as well try and make
everything consistent while we're looking at it.

> > We also need similar fixes in pci_set_bus_speed(), pcie_speeds()
> > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(),
> > qla24xx_pci_info_str(), and maybe a couple other places.
> >
> > Bjorn
> >
> >
> > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250
> > Author: Mikulas Patocka <mpatocka at redhat.com>
> > Date:   Tue Oct 30 12:36:08 2018 -0400
> >
> >     PCI: Fix incorrect value returned from pcie_get_speed_cap()
> >
> >     The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
> >     the register and compare it against them.
> >
> >     This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret
> >     is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the
> >     slot is being incorrectly reported as PCIe-v3 capable.
> >
> >     Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
> >     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
> >     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
> >     Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> >     [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
> >     PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2]
> >     Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
> >     Acked-by: Alex Deucher <alexander.deucher at amd.com>
> >     Cc: stable at vger.kernel.org      # v4.17+
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index d068f11d08a7..8563d1d9b102 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
> >         u32 lnkcap2, lnkcap;
> >
> >         /*
> > -        * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link
> > -        * Speeds Vector in Link Capabilities 2 when supported, falling
> > -        * back to Max Link Speed in Link Capabilities otherwise.
> > +        * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18.  The
> > +        * implementation note there recommends using the Supported Link
> > +        * Speeds Vector in Link Capabilities 2 when supported.
> > +        *
> > +        * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software
> > +        * should use the Supported Link Speeds field in Link Capabilities,
> > +        * where only 2.5 GT/s and 5.0 GT/s speeds were defined.
> >          */
> >         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> >         if (lnkcap2) { /* PCIe r3.0-compliant */
> > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
> >
> >         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> >         if (lnkcap) {
> > -               if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> > -                       return PCIE_SPEED_16_0GT;
> > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> > -                       return PCIE_SPEED_8_0GT;
> > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> > +               if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> >                         return PCIE_SPEED_5_0GT;
> > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> > +               else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
> >                         return PCIE_SPEED_2_5GT;
> >         }
> >


More information about the amd-gfx mailing list