[PATCH] pci: fix incorrect value returned from pcie_get_speed_cap

Alex Deucher alexdeucher at gmail.com
Tue Nov 20 16:29:49 UTC 2018


On Tue, Nov 20, 2018 at 10:13 AM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Tue, Nov 20, 2018 at 09:17:52AM -0500, Alex Deucher wrote:
> > On Mon, Nov 19, 2018 at 7:47 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > On Tue, Oct 30, 2018 at 12:36:08PM -0400, Mikulas Patocka wrote:
> > > > The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks. We must mask
> > > > the register and compare it against them.
> > > >
> > > > This patch fixes errors "amdgpu: [powerplay] failed to send message 261
> > > > ret is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because
> > > > the slot is being incorrectly reported as PCIe-v3 capable.
> > > >
> > > > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > > > Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
> > > > Cc: stable at vger.kernel.org    # v4.17+
> > > >
> > > > ---
> > > >  drivers/pci/pci.c |    8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > Index: linux-4.19/drivers/pci/pci.c
> > > > ===================================================================
> > > > --- linux-4.19.orig/drivers/pci/pci.c 2018-10-30 16:58:58.000000000 +0100
> > > > +++ linux-4.19/drivers/pci/pci.c      2018-10-30 16:58:58.000000000 +0100
> > > > @@ -5492,13 +5492,13 @@ enum pci_bus_speed pcie_get_speed_cap(st
> > > >
> > > >       pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > >       if (lnkcap) {
> > > > -             if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> > > > +             if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_16_0GB)
> > > >                       return PCIE_SPEED_16_0GT;
> > > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> > > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_8_0GB)
> > > >                       return PCIE_SPEED_8_0GT;
> > > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> > > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) ==PCI_EXP_LNKCAP_SLS_5_0GB)
> > > >                       return PCIE_SPEED_5_0GT;
> > > > -             else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> > > > +             else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
> > > >                       return PCIE_SPEED_2_5GT;
> > > >       }
> > > >
> > >
> > > I'd like to apply this as below, where I removed the 8_0GB and 16_0GB
> > > cases as recommended by the spec.  I can't test it myself, and the
> > > bugzillas don't contain enough information for me to confirm that the
> > > patch below is enough (the "lspci -vv" output of the root port and GPU
> > > is what I would need).
> > >
> > > I'm confused about the fact that 6cf57be0f78e appeared in v4.17, but
> > > v4.18 works fine according to both bugzillas.
> >
> > This issue affects AMD GPUs because we switched from using an open
> > coded check for pcie link speeds in the driver to using the common
> > pcie variants in
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.19&id=5d9a6330403271fbb1244f14380a7cc44662796f
>
> OK, thanks.  I added that SHA1 and a note to explain the connection.
>
> > The patch below would regress performance, at least on AMD GPUs, since
> > we'd end up reporting a max speed of gen 2 (5 GT/s) which would cause
> > the driver to limit the speed to gen2 even if gen3 or 4 are available.
>
> I guess this means these parts are broken with respect to the spec,
> since they support gen3 speeds but don't implement LnkCap2?

Sorry, I mis-read the patch.  The patch is correct and our parts are
compliant.  We are only hitting this issue on non-gen3 platforms which
would end up in the second case.  Sorry for the confusion.  You new
patch is:
Acked-by: Alex Deucher <alexander.deucher at amd.com>

Alex

>
> Gen2, i.e., PCIe r2.1, only defined 2.5 GT/s and 5 GT/s.  Devices
> capable of the higher speeds added by PCIe r3.0 are supposed to
> implement LnkCap2, but if we're even getting to this code, it means
> LnkCap2 was zero.
>
> If we confirm that this is a device defect, the question is what the
> best way to work around it is.  Probably the original patch is easier
> than some sort of quirk, but we'd need to expand the comment a little
> bit to explain why we're not following the spec recommendation.
>
> It looks like lspci probably also needs some updating here -- it
> currently doesn't do anything at all with LnkCap2.
>
> > > I also don't have a good feel for whether this is urgent enough to be
> > > a v4.20 fix or whether it can wait for v4.21.  Evidence either way
> > > would help.
> >
> > I'd like it to land for 4.19 and 4.20 at least.  Alternatively, we
> > could revert all of the drm patches to and bring back all the open
> > coded implementations, but it's a fairly large number of patches to
> > revert.
>
> OK, sounds like it makes sense to do this for v4.20 and backport it at
> least to v4.19 stable.  I do want to get the places below fixed also.
> They may not be as urgent, but we might as well try and make
> everything consistent while we're looking at it.
>
> > > We also need similar fixes in pci_set_bus_speed(), pcie_speeds()
> > > (hfi1), cobalt_pcie_status_show(), hba_ioctl_callback(),
> > > qla24xx_pci_info_str(), and maybe a couple other places.
> > >
> > > Bjorn
> > >
> > >
> > > commit 871f73abf4b8e6aee8a206775f944ede7c7d7250
> > > Author: Mikulas Patocka <mpatocka at redhat.com>
> > > Date:   Tue Oct 30 12:36:08 2018 -0400
> > >
> > >     PCI: Fix incorrect value returned from pcie_get_speed_cap()
> > >
> > >     The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
> > >     the register and compare it against them.
> > >
> > >     This patch fixes errors "amdgpu: [powerplay] failed to send message 261 ret
> > >     is 0" errors when PCIe-v3 card is plugged into PCIe-v1 slot, because the
> > >     slot is being incorrectly reported as PCIe-v3 capable.
> > >
> > >     Fixes: 6cf57be0f78e ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
> > >     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
> > >     Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
> > >     Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > >     [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
> > >     PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2]
> > >     Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
> > >     Acked-by: Alex Deucher <alexander.deucher at amd.com>
> > >     Cc: stable at vger.kernel.org      # v4.17+
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index d068f11d08a7..8563d1d9b102 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5556,9 +5556,13 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
> > >         u32 lnkcap2, lnkcap;
> > >
> > >         /*
> > > -        * PCIe r4.0 sec 7.5.3.18 recommends using the Supported Link
> > > -        * Speeds Vector in Link Capabilities 2 when supported, falling
> > > -        * back to Max Link Speed in Link Capabilities otherwise.
> > > +        * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18.  The
> > > +        * implementation note there recommends using the Supported Link
> > > +        * Speeds Vector in Link Capabilities 2 when supported.
> > > +        *
> > > +        * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software
> > > +        * should use the Supported Link Speeds field in Link Capabilities,
> > > +        * where only 2.5 GT/s and 5.0 GT/s speeds were defined.
> > >          */
> > >         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
> > >         if (lnkcap2) { /* PCIe r3.0-compliant */
> > > @@ -5575,13 +5579,9 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
> > >
> > >         pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > >         if (lnkcap) {
> > > -               if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
> > > -                       return PCIE_SPEED_16_0GT;
> > > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
> > > -                       return PCIE_SPEED_8_0GT;
> > > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
> > > +               if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
> > >                         return PCIE_SPEED_5_0GT;
> > > -               else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
> > > +               else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
> > >                         return PCIE_SPEED_2_5GT;
> > >         }
> > >


More information about the amd-gfx mailing list