[Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

Mon Jun 24 15:04:40 UTC 2019

Hi Bjorn,

I managed to trigger the issue from userspace without having to call
into any ACPI code. Wrote a little script[1] which essentially does
(and prints) this (I've added some comments):

PCI SCAN
# set PCIe link speed to 2.5. Bug doesn't trigger with 5.0 or 8.0
# NVA is a small lib we use to poke into MMIO registers of the GPU in
order to easier reverse engineer them.
NVA R 0x8c040 80005800
NVA W 0x8c040 80085800
NVA R 0x8c040 80085800
NVA W 0x8c040 80085801
NVA R 0x8c040 80085800
# put GPU into d3hot via the PCI config
PCI R 0000:01:00.0 0x64 00000008
PCI W 0000:01:00.0 0x64 0000000b
PCI R 0000:01:00.0 0x64 0000000b
# slot ctrl/status
PCI R 0000:00:01.0 0xb8 00480000
# link ctrl/status 2
PCI R 0000:00:01.0 0xd0 001e0043
# bit 7 in 0x248 is called Q0L2 inside the ACPI code
# is set by the ACPI platform method to runtime suspend the GPU
# \_SB.PCI0.PEG0.PG00._OFF on my machine for the parent method
PCI W 0000:00:01.0 0x248 00000080
PCI R 0000:00:01.0 0x248 00000000
# now things are super messed up <====!!
# slot ctrl/status:
# bit 6 got cleared which indicates the bus doesn't detect any device
connected to the slot
# bit stays set if the PCIe link speed was set to 5.0 or 8.0
PCI R 0000:00:01.0 0xb8 00080000
# link ctrl/status 2
# apparently equalization wasn't executed, but maybe that makes sense
if the bridge controller doesn't see any devices
PCI R 0000:00:01.0 0xd0 00000043
# reading from the GPU now fails
PCI R 0000:01:00.0 0x64 ffffffff
GPU PCI link in broken state. GPU might not come back on PCI bus rescan.
PCI REMOVE 0000:01:00.0
PCI SCAN
# the GPU wasn't able to be found after a rescan
setpci: Warning: No devices selected for "0x64.L".
PCI R 0000:01:00.0 0x64

I think Q0L2 should set the Link into L2 state, but I have no idea
what implications that has. Also the write to Q0L2 is guarded by some
conditions inside the ACPI code, but I don't know how much we can
trust vendors to have those guards and what they do at all. At this
point I am inclined to say that this might actually be a hardware or
firmware bug or something.

At this point I guess it makes sense to poke Intel folks, but I
wouldn't know who exactly to ask here. Any thoughts or ideas?

Sadly I don't know the PCIe spec enough (to be honest, I just read a
bit in the spec yesterday) to say what this Q0L2 field is actually
doing. That register also doesn't seem to be documented by Intel afaik
[2].

[1]: script: https://gist.githubusercontent.com/karolherbst/0d0c369a0c14dc092fcb0f5c854dd79c/raw/daab0c3a3d0aa94c270eba44b811271042dd2756/script.sh
[2]: intel KBL platform datasheet:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-gen-core-family-mobile-h-processor-lines-datasheet-vol-2.pdf

On Wed, Jun 19, 2019 at 2:12 PM Karol Herbst <kherbst at redhat.com> wrote:
>
> ohh nvm. It was a mistake on my end. Sorry for the noise
>
> On Wed, Jun 19, 2019 at 2:07 PM Karol Herbst <kherbst at redhat.com> wrote:
> >
> > Hi Bjorn,
> >
> > I was playing around with some older information again (write into the
> > PCI config to put the card into d3 state). And there is something
> > which made me very curious:
> > If I put the card manually into any other state besides D0 via the
> > 0x64 pci config register, the card just dies and pci core seems to
> > expect this to not happen. pci_raw_set_power_state has this
> > "pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call,
> > and reads the value back later, but if the card is already gone, maybe
> > we can't do this for nvidia GPUs?
> >
> > No idea why I didn't played around more with that register, but if the
> > card already dies there then this kind of shows there is indeed an
> > issue on a PCI level, no?
> >
> > On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > >
> > > On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > > > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > > > this issue inside Nouveau and leave it there until we have a better
> > > > understanding or non Nouveau cases of this issue.
> > >
> > > Nope, I have no more ideas.
> > >
> > > > On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst at redhat.com> wrote:
> > > > >
> > > > > doing the same on the bridge controller with my workarounds applied:
> > > > >
> > > > > please note some differences:
> > > > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > > > SltSta: PresDet+ vs PresDet-
> > > > > LnkSta2: Equalization stuff
> > > > > Virtual channel: NegoPending- vs NegoPending+
> > > > >
> > > > > both times I executed lspci while the GPU was still suspended.
> > > > >
> > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > decode])
> > > > >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > >         Latency: 0
> > > > >         Interrupt: pin A routed to IRQ 16
> > > > >         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > >         I/O behind bridge: 0000e000-0000efff [size=4K]
> > > > >         Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > > >         Prefetchable memory behind bridge:
> > > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > > >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > > >         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > > >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > >         Capabilities: [88] Subsystem: Dell Device 07be
> > > > >         Capabilities: [80] Power Management version 3
> > > > >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > >         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > >                 Address: 00000000  Data: 0000
> > > > >         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > >                         ExtTag- RBE+
> > > > >                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > >                         MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > AuxPwr- TransPend-
> > > > >                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > > Exit Latency L0s <256ns, L1 <8us
> > > > >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > >                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > > > >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > > >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > > HotPlug- Surprise-
> > > > >                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > > > CmdCplt- HPIrq- LinkChg-
> > > > >                         Control: AttnInd Unknown, PwrInd Unknown,
> > > > > Power- Interlock-
> > > > >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > > PresDet+ Interlock-
> > > > >                         Changed: MRL- PresDet+ LinkState-
> > > > >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > > PMEIntEna- CRSVisible-
> > > > >                 RootCap: CRSVisible-
> > > > >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > > >                 DevCap2: Completion Timeout: Not Supported,
> > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > > >                 DevCtl2: Completion Timeout: 50us to 50ms,
> > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                          AtomicOpsCtl: ReqEn- EgressBlck-
> > > > >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > >                          Transmit Margin: Normal Operating Range,
> > > > > EnterModifiedCompliance- ComplianceSOS-
> > > > >                          Compliance De-emphasis: -6dB
> > > > >                 LnkSta2: Current De-emphasis Level: -6dB,
> > > > > EqualizationComplete+, EqualizationPhase1+
> > > > >                          EqualizationPhase2+, EqualizationPhase3+,
> > > > > LinkEqualizationRequest-
> > > > >         Capabilities: [100 v1] Virtual Channel
> > > > >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> > > > >                 Ctrl:   ArbSelect=Fixed
> > > > >                 Status: InProgress-
> > > > >                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > > >                         Status: NegoPending- InProgress-
> > > > >         Capabilities: [140 v1] Root Complex Link
> > > > >                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > > >                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > > >                         Addr:   00000000fed19000
> > > > >         Capabilities: [d94 v1] Secondary PCI Express <?>
> > > > >         Kernel driver in use: pcieport
> > > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> > > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
> > > > >
> > > > > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst at redhat.com> wrote:
> > > > > >
> > > > > > was able to get the lspci prints via ssh. Machine rebooted
> > > > > > automatically each time though.
> > > > > >
> > > > > > relevant dmesg:
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> > > > > >
> > > > > > (last one is a 64 bit mmio read to get the on GPU timer value)
> > > > > >
> > > > > > # lspci -vvxxx -s 0:01.00
> > > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > > decode])
> > > > > >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > > >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > >        Latency: 0
> > > > > >        Interrupt: pin A routed to IRQ 16
> > > > > >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > > >        I/O behind bridge: 0000e000-0000efff [size=4K]
> > > > > >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > > > >        Prefetchable memory behind bridge:
> > > > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > > > >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > > > >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > > > >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > > >        Capabilities: [88] Subsystem: Dell Device 07be
> > > > > >        Capabilities: [80] Power Management version 3
> > > > > >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > > >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > > >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > > >                Address: 00000000  Data: 0000
> > > > > >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > > >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > > >                        ExtTag- RBE+
> > > > > >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > > >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > > >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > > AuxPwr- TransPend-
> > > > > >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > > > Exit Latency L0s <256ns, L1 <8us
> > > > > >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > > >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > > >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> > > > > >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > > > >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > > > HotPlug- Surprise-
> > > > > >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > > >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > > > > > HPIrq- LinkChg-
> > > > > >                        Control: AttnInd Unknown, PwrInd Unknown,
> > > > > > Power- Interlock-
> > > > > >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > > > PresDet- Interlock-
> > > > > >                        Changed: MRL- PresDet+ LinkState-
> > > > > >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > > > PMEIntEna- CRSVisible-
> > > > > >                RootCap: CRSVisible-
> > > > > >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > > > >                DevCap2: Completion Timeout: Not Supported,
> > > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > > >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > > > >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > > > > > LTR+, OBFF Via WAKE# ARIFwd-
> > > > > >                         AtomicOpsCtl: ReqEn- EgressBlck-
> > > > > >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > > >                         Transmit Margin: Normal Operating Range,
> > > > > > EnterModifiedCompliance- ComplianceSOS-
> > > > > >                         Compliance De-emphasis: -6dB
> > > > > >                LnkSta2: Current De-emphasis Level: -6dB,
> > > > > > EqualizationComplete-, EqualizationPhase1-
> > > > > >                         EqualizationPhase2-, EqualizationPhase3-,
> > > > > > LinkEqualizationRequest-
> > > > > >        Capabilities: [100 v1] Virtual Channel
> > > > > >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > > >                Arb:    Fixed- WRR32- WRR64- WRR128-
> > > > > >                Ctrl:   ArbSelect=Fixed
> > > > > >                Status: InProgress-
> > > > > >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > > >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > > >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > > > >                        Status: NegoPending+ InProgress-
> > > > > >        Capabilities: [140 v1] Root Complex Link
> > > > > >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > > > >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > > > >                        Addr:   00000000fed19000
> > > > > >        Capabilities: [d94 v1] Secondary PCI Express <?>
> > > > > >        Kernel driver in use: pcieport
> > > > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > > > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> > > > > >
> > > > > > lspci -vvxxx -s 1:00.00
> > > > > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > > > > > Mobile] (rev ff) (prog-if ff)
> > > > > >        !!! Unknown header type 7f
> > > > > >        Kernel driver in use: nouveau
> > > > > >        Kernel modules: nouveau
> > > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > >
> > > > > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst at redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
> > > > > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > Apperantly things go south if we suspend the device with a different PCIE
> > > > > > > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > > > > > > > > that outside of drivers?
> > > > > > > > > > > >
> > > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > > > > > feasible.
> > > > > > > > > > > >
> > > > > > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > > > > > combination doesn't work.
> > > > > > > > > > > >
> > > > > > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > > > > > something?
> > > > > > > > > > > >
> > > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > > > > > >
> > > > > > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > > > > > >   or operating system software is involved.
> > > > > > > > > > > >
> > > > > > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > > > > > speed.
> > > > > > > > > > >
> > > > > > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > > > > > operate on and the operating system kind of just checks what the
> > > > > > > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > > > > > > something?
> > > > > > > > > >
> > > > > > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > > > > > the OS.
> > > > > > > > > >
> > > > > > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > > > > > very little control over that state.  It can't directly restore the
> > > > > > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > > > > > result in reliable operation.
> > > > > > > > > >
> > > > > > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > > > > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > > > > > have some control over the link speed as it's tight to performance
> > > > > > > > > > > states on GPU.
> > > > > > > > > >
> > > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > > > > > that, it would be via some device-specific mechanism.
> > > > > > > > > >
> > > > > > > > > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > > > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > > > > > > > > > the communication with the controller is broken. But if we set it to
> > > > > > > > > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > > > > > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > > > > > > > tries to force to operate on an invalid link speed and because the
> > > > > > > > > > > bridge controller is actually powered down as well (as all children
> > > > > > > > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > > > > > > > actually restores the state which lets the controller fail to
> > > > > > > > > > > establish communication again?
> > > > > > > > > >
> > > > > > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > > > > > >      without OS/driver intervention.
> > > > > > > > > >
> > > > > > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > > > > > >      requires driver intervention or at least some ACPI method.
> > > > > > > > >
> > > > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > > > > > all done on the GPU. We just upload a script and some firmware on to
> > > > > > > > > the GPU. The script runs then on the PMU inside the GPU and this
> > > > > > > > > script also causes changing the PCIe link settings. But from a Nouveau
> > > > > > > > > point of view we don't care about the link before or after that script
> > > > > > > > > was invoked. Also there is no ACPI method involved.
> > > > > > > > >
> > > > > > > > > But if there is something we should notify pci core about, maybe
> > > > > > > > > that's something we have to do then?
> > > > > > > >
> > > > > > > > I don't think there's anything the PCI core could do with that
> > > > > > > > information anyway.  The PCI core doesn't care at all about the link
> > > > > > > > speed, and it really can't influence it directly.
> > > > > > > >
> > > > > > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > > > > > >
> > > > > > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > > > > > >      initial boot.
> > > > > > > > >
> > > > > > > > > No, that negotiation fails apparently as any attempt to read anything
> > > > > > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > > > > > when resuming the bridge controller.
> > > > > > > >
> > > > > > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > > > > > the device.  But if you can avoid the issue by running another script
> > > > > > > > on the PMU before suspend, that's probably what you'll have to do.
> > > > > > > >
> > > > > > >
> > > > > > > there is none as far as we know. Or at least nothing inside the vbios.
> > > > > > > We still have to get signed PMU firmware images from Nvidia for full
> > > > > > > support, but this still would be a hacky issue as we would depend on
> > > > > > > those then (and without having those in  redistributable form, there
> > > > > > > isn't much we can do about except fixing it on the kernel side).
> > > > > > >
> > > > > > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > > > > > >      8.0 GT/s.
> > > > > > > > >
> > > > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > > > > > >
> > > > > > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > > > > > link speed.
> > > > > > > >
> > > > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > > > > > the link is supposed to work and the device should be reachable
> > > > > > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > > > > > dependency between the GPU and the driver here.
> > > > > > > > >
> > > > > > > > > but the device isn't reachable at all, not even from the pci
> > > > > > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > > > > > >
> > > > > > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > > > > > whether a PCI error occurred.
> > > > > > > >
> > > > > > >
> > > > > > > that's kind of problematic as it might just lock up my machine... but
> > > > > > > let me try that.
> > > > > > >
> > > > > > > > > > It sounds like things work if you return to 8.0 GT/s before suspend,
> > > > > > > > > > things work.  That would make sense to me because then the driver's
> > > > > > > > > > idea of the link state after resume would match the actual state.
> > > > > > > > >
> > > > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > > > > > care one bit about the current link speed, so I assume you mean
> > > > > > > > > something inside the pci core code?
> > > > > > > > >
> > > > > > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > > > > > does save and restore most of the architected config space around
> > > > > > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > > > > > the PCI core would have no idea how to save/restore it.
> > > > > > > > >
> > > > > > > > > if we assume that the negotiation on a device level works as intended,
> > > > > > > > > then I would expect this to be a pci core issue, which might actually
> > > > > > > > > be not fixable there. But if it's not, then we would have to put
> > > > > > > > > something like that inside the runpm documentation to tell drivers
> > > > > > > > > they have to do something about it.
> > > > > > > > >lspci -vvxxxx
> > > > > > > > > But again, for me it just sounds like the negotiation on the device
> > > > > > > > > level fails or something inside pci core messes it up.
> > > > > > > >
> > > > > > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > > > > > core has no way to know what that was or how to fix it.
> > > > > > > >
> > > > > > >
> > > > > > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > > > > > the GPU and the host bridge controller, because no matter what the
> > > > > > > state was before, we have to reprobe instead of relying on a known
> > > > > > > state, no?
> > > > > > >
> > > > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > > > > > > > > > > > > Reviewed-by: Lyude Paul <lyude at redhat.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > > > > > >       } agp;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       struct {
> > > > > > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > > > > > -             u8 width;
> > > > > > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > > > > > +             u8 cur_width;
> > > > > > > > > > > > >       } pcie;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       bool msi;
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > > > > > >       pci->func = func;
> > > > > > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > > > > > >       pci->irq = -1;
> > > > > > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +int
> > > > > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  int
> > > > > > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +int
> > > > > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  int
> > > > > > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > > >               speed = max_speed;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > > > > > >
> > > > > > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > > > > > >  #endif
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.21.0
> > > > > > > > > > > > >