Debugging PCIe Configuration Space Using mmiotrace
Masami Hiramatsu (Google)
mhiramat at kernel.org
Wed Mar 19 12:13:58 UTC 2025
Hi,
On Tue, 18 Mar 2025 23:53:34 +0530
Naveen Kumar P <naveenkumar.parna at gmail.com> wrote:
> Hi all,
>
> I am currently debugging an issue on an x86 machine running the latest
> Linux kernel, involving a PCIe device whose memory is mapped via BAR0.
> I am encountering unexpected behavior when reading its PCI
> configuration space using lspci, and I am seeking guidance on whether
> mmiotrace can help diagnose the problem.
AFAIK, mmiotrace traces MMIO operations from the CPU side: it records
what data the driver writes to which address, and what data it reads
from which address.
>
> Issue Summary:
> Expected Behavior After Boot:
> lspci -xxx -s 01:00.0 correctly displays valid PCI configuration space
> values, including a properly mapped BAR0.
>
> $ sudo lspci -xxx -s 01:00.0 | grep "10:"
> 10: 00 00 40 b0 00 00 00 00 00 00 00 00 00 00 00 00
>
>
> Unexpected Behavior After Uptime:
> After a few days, reading the PCI configuration space (lspci -xxx -s
> 01:00.0) sometimes returns all 0xffs for the entire config space.
> dmesg does not log any relevant errors.
>
Hmm, the problem below looks like a device-side issue (in particular,
reading 0xffff means the read from the PCI bus failed, IIRC).
> $ sudo lspci -xxx -s 01:00.0
> 01:00.0 RAM memory: PLDA Device 5555 (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
>
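A read that returns all ones is typically what the host reports when a
config read gets no response from the device, so "every byte is ff" is a
quick test for "the device did not answer". A minimal sketch of that
check, assuming the input is an `lspci -xxx` dump in the format quoted
above; the function name `config_all_ff` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if every byte of an `lspci -xxx`
# hex dump read from stdin is ff, and 1 otherwise (or if the
# input contains no hex dump rows at all).
config_all_ff() {
    awk '
        # Hex dump rows look like "10: ff ff ... ff"
        /^[0-9a-f][0-9a-f]: / {
            rows++
            for (i = 2; i <= NF; i++)
                if ($i != "ff") bad = 1
        }
        END { exit ((rows > 0 && !bad) ? 0 : 1) }
    '
}

# Usage (root is needed to read the full config space):
#   sudo lspci -xxx -s 01:00.0 | config_all_ff && echo "device not responding"
```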
> After Subsequent Reads:
> Re-running lspci -xxx -s 01:00.0 restores non-0xff values, but BAR0
> gets reset to zero.
>
> $ sudo lspci -xxx -s 01:00.0
> 01:00.0 RAM memory: PLDA Device 5555
> 00: 56 15 55 55 00 00 10 00 00 00 00 05 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 10 00 02 00 c2 8f 00 00 10 28 01 00 21 f4 03 00
> 70: 00 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
> 90: 20 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> This suggests that some function or driver is resetting BAR0 during or
> after a failed config space read.
>
>
> mmiotrace Setup & Results:
> I have enabled mmiotrace and verified it is active:
> # cat /sys/kernel/tracing/available_tracers
> hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
>
> # cat current_tracer
> mmiotrace
>
> However, trace_pipe and trace logs remain empty even after reproducing
> the issue:
>
> # cat trace_pipe
> VERSION 20070824
> PCIDEV 0000 80860f00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 iosf_mbi_pci
> PCIDEV 0010 80860f31 61 b0000000 0 a0000008 0 e081 0 c0002 400000 0 10000000 0 8 0 20000 i915
> PCIDEV 0098 80860f23 5b e071 e061 e051 e041 e021 b0b17000 0 8 4 8 4 20 800 0 ahci
> PCIDEV 00a0 80860f35 5a b0b00004 0 0 0 0 0 0 10000 0 0 0 0 0 0 xhci_hcd
> PCIDEV 00b8 80860f50 17 b0b16000 b0b15000 0 0 0 0 0 1000 1000 0 0 0 0 0 sdhci-pci
> PCIDEV 00d0 80860f18 62 b0900000 b0800000 0 0 0 0 0 100000 100000 0 0 0 0 0 mei_txe
> PCIDEV 00d8 80860f04 16 b0b10004 0 0 0 0 0 0 4000 0 0 0 0 0 0 snd_hda_intel
> PCIDEV 00e0 80860f48 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00e2 80860f4c 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00e3 80860f4e 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
> PCIDEV 00f8 80860f1c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 lpc_ich
> PCIDEV 00fb 80860f12 12 b0b14000 0 0 0 e001 0 0 20 0 0 0 20 0 0 i801_smbus
> PCIDEV 0100 15565555 b b0400000 0 0 0 0 0 0 400000 0 0 0 0 0 0
> PCIDEV 0300 80861533 13 b0a00000 0 d001 b0a80000 0 0 0 80000 0 20 4000 0 0 0 igb
Note that once you read the `trace_pipe` file, the trace data is consumed
(technically it is not erased, but you cannot access it anymore).
>
> cat trace
> # tracer: mmiotrace
> #
> # entries-in-buffer/entries-written: 0/0 #P:1
> #
> # _-----=> irqs-off/BH-disabled
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / _-=> migrate-disable
> # |||| / delay
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
Thus, after you read `trace_pipe`, the `trace` file is expected to be empty.
If you want to read the data multiple times, always use the `trace` file
instead.
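To keep the data, the usual approach is to stream `trace_pipe` into a
regular file before reproducing the issue and read that file afterwards.
A minimal sketch along the lines of the kernel's mmiotrace documentation;
the function names, the `TRACEFS` variable and the output file name are
placeholders, and it assumes tracefs is mounted at /sys/kernel/tracing.
As far as I know, mmiotrace only records accesses to regions that are
ioremap()ed while the tracer is active, so the driver normally has to be
loaded after the tracer is enabled.

```shell
#!/bin/sh
# Assumption: tracefs is mounted here; override TRACEFS if it is not.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Enable mmiotrace and stream trace_pipe into the file given as $1.
# The PID of the background reader is left in MMIOTRACE_PID.
mmiotrace_start() {
    echo mmiotrace > "$TRACEFS/current_tracer" || return 1
    cat "$TRACEFS/trace_pipe" > "$1" &
    MMIOTRACE_PID=$!
}

# Write a marker into the trace so events can be correlated later.
mmiotrace_mark() {
    echo "$1" > "$TRACEFS/trace_marker"
}

# Switch back to the nop tracer. The trace_pipe reader normally exits
# by itself after this; if it does not, kill $MMIOTRACE_PID by hand.
mmiotrace_stop() {
    echo nop > "$TRACEFS/current_tracer"
}

# Usage (as root):
#   mmiotrace_start /tmp/mmiotrace.txt
#   modprobe <driver under test>      # load the driver after enabling
#   mmiotrace_mark "config space read back as 0xff"
#   mmiotrace_stop
```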
>
>
> Request for Assistance:
> Can mmiotrace help determine the root cause of why reading the PCI
> configuration space results in all 0xffs?
As I said, this looks like a device-side or bus-side issue. mmiotrace may
not help directly, but you can use its output to explain to the hardware
people what the software does.
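When reporting this, it may also help to log the few config fields that
differ between the good and the bad state. A rough sketch that pulls the
vendor/device ID, the command register and BAR0 out of an `lspci -xxx`
dump; `config_summary` is a made-up name, and the byte offsets are those
of the standard type 0 configuration header:

```shell
#!/bin/sh
# Hypothetical helper: summarize an `lspci -xxx` dump read from stdin.
# Config space is little-endian, so bytes are concatenated in reverse.
config_summary() {
    awk '
        # Row 00: vendor ID (0x00), device ID (0x02), command (0x04)
        $1 == "00:" { vendor = $3 $2; device = $5 $4; command = $7 $6 }
        # Row 10: BAR0 occupies offsets 0x10-0x13
        $1 == "10:" { bar0 = $5 $4 $3 $2 }
        END {
            if (vendor == "" || bar0 == "") exit 1
            printf "vendor=%s device=%s command=%s bar0=%s\n",
                   vendor, device, command, bar0
        }
    '
}

# Usage (as root), e.g. from a periodic job with a timestamp:
#   echo "$(date -Is) $(lspci -xxx -s 01:00.0 | config_summary)"
```

For the dump taken after the values came back, this reports BAR0 as
00000000 and the command register as 0000 as well.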
Thank you,
>
> Is there a way to determine what function or driver is clearing BAR0
> when the values are restored?
>
> If mmiotrace is suitable for this, how can I properly capture the
> relevant trace data to analyze this issue?
>
> Any insights or suggestions would be greatly appreciated. Please let
> me know if you need more details.
>
> Best regards,
> Naveen
--
Masami Hiramatsu (Google) <mhiramat at kernel.org>