[error] Drm -> amdgpu Unrecoverable Machine Check

Christian König christian.koenig at amd.com
Mon Dec 2 13:41:00 UTC 2019


Hi Yusuf,

> I am using an NXP T1042D4RDB-64B, which has a 256 MB PCIe buffer 
> according to its. The PCIe memory range was set to 256 MB in the .dts 
> file and in the U-Boot configuration file. The driver failed with 
> exit code -12 (OUT_OF_MEMORY), but I was able to reach the Linux console.

That is an expected result. 256MB is not enough for the VRAM BAR and the 
doorbell BAR to fit into. But you can still use VGA emulation that way 
if I'm not completely mistaken.
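
As a rough sanity check of that arithmetic, here is a minimal Python 
sketch. It assumes the BAR sizes reported in the lspci and dmesg output 
quoted further down, that BAR 0 is the VRAM aperture and BAR 2 the 
doorbell BAR, and that every BAR is naturally aligned to its own size. 
The helper name `fits` is made up for illustration, and the root port's 
own 16 MB BAR 0 is left out.

```python
MIB = 1 << 20
WINDOW_SIZE = 256 * MIB  # bridge window e0000000-efffffff from lspci

# Sizes as reported in the quoted lspci/dmesg output; the role labels
# (VRAM, doorbell, ...) are assumptions, not taken from the logs.
bars = {
    "0001:01:00.0 BAR 0 (VRAM)": 256 * MIB,
    "0001:01:00.0 BAR 2 (doorbell)": 0x00200000,
    "0001:01:00.0 BAR 5 (MMIO registers)": 0x00040000,
    "0001:01:00.0 BAR 6 (expansion ROM)": 0x00020000,
    "0001:01:00.1 BAR 0 (audio)": 0x00004000,
}


def fits(window_size, sizes):
    """Pack naturally aligned BARs largest-first.

    Returns (ok, bytes_needed), where bytes_needed is the end offset of
    the last BAR relative to the start of the window.
    """
    offset = 0
    for size in sorted(sizes, reverse=True):
        offset = (offset + size - 1) // size * size  # align up to BAR size
        offset += size
    return offset <= window_size, offset


ok, needed = fits(WINDOW_SIZE, bars.values())
print(f"fits in 256 MiB window: {ok}, needs {needed / MIB:.2f} MiB")
```

Under those assumptions the endpoint BARs alone need a little over 
258 MB, so they cannot all be placed in a 256 MB window.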

> Then I changed 256 MB to 4GB in .dtsi and U-boot conf file.

How did you do this? Is your memory layout consistent?

If you changed only one end address, you may need to adjust the other 
addresses as well.
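
A minimal sketch of such a consistency check, in Python. The four 
windows are the CPU-side MEM ranges from the "PCI host bridge ... 
ranges" lines in the dmesg quoted below. The "grown" layout is purely 
hypothetical (the actual .dtsi change is not shown in this thread): it 
enlarges only the second window to 4 GB and leaves every base address 
where it was. The function name `overlaps` is made up for illustration.

```python
MIB = 1 << 20
GIB = 1 << 30

# (node, CPU base address, size) from the quoted dmesg output.
windows = [
    ("pcie@ffe240000", 0xC00000000, 256 * MIB),
    ("pcie@ffe250000", 0xC10000000, 256 * MIB),
    ("pcie@ffe260000", 0xC20000000, 256 * MIB),
    ("pcie@ffe270000", 0xC30000000, 256 * MIB),
]


def overlaps(wins):
    """Return pairs of node names whose CPU address ranges intersect."""
    found = []
    ordered = sorted(wins, key=lambda w: w[1])
    for i, (name_a, base_a, size_a) in enumerate(ordered):
        for name_b, base_b, _size_b in ordered[i + 1:]:
            if base_b < base_a + size_a:
                found.append((name_a, name_b))
    return found


# Hypothetical edit: only the second window grows, bases stay unchanged.
grown = [
    (name, base, 4 * GIB if name == "pcie@ffe250000" else size)
    for name, base, size in windows
]

print("original layout overlaps:", overlaps(windows))
print("grown layout overlaps:   ", overlaps(grown))
```

In this hypothetical case the enlarged window runs into the two windows 
that follow it, which is the kind of inconsistency to look for.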

Regards,
Christian.

On 02.12.19 at 14:32, Yusuf Altıparmak wrote:
>
>
>>     I attached my dts file.
>>
>>     System is working fine when GPU is not plugged in.
>>
>>     This is the last console log before the freeze:
>>     [drm] amdgpu kernel modesetting enabled.
>>     [drm] initializing kernel modesetting (POLARIS12 0x1002:0x6987
>>     0x1787:0x2389 0x80).
>>     [drm] register mmio base: 0x20200000
>>     fsl-fman-port ffe488000.port fm1-gb0: renamed from eth0
>>     [drm] register mmio size: 262144
>>     [drm] add ip block number 0 <vi_common>
>>     [drm] add ip block number 1 <gmc_v8_0>
>>     [drm] add ip block number 2 <tonga_ih>
>>     [drm] add ip block number 3 <powerplay>
>>     [drm] add ip block number 4 <dm>
>>     [drm] add ip block number 5 <gfx_v8_0>
>>     [drm] add ip block number 6 <sdma_v3_0>
>>     [drm] add ip block number 7 <uvd_v6_0>
>>     [drm] add ip block number 8 <vce_v3_0>
>>     [drm] UVD is enabled in VM mode
>>     [drm] UVD ENC is enabled in VM mode
>>     [drm] VCE enabled in VM mode
>>     ATOM BIOS: 113-ER16BFC-001
>>     [drm] GPU posting now...
>>     Disabling lock debugging due to kernel taint
>>     Machine check in kernel mode.
>>     Caused by (from MCSR=a000): Load Error Report
>>     Guarded Load Error Report
>>     Kernel panic - not syncing: Unrecoverable Machine check
>>     CPU: 1 PID: 2023 Comm: udevd Tainted: G   M    4.19.26+gc0c2141 #1
>>     Call Trace:
>
>
> On Mon, 2 Dec 2019 at 15:28, Christian König 
> <ckoenig.leichtzumerken at gmail.com> wrote:
>
>     Hi Yusuf,
>
>     On 02.12.19 at 12:41, Yusuf Altıparmak wrote:
>>     My embedded board is freezing when I put E9171 on PCIe. What is
>>     the meaning of Unrecoverable Machine Check error about GPU?
>
>     Well see the explanation on Wikipedia for example:
>     https://en.wikipedia.org/wiki/Machine-check_exception
>
>     In general it means you have messed up something in your hardware
>     configuration.
>
>>     Could PCIe settings in .dts file cause this problem?
>
>     Possible, but rather unlikely. My best guess is that it is some
>     problem with the power supply.
>
>>     If it is, is there any sample PCIe configuration for E9171?
>
>     The E9171 is just a PCIe device, so the dtsi is actually rather
>     uninteresting. What we really need is a full dmesg and maybe lspci
>     output would help as well.
>
>     Regards,
>     Christian.
>
>
>
> Hi Christian,
>
> I am using an NXP T1042D4RDB-64B, which has a 256 MB PCIe buffer 
> according to its. The PCIe memory range was set to 256 MB in the .dts 
> file and in the U-Boot configuration file. The driver failed with 
> exit code -12 (OUT_OF_MEMORY), but I was able to reach the Linux console.
>
> [    5.512922] [drm] amdgpu kernel modesetting enabled.
> [    5.517065] [drm] initializing kernel modesetting (POLARIS12 
> 0x1002:0x6987 0x1787:0x2389 0x80).
> [    5.524507] amdgpu 0001:01:00.0: Fatal error during GPU init
> [    5.529296] amdgpu: probe of 0001:01:00.0 failed with error -12
>
> Then I changed 256 MB to 4GB in the .dtsi and U-Boot conf file. I also 
> changed the 64KB I/O size to 1MB. When I did this, I wasn't able to 
> reach the Linux console because the board froze, but the driver 
> initialized successfully this time. The console log from that 
> successful driver initialization is the one I posted above.
>
> This is the lspci -v output with the GPU plugged in and the memory range set to 256 MB:
>
> root@t1042d4rdb-64b:~# lspci -v
> 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>         Device tree node: /sys/firmware/devicetree/base/pcie@ffe240000/pcie@0
>         Flags: bus master, fast devsel, latency 0, IRQ 20
>         Memory at <ignored> (32-bit, non-prefetchable)
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00000000-0000ffff [size=64K]
>         Memory behind bridge: e0000000-efffffff [size=256M]
>         Prefetchable memory behind bridge: None
>         Capabilities: [44] Power Management version 3
>         Capabilities: [4c] Express Root Port (Slot-), MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> 0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>         Device tree node: /sys/firmware/devicetree/base/pcie@ffe250000/pcie@0
>         Flags: bus master, fast devsel, latency 0, IRQ 21
>         Memory at <ignored> (32-bit, non-prefetchable)
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00000000-0000ffff [size=64K]
>         Memory behind bridge: e0000000-efffffff [size=256M]
>         Prefetchable memory behind bridge: None
>         Capabilities: [44] Power Management version 3
>         Capabilities: [4c] Express Root Port (Slot-), MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> 0001:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
> [AMD/ATI] Lexa [Radeon E9171 MCM] (rev 80) (prog-if 00 [VGA controller])
>         Subsystem: Hightech Information System Ltd. Device 2389
>         Flags: fast devsel, IRQ 41
>         Memory at c10000000 (64-bit, prefetchable) [size=256M]
>         Memory at <ignored> (64-bit, prefetchable)
>         I/O ports at 1100 [size=256]
>         Memory at <ignored> (32-bit, non-prefetchable)
>         Expansion ROM at <ignored> [disabled]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [58] Express Legacy Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 
> Len=010 <?>
>         Capabilities: [150] Advanced Error Reporting
>         Capabilities: [200] Resizable BAR <?>
>         Capabilities: [270] Secondary PCI Express <?>
>         Capabilities: [2b0] Address Translation Service (ATS)
>         Capabilities: [2c0] Page Request Interface (PRI)
>         Capabilities: [2d0] Process Address Space ID (PASID)
>         Capabilities: [320] Latency Tolerance Reporting
>         Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
>         Capabilities: [370] L1 PM Substates
>         Kernel modules: amdgpu
>
> 0001:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] 
> Device aae0
>         Subsystem: Hightech Information System Ltd. Device aae0
>         Flags: bus master, fast devsel, latency 0, IRQ 17
>         Memory at <ignored> (64-bit, non-prefetchable)
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [58] Express Legacy Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 
> Len=010 <?>
>         Capabilities: [150] Advanced Error Reporting
>         Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
>
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>         Device tree node: /sys/firmware/devicetree/base/pcie@ffe260000/pcie@0
>         Flags: bus master, fast devsel, latency 0, IRQ 22
>         Memory at <ignored> (32-bit, non-prefetchable)
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00000000-0000ffff [size=64K]
>         Memory behind bridge: e0000000-efffffff [size=256M]
>         Prefetchable memory behind bridge: None
>         Capabilities: [44] Power Management version 3
>         Capabilities: [4c] Express Root Port (Slot-), MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> 0003:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>         Device tree node: /sys/firmware/devicetree/base/pcie@ffe270000/pcie@0
>         Flags: bus master, fast devsel, latency 0, IRQ 23
>         Memory at <ignored> (32-bit, non-prefetchable)
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00000000-0000ffff [size=64K]
>         Memory behind bridge: e0000000-efffffff [size=256M]
>         Prefetchable memory behind bridge: None
>         Capabilities: [44] Power Management version 3
>         Capabilities: [4c] Express Root Port (Slot-), MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> This is the PCIe dmesg output when the memory range is 256MB. It gives 
> the same messages when the memory range is set to 4GB:
>
> PCI host bridge /pcie@ffe240000  ranges:
>  MEM 0x0000000c00000000..0x0000000c0fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8000000..0x0000000ff800ffff -> 0x0000000000000000
> /pcie@ffe240000: PCICSRBAR @ 0xff000000
> setup_pci_atmu: end of DRAM 200000000
> /pcie@ffe240000: Setup 64-bit PCI DMA window
> /pcie@ffe240000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
> /pcie@ffe240000: DMA window size is 0xe0000000
> Found FSL PCI host bridge at 0x0000000ffe250000. Firmware bus number: 0->1
> PCI host bridge /pcie@ffe250000  ranges:
>  MEM 0x0000000c10000000..0x0000000c1fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8010000..0x0000000ff801ffff -> 0x0000000000000000
> /pcie@ffe250000: PCICSRBAR @ 0xff000000
> setup_pci_atmu: end of DRAM 200000000
> /pcie@ffe250000: Setup 64-bit PCI DMA window
> /pcie@ffe250000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
> /pcie@ffe250000: DMA window size is 0xe0000000
> Found FSL PCI host bridge at 0x0000000ffe260000. Firmware bus number: 0->0
> PCI host bridge /pcie@ffe260000  ranges:
>  MEM 0x0000000c20000000..0x0000000c2fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8020000..0x0000000ff802ffff -> 0x0000000000000000
> /pcie@ffe260000: PCICSRBAR @ 0xff000000
> setup_pci_atmu: end of DRAM 200000000
> /pcie@ffe260000: Setup 64-bit PCI DMA window
> /pcie@ffe260000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
> /pcie@ffe260000: DMA window size is 0xe0000000
> Found FSL PCI host bridge at 0x0000000ffe270000. Firmware bus number: 0->0
> PCI host bridge /pcie@ffe270000  ranges:
>  MEM 0x0000000c30000000..0x0000000c3fffffff -> 0x00000000e0000000
>   IO 0x0000000ff8030000..0x0000000ff803ffff -> 0x0000000000000000
> /pcie@ffe270000: PCICSRBAR @ 0xff000000
> setup_pci_atmu: end of DRAM 200000000
> /pcie@ffe270000: Setup 64-bit PCI DMA window
> /pcie@ffe270000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
> /pcie@ffe270000: DMA window size is 0xe0000000
> iommu: Adding device ff6000000.qman-portal to group 0
> iommu: Adding device ff6004000.qman-portal to group 1
> iommu: Adding device ff6008000.qman-portal to group 2
> iommu: Adding device ff600c000.qman-portal to group 3
> iommu: Adding device ff6010000.qman-portal to group 4
> iommu: Adding device ff6014000.qman-portal to group 5
> iommu: Adding device ff6018000.qman-portal to group 6
> iommu: Adding device ff601c000.qman-portal to group 7
> iommu: Adding device ff6020000.qman-portal to group 8
> iommu: Adding device ff6024000.qman-portal to group 9
> iommu: Adding device ffe100300.dma to group 10
> iommu: Adding device ffe101300.dma to group 11
> iommu: Adding device ffe114000.sdhc to group 12
> iommu: Adding device ffe210000.usb to group 13
> iommu: Adding device ffe211000.usb to group 14
> iommu: Adding device ffe220000.sata to group 15
> iommu: Adding device ffe221000.sata to group 16
> iommu: Adding device ffe318000.qman to group 17
> iommu: Adding device ffe31a000.bman to group 18
> iommu: Adding device ffe240000.pcie to group 19
> iommu: Adding device ffe250000.pcie to group 20
> iommu: Adding device ffe260000.pcie to group 21
> iommu: Adding device ffe270000.pcie to group 22
> iommu: Adding device ffe140000.qe to group 23
> software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
> PCI: Probing PCI hardware
> fsl-pci ffe240000.pcie: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io 
>  0x8000080000010000-0x800008000001ffff] (bus address [0x0000-0xffff])
> pci_bus 0000:00: root bus resource [mem 0xc00000000-0xc0fffffff] (bus 
> address [0xe0000000-0xefffffff])
> pci_bus 0000:00: root bus resource [bus 00]
> iommu: Removing device ffe240000.pcie from group 19
> iommu: Adding device 0000:00:00.0 to group 24
> pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), 
> reconfiguring
> pci 0000:00:00.0: PCI bridge to [bus 01-ff]
> fsl-pci ffe250000.pcie: PCI host bridge to bus 0001:00
> pci_bus 0001:00: root bus resource [io 
>  0x8000080000021000-0x8000080000030fff] (bus address [0x0000-0xffff])
> pci_bus 0001:00: root bus resource [mem 0xc10000000-0xc1fffffff] (bus 
> address [0xe0000000-0xefffffff])
> pci_bus 0001:00: root bus resource [bus 00-01]
> iommu: Removing device ffe250000.pcie from group 20
> iommu: Adding device 0001:00:00.0 to group 19
> pci 0001:01:00.0: enabling Extended Tags
> pci 0001:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 
> GT/s x1 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
> iommu: Adding device 0001:01:00.0 to group 19
> pci 0001:01:00.1: enabling Extended Tags
> iommu: Adding device 0001:01:00.1 to group 19
> pci 0001:00:00.0: PCI bridge to [bus 01-ff]
> fsl-pci ffe260000.pcie: PCI host bridge to bus 0002:00
> pci_bus 0002:00: root bus resource [io 
>  0x8000080000032000-0x8000080000041fff] (bus address [0x0000-0xffff])
> pci_bus 0002:00: root bus resource [mem 0xc20000000-0xc2fffffff] (bus 
> address [0xe0000000-0xefffffff])
> pci_bus 0002:00: root bus resource [bus 00]
> iommu: Removing device ffe260000.pcie from group 21
> iommu: Adding device 0002:00:00.0 to group 20
> pci 0002:00:00.0: bridge configuration invalid ([bus 00-00]), 
> reconfiguring
> pci 0002:00:00.0: PCI bridge to [bus 01-ff]
> fsl-pci ffe270000.pcie: PCI host bridge to bus 0003:00
> pci_bus 0003:00: root bus resource [io 
>  0x8000080000043000-0x8000080000052fff] (bus address [0x0000-0xffff])
> pci_bus 0003:00: root bus resource [mem 0xc30000000-0xc3fffffff] (bus 
> address [0xe0000000-0xefffffff])
> pci_bus 0003:00: root bus resource [bus 00]
> iommu: Removing device ffe270000.pcie from group 22
> iommu: Adding device 0003:00:00.0 to group 21
> pci 0003:00:00.0: bridge configuration invalid ([bus 00-00]), 
> reconfiguring
> pci 0003:00:00.0: PCI bridge to [bus 01-ff]
> PCI: Cannot allocate resource region 0 of device 0000:00:00.0, will remap
> PCI: Cannot allocate resource region 0 of device 0001:00:00.0, will remap
> PCI: Cannot allocate resource region 2 of device 0001:01:00.0, will remap
> PCI: Cannot allocate resource region 5 of device 0001:01:00.0, will remap
> PCI: Cannot allocate resource region 6 of device 0001:01:00.0, will remap
> PCI: Cannot allocate resource region 0 of device 0001:01:00.1, will remap
> PCI: Cannot allocate resource region 0 of device 0002:00:00.0, will remap
> PCI: Cannot allocate resource region 0 of device 0003:00:00.0, will remap
> pci 0000:00:00.0: BAR 0: no space for [mem size 0x01000000]
> pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
> pci 0000:00:00.0: PCI bridge to [bus 01]
> pci 0000:00:00.0:   bridge window [io 
>  0x8000080000010000-0x800008000001ffff]
> pci 0000:00:00.0:   bridge window [mem 0xc00000000-0xc0fffffff]
> pci_bus 0000:00: Some PCI device resources are unassigned, try booting 
> with pci=realloc
> pci 0001:00:00.0: BAR 0: no space for [mem size 0x01000000]
> pci 0001:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
> pci 0001:00:00.0: BAR 9: no space for [mem size 0x00200000 64bit pref]
> pci 0001:00:00.0: BAR 9: failed to assign [mem size 0x00200000 64bit pref]
> pci 0001:01:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
> pci 0001:01:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
> pci 0001:01:00.0: BAR 5: no space for [mem size 0x00040000]
> pci 0001:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
> pci 0001:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
> pci 0001:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
> pci 0001:01:00.1: BAR 0: no space for [mem size 0x00004000 64bit]
> pci 0001:01:00.1: BAR 0: failed to assign [mem size 0x00004000 64bit]
> pci 0001:00:00.0: PCI bridge to [bus 01]
> pci 0001:00:00.0:   bridge window [io 
>  0x8000080000021000-0x8000080000030fff]
> pci 0001:00:00.0:   bridge window [mem 0xc10000000-0xc1fffffff]
> pci_bus 0001:00: Some PCI device resources are unassigned, try booting 
> with pci=realloc
> pci 0002:00:00.0: BAR 0: no space for [mem size 0x01000000]
> pci 0002:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
> pci 0002:00:00.0: PCI bridge to [bus 01]
> pci 0002:00:00.0:   bridge window [io 
>  0x8000080000032000-0x8000080000041fff]
> pci 0002:00:00.0:   bridge window [mem 0xc20000000-0xc2fffffff]
> pci_bus 0002:00: Some PCI device resources are unassigned, try booting 
> with pci=realloc
> pci 0003:00:00.0: BAR 0: no space for [mem size 0x01000000]
> pci 0003:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
> pci 0003:00:00.0: PCI bridge to [bus 01]
> pci 0003:00:00.0:   bridge window [io 
>  0x8000080000043000-0x8000080000052fff]
> pci 0003:00:00.0:   bridge window [mem 0xc30000000-0xc3fffffff]
> pci_bus 0003:00: Some PCI device resources are unassigned, try booting 
> with pci=realloc
>
>
