[error] Drm -> amdgpu Unrecoverable Machine Check
Yusuf Altıparmak
yusufalti1997 at gmail.com
Mon Dec 2 16:31:16 UTC 2019
>
> Most likely not. There is support for resizing the VRAM BAR, but usually
> you can only make it larger and not smaller.
> Please give me the output of "sudo setpci -s 0001:01:00.0 ECAP15+4.l
> ECAP15+8.l" if you want to double check that.
>
Okay, I'll try it tomorrow. What exactly does the "sudo setpci -s 0001:01:00.0
ECAP15+4.l ECAP15+8.l" command do?
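
For reference, setpci with no "=value" part only reads registers. ECAP15 selects the PCIe extended capability with ID 0x15 (Resizable BAR, listed at [200] in the lspci output quoted further down), +4 and +8 are byte offsets into it, and .l asks for a 32-bit read. The sketch below shows one way the two returned words could be decoded. The bit layout follows my reading of the PCIe Resizable BAR capability (sizes in bits 4 to 23 of the capability register, current size in bits 8 to 12 of the control register); the helper name decode_rebar and the sample values are hypothetical and are not output from this board.

```python
def decode_rebar(cap, ctrl):
    """Decode the Resizable BAR capability (+4) and control (+8) registers.

    Assumed layout: capability bit n (4 <= n <= 23) set means a BAR size
    of 2**(n - 4) MB is supported; control bits 2:0 hold the BAR index,
    bits 7:5 the number of resizable BARs, bits 12:8 the current size
    encoded as log2(size in MB).
    """
    supported_mb = [1 << (bit - 4) for bit in range(4, 24) if cap & (1 << bit)]
    return {
        "bar_index": ctrl & 0x7,
        "num_resizable_bars": (ctrl >> 5) & 0x7,
        "current_size_mb": 1 << ((ctrl >> 8) & 0x1F),
        "supported_sizes_mb": supported_mb,
    }


# Hypothetical register values, for illustration only.
sample_cap = 0x00001F00
sample_ctrl = 0x00000820

info = decode_rebar(sample_cap, sample_ctrl)
for key, value in info.items():
    print(key, value)
```

If the smallest supported size reported by the capability register is already 256 MB, the BAR cannot be shrunk below the window that is available today.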
> Well you rather need to ask if anybody has sample PCIe configuration for
> GPUs in general. That problem is not really E9171 related. You might want
> to ask NXP for that maybe.
> Sorry, no idea if that is correct or not. You need to ask NXP for help
> with that.
>
>
Okay, no problem. At least I now know what is missing. The problem is
probably in the .dtsi and U-Boot config files: the memory ranges are
overlapping, as you said. I'll ask NXP for a sample PCIe configuration
for GPUs.
Thank you for your help, Christian.
Regards.
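
As a rough check of why the 256 MB setup runs out of space, the sizes reported in the lspci and dmesg output quoted below can simply be added up. This is only a sketch: the numbers are copied from those logs for host bridge 0001, the helper name fits is hypothetical, and the check ignores BAR alignment, so it gives a lower bound on the space needed rather than a guarantee.

```python
MB = 1 << 20

# CPU-side memory window of host bridge 0001, taken from
# "root bus resource [mem 0xc10000000-0xc1fffffff]".
window_start, window_end = 0xC10000000, 0xC1FFFFFFF
window_size = window_end - window_start + 1

# Memory BAR sizes below that bridge, copied from the quoted logs.
requests = {
    "GPU BAR 0": 256 * MB,       # "Memory at c10000000 ... [size=256M]"
    "bridge BAR 0": 0x01000000,  # "no space for [mem size 0x01000000]"
    "GPU BAR 2": 0x00200000,
    "GPU BAR 5": 0x00040000,     # matches "register mmio size: 262144"
    "GPU BAR 6": 0x00020000,
    "audio BAR 0": 0x00004000,
}


def fits(window, sizes):
    """Lower-bound check: alignment is ignored, so True is not a guarantee."""
    return sum(sizes) <= window


total = sum(requests.values())
print("window: %d MB" % (window_size // MB))
print("requested: %.2f MB" % (total / MB))
print("fits:", fits(window_size, requests.values()))
```

GPU BAR 0 alone is as large as the whole window, which is consistent with every other BAR under that bridge failing with "no space" in the log.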
>
> On 02.12.19 at 14:32, Yusuf Altıparmak wrote:
>>
>>
>>> I attached my dts file.
>>>
>>> System is working fine when GPU is not plugged in.
>>>
>>> *This is the last console log before freeze:*
>>> [drm] amdgpu kernel modesetting enabled.
>>> [drm] initializing kernel modesetting (POLARIS12 0x1002:0x6987 0x1787:0x2389 0x80).
>>> [drm] register mmio base: 0x20200000
>>> fsl-fman-port ffe488000.port fm1-gb0: renamed from eth0
>>> [drm] register mmio size: 262144
>>> [drm] add ip block number 0 <vi_common>
>>> [drm] add ip block number 1 <gmc_v8_0>
>>> [drm] add ip block number 2 <tonga_ih>
>>> [drm] add ip block number 3 <powerplay>
>>> [drm] add ip block number 4 <dm>
>>> [drm] add ip block number 5 <gfx_v8_0>
>>> [drm] add ip block number 6 <sdma_v3_0>
>>> [drm] add ip block number 7 <uvd_v6_0>
>>> [drm] add ip block number 8 <vce_v3_0>
>>> [drm] UVD is enabled in VM mode
>>> [drm] UVD ENC is enabled in VM mode
>>> [drm] VCE enabled in VM mode
>>> ATOM BIOS: 113-ER16BFC-001
>>> [drm] GPU posting now...
>>> Disabling lock debugging due to kernel taint
>>> Machine check in kernel mode.
>>> Caused by (from MCSR=a000): Load Error Report
>>> Guarded Load Error Report
>>> Kernel panic - not syncing: Unrecoverable Machine check
>>> CPU: 1 PID: 2023 Comm: udevd Tainted: G M 4.19.26+gc0c2141 #1
>>> Call Trace:
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>>
>>
>> On Mon, 2 Dec 2019 at 15:28, Christian König
>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>
>>> Hi Yusuf,
>>>
>>> On 02.12.19 at 12:41, Yusuf Altıparmak wrote:
>>>
>>> My embedded board is freezing when I put E9171 on PCIe. What is the
>>> meaning of Unrecoverable Machine Check error about GPU?
>>>
>>>
>>> Well see the explanation on Wikipedia for example:
>>> https://en.wikipedia.org/wiki/Machine-check_exception
>>>
>>> In general it means you have messed up something in your hardware
>>> configuration.
>>>
>>> Could PCIe settings in .dts file cause this problem?
>>>
>>>
>>> Possible, but rather unlikely. My best guess is that it is some problem
>>> with the power supply.
>>>
>>> If it is, is there any sample PCIe configuration for E9171?
>>>
>>>
>>> The E9171 is just a PCIe device, so the dtsi is actually rather
>>> uninteresting. What we really need is a full dmesg and maybe lspci output
>>> would help as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>
>>
>> Hi Christian,
>>
>> First of all, I am using an NXP T1042D4RDB-64B, which has a 256 MB PCIe buffer
>> according to its. The PCIe memory range was set to 256 MB in the .dts file and
>> in the U-Boot configuration file. The driver failed with error code -12
>> (ENOMEM, out of memory), but I was able to reach the Linux console.
>>
>> [ 5.512922] [drm] amdgpu kernel modesetting enabled.
>> [ 5.517065] [drm] initializing kernel modesetting (POLARIS12
>> 0x1002:0x6987 0x1787:0x2389 0x80).
>> [ 5.524507] amdgpu 0001:01:00.0: Fatal error during GPU init
>> [ 5.529296] amdgpu: probe of 0001:01:00.0 failed with error -12
>>
>> Then I changed 256 MB to 4 GB in the .dtsi and U-Boot config files. I also
>> changed the I/O size from 64 KB to 1 MB. With these settings I could not reach
>> the Linux console because the board froze, but this time the driver got
>> further. The console log from that run is the one I posted above.
>>
>> *This is the lspci -v output with the GPU plugged in and a 256 MB memory range:*
>>
>> root@t1042d4rdb-64b:~# lspci -v
>> 0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>> Device tree node: /sys/firmware/devicetree/base/pcie@ffe240000/pcie@0
>> Flags: bus master, fast devsel, latency 0, IRQ 20
>> Memory at <ignored> (32-bit, non-prefetchable)
>> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>> I/O behind bridge: 00000000-0000ffff [size=64K]
>> Memory behind bridge: e0000000-efffffff [size=256M]
>> Prefetchable memory behind bridge: None
>> Capabilities: [44] Power Management version 3
>> Capabilities: [4c] Express Root Port (Slot-), MSI 00
>> Capabilities: [100] Advanced Error Reporting
>> Kernel driver in use: pcieport
>>
>> 0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>> Device tree node: /sys/firmware/devicetree/base/pcie@ffe250000/pcie@0
>> Flags: bus master, fast devsel, latency 0, IRQ 21
>> Memory at <ignored> (32-bit, non-prefetchable)
>> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>> I/O behind bridge: 00000000-0000ffff [size=64K]
>> Memory behind bridge: e0000000-efffffff [size=256M]
>> Prefetchable memory behind bridge: None
>> Capabilities: [44] Power Management version 3
>> Capabilities: [4c] Express Root Port (Slot-), MSI 00
>> Capabilities: [100] Advanced Error Reporting
>> Kernel driver in use: pcieport
>>
>> 0001:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Lexa [Radeon E9171 MCM] (rev 80) (prog-if 00 [VGA controller])
>> Subsystem: Hightech Information System Ltd. Device 2389
>> Flags: fast devsel, IRQ 41
>> Memory at c10000000 (64-bit, prefetchable) [size=256M]
>> Memory at <ignored> (64-bit, prefetchable)
>> I/O ports at 1100 [size=256]
>> Memory at <ignored> (32-bit, non-prefetchable)
>> Expansion ROM at <ignored> [disabled]
>> Capabilities: [48] Vendor Specific Information: Len=08 <?>
>> Capabilities: [50] Power Management version 3
>> Capabilities: [58] Express Legacy Endpoint, MSI 00
>> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
>> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>> Capabilities: [150] Advanced Error Reporting
>> Capabilities: [200] Resizable BAR <?>
>> Capabilities: [270] Secondary PCI Express <?>
>> Capabilities: [2b0] Address Translation Service (ATS)
>> Capabilities: [2c0] Page Request Interface (PRI)
>> Capabilities: [2d0] Process Address Space ID (PASID)
>> Capabilities: [320] Latency Tolerance Reporting
>> Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
>> Capabilities: [370] L1 PM Substates
>> Kernel modules: amdgpu
>>
>> 0001:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device
>> aae0
>> Subsystem: Hightech Information System Ltd. Device aae0
>> Flags: bus master, fast devsel, latency 0, IRQ 17
>> Memory at <ignored> (64-bit, non-prefetchable)
>> Capabilities: [48] Vendor Specific Information: Len=08 <?>
>> Capabilities: [50] Power Management version 3
>> Capabilities: [58] Express Legacy Endpoint, MSI 00
>> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
>> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>> Capabilities: [150] Advanced Error Reporting
>> Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
>>
>> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>> Device tree node: /sys/firmware/devicetree/base/pcie@ffe260000/pcie@0
>> Flags: bus master, fast devsel, latency 0, IRQ 22
>> Memory at <ignored> (32-bit, non-prefetchable)
>> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>> I/O behind bridge: 00000000-0000ffff [size=64K]
>> Memory behind bridge: e0000000-efffffff [size=256M]
>> Prefetchable memory behind bridge: None
>> Capabilities: [44] Power Management version 3
>> Capabilities: [4c] Express Root Port (Slot-), MSI 00
>> Capabilities: [100] Advanced Error Reporting
>> Kernel driver in use: pcieport
>>
>> 0003:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
>> Device tree node: /sys/firmware/devicetree/base/pcie@ffe270000/pcie@0
>> Flags: bus master, fast devsel, latency 0, IRQ 23
>> Memory at <ignored> (32-bit, non-prefetchable)
>> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>> I/O behind bridge: 00000000-0000ffff [size=64K]
>> Memory behind bridge: e0000000-efffffff [size=256M]
>> Prefetchable memory behind bridge: None
>> Capabilities: [44] Power Management version 3
>> Capabilities: [4c] Express Root Port (Slot-), MSI 00
>> Capabilities: [100] Advanced Error Reporting
>> Kernel driver in use: pcieport
>>
>> *And this is the PCIe part of dmesg with the 256 MB memory range. The same
>> messages appear when the memory range is set to 4 GB:*
>>
>> PCI host bridge /pcie@ffe240000 ranges:
>> MEM 0x0000000c00000000..0x0000000c0fffffff -> 0x00000000e0000000
>> IO 0x0000000ff8000000..0x0000000ff800ffff -> 0x0000000000000000
>> /pcie@ffe240000: PCICSRBAR @ 0xff000000
>> setup_pci_atmu: end of DRAM 200000000
>> /pcie@ffe240000: Setup 64-bit PCI DMA window
>> /pcie@ffe240000: WARNING: Outbound window cfg leaves gaps in memory map.
>> Adjusting the memory map could reduce unnecessary bounce buffering.
>> /pcie@ffe240000: DMA window size is 0xe0000000
>> Found FSL PCI host bridge at 0x0000000ffe250000. Firmware bus number: 0->1
>> PCI host bridge /pcie@ffe250000 ranges:
>> MEM 0x0000000c10000000..0x0000000c1fffffff -> 0x00000000e0000000
>> IO 0x0000000ff8010000..0x0000000ff801ffff -> 0x0000000000000000
>> /pcie@ffe250000: PCICSRBAR @ 0xff000000
>> setup_pci_atmu: end of DRAM 200000000
>> /pcie@ffe250000: Setup 64-bit PCI DMA window
>> /pcie@ffe250000: WARNING: Outbound window cfg leaves gaps in memory map.
>> Adjusting the memory map could reduce unnecessary bounce buffering.
>> /pcie@ffe250000: DMA window size is 0xe0000000
>> Found FSL PCI host bridge at 0x0000000ffe260000. Firmware bus number: 0->0
>> PCI host bridge /pcie@ffe260000 ranges:
>> MEM 0x0000000c20000000..0x0000000c2fffffff -> 0x00000000e0000000
>> IO 0x0000000ff8020000..0x0000000ff802ffff -> 0x0000000000000000
>> /pcie@ffe260000: PCICSRBAR @ 0xff000000
>> setup_pci_atmu: end of DRAM 200000000
>> /pcie@ffe260000: Setup 64-bit PCI DMA window
>> /pcie@ffe260000: WARNING: Outbound window cfg leaves gaps in memory map.
>> Adjusting the memory map could reduce unnecessary bounce buffering.
>> /pcie@ffe260000: DMA window size is 0xe0000000
>> Found FSL PCI host bridge at 0x0000000ffe270000. Firmware bus number: 0->0
>> PCI host bridge /pcie@ffe270000 ranges:
>> MEM 0x0000000c30000000..0x0000000c3fffffff -> 0x00000000e0000000
>> IO 0x0000000ff8030000..0x0000000ff803ffff -> 0x0000000000000000
>> /pcie@ffe270000: PCICSRBAR @ 0xff000000
>> setup_pci_atmu: end of DRAM 200000000
>> /pcie@ffe270000: Setup 64-bit PCI DMA window
>> /pcie@ffe270000: WARNING: Outbound window cfg leaves gaps in memory map.
>> Adjusting the memory map could reduce unnecessary bounce buffering.
>> /pcie@ffe270000: DMA window size is 0xe0000000
>> iommu: Adding device ff6000000.qman-portal to group 0
>> iommu: Adding device ff6004000.qman-portal to group 1
>> iommu: Adding device ff6008000.qman-portal to group 2
>> iommu: Adding device ff600c000.qman-portal to group 3
>> iommu: Adding device ff6010000.qman-portal to group 4
>> iommu: Adding device ff6014000.qman-portal to group 5
>> iommu: Adding device ff6018000.qman-portal to group 6
>> iommu: Adding device ff601c000.qman-portal to group 7
>> iommu: Adding device ff6020000.qman-portal to group 8
>> iommu: Adding device ff6024000.qman-portal to group 9
>> iommu: Adding device ffe100300.dma to group 10
>> iommu: Adding device ffe101300.dma to group 11
>> iommu: Adding device ffe114000.sdhc to group 12
>> iommu: Adding device ffe210000.usb to group 13
>> iommu: Adding device ffe211000.usb to group 14
>> iommu: Adding device ffe220000.sata to group 15
>> iommu: Adding device ffe221000.sata to group 16
>> iommu: Adding device ffe318000.qman to group 17
>> iommu: Adding device ffe31a000.bman to group 18
>> iommu: Adding device ffe240000.pcie to group 19
>> iommu: Adding device ffe250000.pcie to group 20
>> iommu: Adding device ffe260000.pcie to group 21
>> iommu: Adding device ffe270000.pcie to group 22
>> iommu: Adding device ffe140000.qe to group 23
>> software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
>> PCI: Probing PCI hardware
>> fsl-pci ffe240000.pcie: PCI host bridge to bus 0000:00
>> pci_bus 0000:00: root bus resource [io
>> 0x8000080000010000-0x800008000001ffff] (bus address [0x0000-0xffff])
>> pci_bus 0000:00: root bus resource [mem 0xc00000000-0xc0fffffff] (bus
>> address [0xe0000000-0xefffffff])
>> pci_bus 0000:00: root bus resource [bus 00]
>> iommu: Removing device ffe240000.pcie from group 19
>> iommu: Adding device 0000:00:00.0 to group 24
>> pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]),
>> reconfiguring
>> pci 0000:00:00.0: PCI bridge to [bus 01-ff]
>> fsl-pci ffe250000.pcie: PCI host bridge to bus 0001:00
>> pci_bus 0001:00: root bus resource [io
>> 0x8000080000021000-0x8000080000030fff] (bus address [0x0000-0xffff])
>> pci_bus 0001:00: root bus resource [mem 0xc10000000-0xc1fffffff] (bus
>> address [0xe0000000-0xefffffff])
>> pci_bus 0001:00: root bus resource [bus 00-01]
>> iommu: Removing device ffe250000.pcie from group 20
>> iommu: Adding device 0001:00:00.0 to group 19
>> pci 0001:01:00.0: enabling Extended Tags
>> pci 0001:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s
>> x1 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
>> iommu: Adding device 0001:01:00.0 to group 19
>> pci 0001:01:00.1: enabling Extended Tags
>> iommu: Adding device 0001:01:00.1 to group 19
>> pci 0001:00:00.0: PCI bridge to [bus 01-ff]
>> fsl-pci ffe260000.pcie: PCI host bridge to bus 0002:00
>> pci_bus 0002:00: root bus resource [io
>> 0x8000080000032000-0x8000080000041fff] (bus address [0x0000-0xffff])
>> pci_bus 0002:00: root bus resource [mem 0xc20000000-0xc2fffffff] (bus
>> address [0xe0000000-0xefffffff])
>> pci_bus 0002:00: root bus resource [bus 00]
>> iommu: Removing device ffe260000.pcie from group 21
>> iommu: Adding device 0002:00:00.0 to group 20
>> pci 0002:00:00.0: bridge configuration invalid ([bus 00-00]),
>> reconfiguring
>> pci 0002:00:00.0: PCI bridge to [bus 01-ff]
>> fsl-pci ffe270000.pcie: PCI host bridge to bus 0003:00
>> pci_bus 0003:00: root bus resource [io
>> 0x8000080000043000-0x8000080000052fff] (bus address [0x0000-0xffff])
>> pci_bus 0003:00: root bus resource [mem 0xc30000000-0xc3fffffff] (bus
>> address [0xe0000000-0xefffffff])
>> pci_bus 0003:00: root bus resource [bus 00]
>> iommu: Removing device ffe270000.pcie from group 22
>> iommu: Adding device 0003:00:00.0 to group 21
>> pci 0003:00:00.0: bridge configuration invalid ([bus 00-00]),
>> reconfiguring
>> pci 0003:00:00.0: PCI bridge to [bus 01-ff]
>> PCI: Cannot allocate resource region 0 of device 0000:00:00.0, will remap
>> PCI: Cannot allocate resource region 0 of device 0001:00:00.0, will remap
>> PCI: Cannot allocate resource region 2 of device 0001:01:00.0, will remap
>> PCI: Cannot allocate resource region 5 of device 0001:01:00.0, will remap
>> PCI: Cannot allocate resource region 6 of device 0001:01:00.0, will remap
>> PCI: Cannot allocate resource region 0 of device 0001:01:00.1, will remap
>> PCI: Cannot allocate resource region 0 of device 0002:00:00.0, will remap
>> PCI: Cannot allocate resource region 0 of device 0003:00:00.0, will remap
>> pci 0000:00:00.0: BAR 0: no space for [mem size 0x01000000]
>> pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
>> pci 0000:00:00.0: PCI bridge to [bus 01]
>> pci 0000:00:00.0: bridge window [io
>> 0x8000080000010000-0x800008000001ffff]
>> pci 0000:00:00.0: bridge window [mem 0xc00000000-0xc0fffffff]
>> pci_bus 0000:00: Some PCI device resources are unassigned, try booting
>> with pci=realloc
>> pci 0001:00:00.0: BAR 0: no space for [mem size 0x01000000]
>> pci 0001:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
>> pci 0001:00:00.0: BAR 9: no space for [mem size 0x00200000 64bit pref]
>> pci 0001:00:00.0: BAR 9: failed to assign [mem size 0x00200000 64bit pref]
>> pci 0001:01:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
>> pci 0001:01:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
>> pci 0001:01:00.0: BAR 5: no space for [mem size 0x00040000]
>> pci 0001:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
>> pci 0001:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
>> pci 0001:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
>> pci 0001:01:00.1: BAR 0: no space for [mem size 0x00004000 64bit]
>> pci 0001:01:00.1: BAR 0: failed to assign [mem size 0x00004000 64bit]
>> pci 0001:00:00.0: PCI bridge to [bus 01]
>> pci 0001:00:00.0: bridge window [io
>> 0x8000080000021000-0x8000080000030fff]
>> pci 0001:00:00.0: bridge window [mem 0xc10000000-0xc1fffffff]
>> pci_bus 0001:00: Some PCI device resources are unassigned, try booting
>> with pci=realloc
>> pci 0002:00:00.0: BAR 0: no space for [mem size 0x01000000]
>> pci 0002:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
>> pci 0002:00:00.0: PCI bridge to [bus 01]
>> pci 0002:00:00.0: bridge window [io
>> 0x8000080000032000-0x8000080000041fff]
>> pci 0002:00:00.0: bridge window [mem 0xc20000000-0xc2fffffff]
>> pci_bus 0002:00: Some PCI device resources are unassigned, try booting
>> with pci=realloc
>> pci 0003:00:00.0: BAR 0: no space for [mem size 0x01000000]
>> pci 0003:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
>> pci 0003:00:00.0: PCI bridge to [bus 01]
>> pci 0003:00:00.0: bridge window [io
>> 0x8000080000043000-0x8000080000052fff]
>> pci 0003:00:00.0: bridge window [mem 0xc30000000-0xc3fffffff]
>> pci_bus 0003:00: Some PCI device resources are unassigned, try booting
>> with pci=realloc
>>
>>
>>
>>