[Bug 108754] hard crash of amdgpu in 4.20-rc

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Nov 15 12:32:48 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=108754

            Bug ID: 108754
           Summary: hard crash of amdgpu in 4.20-rc
           Product: DRI
           Version: unspecified
          Hardware: PowerPC
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: dan at danny.cz
                CC: bcrocker at redhat.com

Created attachment 142474
  --> https://bugs.freedesktop.org/attachment.cgi?id=142474&action=edit
full dmesg output

I'm seeing hard crashes (taking down the whole system) in the amdgpu driver in
4.20-rc kernels (starting around rc1). This is on a Power9 Talos system with a
Radeon Pro WX4100.

After running "modprobe amdgpu" on a system booted with
"modprobe.blacklist=amdgpu", I got the following output and the system stopped
responding ("lis" in the timestamps is the Czech abbreviation for November):
...
lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu kernel modesetting enabled.
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: enabling device
(0540 -> 0542)
lis 15 12:40:56 talos.danny.cz kernel: [drm] initializing kernel modesetting
(POLARIS11 0x1002:0x67E3 0x1002:0x0B0D 0x00).
lis 15 12:40:56 talos.danny.cz kernel: [drm] register mmio base: 0x00000000
lis 15 12:40:56 talos.danny.cz kernel: [drm] register mmio size: 262144
lis 15 12:40:56 talos.danny.cz kernel: [drm] PCI I/O BAR is not found.
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 0 <vi_common>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 1 <gmc_v8_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 2 <tonga_ih>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 3 <gfx_v8_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 4 <sdma_v3_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 5 <powerplay>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 6 <dm>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 7 <uvd_v6_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 8 <vce_v3_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD is enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD ENC is enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: [drm] VCE enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: ATOM BIOS: 113-D0150600-103
lis 15 12:40:56 talos.danny.cz kernel: [drm] vm size is 256 GB, 2 levels, block
size is 10-bit, fragment size is 9-bit
lis 15 12:40:56 talos.danny.cz kernel: amdgpu: No suitable DMA available
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2: releasing
[mem 0x6000010000000-0x60000101fffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0: releasing
[mem 0x6000000000000-0x600000fffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: BAR 15: releasing [mem
0x6000000000000-0x6003fbff0ffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: BAR 15: assigned [mem
0x6000000000000-0x600017fffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0: assigned
[mem 0x6000000000000-0x60000ffffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2: assigned
[mem 0x6000100000000-0x60001001fffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: PCI bridge to [bus 01]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0:   bridge window [mem
0x600c000000000-0x600c07fefffff]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0:   bridge window [mem
0x6000000000000-0x6003fbff0ffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M
0x000000FF00000000 - 0x000000FF0FFFFFFF
lis 15 12:40:56 talos.danny.cz kernel: [drm] Detected VRAM RAM=4096M, BAR=4096M
lis 15 12:40:56 talos.danny.cz kernel: [drm] RAM width 128bits GDDR5
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Zone  kernel: Available graphics
memory: 33386016 kiB
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Zone   dma32: Available graphics
memory: 2097152 kiB
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Initializing pool allocator
lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu: 4096M of VRAM memory ready
lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu: 4096M of GTT memory ready.
lis 15 12:40:56 talos.danny.cz kernel: [drm] GART: num cpu pages 4096, num gpu
pages 65536
lis 15 12:40:56 talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table
at 0x000000F4008D0000).
lis 15 12:40:56 talos.danny.cz kernel: [drm] Chained IB support enabled!
lis 15 12:40:56 talos.danny.cz kernel: [drm] Found UVD firmware Version: 1.130
Family ID: 16
lis 15 12:40:56 talos.danny.cz kernel: [drm] Found VCE firmware Version: 53.26
Binary ID: 3
lis 15 12:40:56 talos.danny.cz kernel: amdgpu: [powerplay] dpm has been enabled
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: values for Engine clock
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         214000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         517000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         845000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1049000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1099000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1136000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1175000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1201000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: Validation clocks:
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    engine_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    memory_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    level           : 8
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: reducing engine clock
level from 8 to 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: values for Memory clock
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         300000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:         1500000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: Validation clocks:
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    engine_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    memory_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB:    level           : 8
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: reducing memory clock
level from 2 to 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] Display Core initialized with
v3.1.68!
lis 15 12:40:56 talos.danny.cz kernel: [drm] Supports vblank timestamp caching
Rev 2 (21.10.2013).
lis 15 12:40:56 talos.danny.cz kernel: [drm] Driver supports precise vblank
timestamp query.
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD and UVD ENC initialized
successfully.
lis 15 12:40:58 talos.danny.cz kernel: [drm] VCE initialized successfully.
lis 15 12:40:58 talos.danny.cz kernel: [drm] Cannot find any crtc or sizes
lis 15 12:40:58 talos.danny.cz kernel: Unable to handle kernel paging request
for data at address 0xc000001369cefffc
lis 15 12:40:58 talos.danny.cz kernel: Faulting instruction address:
0xc008000011b8be54
lis 15 12:40:58 talos.danny.cz kernel: Oops: Kernel access of bad area, sig: 11
[#1]
lis 15 12:40:58 talos.danny.cz kernel: LE SMP NR_CPUS=1024 NUMA PowerNV
lis 15 12:40:58 talos.danny.cz kernel: Modules linked in: amdgpu(+) mfd_core
chash gpu_sched i2c_algo_bit ttm drm_kms_helper drm
drm_panel_orientation_quirks fb_sys_fops syscopyarea sysfillrect sysimgblt
xt_CHECKSUM ipt_MASQUERADE tun kvm_hv kvm devlink ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge
stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables sunrpc dm_crypt snd_hda_codec_realtek
snd_hda_codec_generic at24 snd_hda_codec_hdmi snd_hda_intel regmap_i2c
snd_hda_codec ipmi_powernv ipmi_devintf i2c_opal snd_hda_core i2c_core
snd_hwdep snd_seq vmx_crypto snd_seq_device snd_pcm ses enclosure
ipmi_msghandler snd_timer scsi_transport_sas snd ofpart powernv_flash mtd
rtc_opal opal_prd crct10dif_vpmsum soundcore raid1 aacraid tg3 crc32c_vpmsum
lis 15 12:40:58 talos.danny.cz kernel: CPU: 0 PID: 338 Comm: kworker/0:2 Not
tainted 4.20.0-rc2+ #1
lis 15 12:40:58 talos.danny.cz kernel: Workqueue: events work_for_cpu_fn
lis 15 12:40:58 talos.danny.cz kernel: NIP:  c008000011b8be54 LR:
c008000011b7885c CTR: c008000011b8bd68
lis 15 12:40:58 talos.danny.cz kernel: REGS: c0000007f84533c0 TRAP: 0300   Not
tainted  (4.20.0-rc2+)
lis 15 12:40:58 talos.danny.cz kernel: MSR:  9000000002009033
<SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 84002482  XER: 20040000
lis 15 12:40:58 talos.danny.cz kernel: CFAR: c008000011b8c6fc DAR:
c000001369cefffc DSISR: 42000000 IRQMASK: 0 
                                       GPR00: c008000011b7885c c0000007f8453648
c008000011d69e00 c0000007f74bf67c 
                                       GPR04: 000000000001d524 00000000000249f0
c0000007f8453758 0000000020130307 
                                       GPR08: c000001369cefff4 c000000769cf0000
0000000000000001 0000000002100800 
                                       GPR12: c008000011b8bd68 c0000000018b0000
c000000000151e88 c0000007fe1f8340 
                                       GPR16: 0000000000000000 0000000000000000
0000000000000000 c0000007f87d30c0 
                                       GPR20: c0000007f87d30c8 c0000007f87d30b8
c0000007f87d30d8 c0000007f87d30e0 
                                       GPR24: c0000007f87d30d0 c0000007f87dc528
0000000000000000 0000000000000001 
                                       GPR28: c000000769cf0000 c0000007f8453710
c0000007f74b2340 c000200721935c00 
lis 15 12:40:58 talos.danny.cz kernel: NIP [c008000011b8be54]
smu7_set_power_state_tasks+0xec/0xab0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: LR [c008000011b7885c]
phm_set_power_state+0x64/0xc0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: Call Trace:
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453648] [c008000011b4ee7c]
amdgpu_cgs_write_ind_register+0x84/0x170 [amdgpu] (unreliable)
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84536e8] [c008000011b7885c]
phm_set_power_state+0x64/0xc0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453728] [c008000011ba0d48]
psm_adjust_power_state_dynamic+0x130/0x270 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453788] [c008000011b764f0]
hwmgr_handle_task+0x58/0x178 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84537c8] [c008000011bae29c]
pp_late_init+0xa4/0x1f0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453868] [c008000011a318d8]
amdgpu_device_ip_late_init+0x90/0x1b0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84538f8] [c008000011a34cb8]
amdgpu_device_init+0x1590/0x18e0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453a08] [c008000011a3823c]
amdgpu_driver_load_kms+0xb4/0x330 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453a88] [c008000010ccae30]
drm_dev_register+0x1b8/0x280 [drm]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453b28] [c008000011a306bc]
amdgpu_pci_probe+0x114/0x200 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453bb8] [c00000000070024c]
local_pci_probe+0x6c/0x140
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453c48] [c000000000143b88]
work_for_cpu_fn+0x38/0x60
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453c78] [c000000000148c40]
process_one_work+0x250/0x500
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453d18] [c000000000149160]
worker_thread+0x270/0x5b0
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453db8] [c00000000015202c]
kthread+0x1ac/0x1c0
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453e28] [c00000000000bdd0]
ret_from_kernel_thread+0x5c/0x6c
lis 15 12:40:58 talos.danny.cz kernel: Instruction dump:
lis 15 12:40:58 talos.danny.cz kernel: 7d485378 7f872000 419e0464 39480001
38c6000c 794a0020 4200ffe4 1d08000c 
lis 15 12:40:58 talos.danny.cz kernel: 81490d3c 614a0001 7d094214 91490d3c
<90880008> 81490064 2faa0000 419e0880 
lis 15 12:40:58 talos.danny.cz kernel: ---[ end trace d5e132cd328da1c7 ]---
lis 15 12:40:58 talos.danny.cz kernel:
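
For anyone reading the oops above, here is a rough Python sketch (not part of
the original report) that cross-checks the register dump against the
instruction dump. Every constant is copied from the log; the helper names
(decode_d_form, analyze) are made up for illustration. It assumes the words
shown in the dump were executed in a straight line, so that GPR08 was produced
by the multiply (1d08000c) and the later add (7d094214, which decodes as
add r8,r9,r8) with GPR09 unchanged. Only the three D-form opcodes needed here
are decoded.

```python
# Sketch only: every constant below is copied from the oops text above.
FAULT_WORD = 0x90880008   # word marked <...> in the instruction dump
MULLI_WORD = 0x1D08000C   # multiply feeding GPR08, a few words earlier
NIP = 0xC008000011B8BE54  # faulting instruction address
CTR = 0xC008000011B8BD68  # CTR value from the register dump
DAR = 0xC000001369CEFFFC  # faulting data address
GPR = {
    4: 0x000000000001D524,
    8: 0xC000001369CEFFF4,
    9: 0xC000000769CF0000,
}

# Primary opcodes of the few Power ISA D-form instructions needed here.
D_FORM = {7: "mulli", 32: "lwz", 36: "stw"}


def decode_d_form(word):
    """Split a 32-bit D-form word into (mnemonic, rt_or_rs, ra, signed_imm)."""
    opcode = (word >> 26) & 0x3F
    reg = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return D_FORM.get(opcode, "op%d" % opcode), reg, ra, imm


def analyze():
    """Relate the faulting store to DAR and to the base pointer in GPR09."""
    mnemonic, rs, ra, disp = decode_d_form(FAULT_WORD)
    effective_address = (GPR[ra] + disp) & 0xFFFFFFFFFFFFFFFF
    # The mulli immediate is taken as the element size in bytes (assumption).
    _, _, _, stride = decode_d_form(MULLI_WORD)
    index, remainder = divmod(GPR[8] - GPR[9], stride)
    return {
        "fault": "%s r%d,%d(r%d)" % (mnemonic, rs, disp, ra),
        "ea_matches_dar": effective_address == DAR,
        "nip_offset": NIP - CTR,   # compare with "+0xec" in the NIP line
        "stride": stride,
        "index": index,
        "remainder": remainder,
    }


if __name__ == "__main__":
    for key, value in analyze().items():
        print(key, value)
```

Under those assumptions the faulting word decodes as stw r4,8(r8), its
effective address equals DAR, NIP minus CTR equals the 0xec offset printed for
smu7_set_power_state_tasks, and GPR08 minus GPR09 is exactly 0xffffffff
elements of 12 bytes. That would be consistent with a 32-bit "count - 1" index
computed while count is 0, but this is a reading of the register dump, not a
confirmed diagnosis.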
