powerplay change breaks driver

Tom St Denis tom.stdenis at amd.com
Mon Sep 25 18:26:24 UTC 2017


To narrow things down: it's likely something in the CZ (Carrizo) code paths,
as it still crashes with the Polaris10 removed.
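
For illustration only, here is a minimal userspace sketch (not driver code) of
the failure mode the quoted trace is consistent with: an oops with
"IP: (null)" and "Bad RIP value" usually means an indirect call went through a
NULL function pointer. The names struct backend_funcs, struct manager,
manager_fini and demo_fini are hypothetical and only stand in for the kind of
backend function table the commit moves into struct hwmgr. Whether the NULL
pointer actually lives in the powerplay table or in the DC teardown path is
not established here.

```c
#include <stddef.h>

/* Hypothetical backend table: any callback may be left NULL by a backend. */
struct backend_funcs {
	int (*init)(void *priv);
	int (*fini)(void *priv);
};

/* Hypothetical owner of the table, loosely analogous to a manager struct. */
struct manager {
	const struct backend_funcs *funcs;
	void *priv;
};

/* Example callback used to show the non-NULL path. */
static int demo_fini(void *priv)
{
	(void)priv;
	return 0;
}

/*
 * Tear down through the table, but only if the callback is present.
 * Calling m->funcs->fini() unconditionally when it is NULL jumps to
 * address 0, which is what an instruction-fetch fault at (null) looks like.
 */
static int manager_fini(struct manager *m)
{
	if (!m || !m->funcs || !m->funcs->fini)
		return -1;	/* nothing to call; report instead of crashing */
	return m->funcs->fini(m->priv);
}
```

With a table whose fini is set to demo_fini, manager_fini() returns 0. With a
zeroed table it returns -1 rather than faulting. A guard like this only hides
the symptom; the init failure that left the pointer unset still needs fixing.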

Tom


On 25/09/17 01:55 PM, Tom St Denis wrote:
> This change
> 
> commit f96306921d5e346ebc82c7c51ae6e0b736e5b425
> Author: Rex Zhu <Rex.Zhu at amd.com>
> Date:   Wed Sep 20 14:44:55 2017 +0800
> 
>      drm/amd/powerplay: refine powerplay code.
> 
>      delete struct smumgr, put smu backend function table
>      in struct hwmgr
> 
>      Change-Id: I7b73ef062b147b4e7199105a3c101f6c8038cc57
>      Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>      Signed-off-by: Rex Zhu <Rex.Zhu at amd.com>
> 
> 
> Results in these dmesg error messages on my Carrizo + Polaris10 setup:
> 
> [   24.237785] [drm] amdgpu kernel modesetting enabled.
> [   24.237814] checking generic (c0000000 7e9000) vs hw (e0000000 10000000)
> [   24.237864] amdgpu 0000:00:01.0: enabling device (0006 -> 0007)
> [   24.238366] [drm] initializing kernel modesetting (CARRIZO 
> 0x1002:0x9874 0x1002:0x1E10 0xE1).
> [   24.238394] [drm] register mmio base: 0xD1300000
> [   24.238394] [drm] register mmio size: 262144
> [   24.238463] ACPI Error: [\_SB_.ALIB] Namespace lookup failure, 
> AE_NOT_FOUND (20170531/psargs-364)
> [   24.238497] ACPI Error: Method parse/execution failed 
> \_SB.PCI0.VGA.ATC0, AE_NOT_FOUND (20170531/psparse-550)
> [   24.238528] ACPI Error: Method parse/execution failed 
> \_SB.PCI0.VGA.ATCS, AE_NOT_FOUND (20170531/psparse-550)
> [   24.238558] [drm] UVD is enabled in physical mode
> [   24.238561] [drm] VCE enabled in physical mode
> [   24.250365] ATOM BIOS: 109-C95010-001
> [   24.250381] [drm] GPU post is not needed
> [   24.250407] [drm] vm size is 64 GB, block size is 13-bit, fragment 
> size is 9-bit
> [   24.250412] amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 
> 0x000000F41FFFFFFF (512M used)
> [   24.250413] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000000000000 - 
> 0x000000003FFFFFFF
> [   24.250420] [drm] Detected VRAM RAM=512M, BAR=512M
> [   24.250421] [drm] RAM width 64bits UNKNOWN
> [   24.250795] [TTM] Zone  kernel: Available graphics memory: 3846244 kiB
> [   24.250797] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
> [   24.250797] [TTM] Initializing pool allocator
> [   24.250801] [TTM] Initializing DMA pool allocator
> [   24.250844] [drm] amdgpu: 512M of VRAM memory ready
> [   24.250845] [drm] amdgpu: 3072M of GTT memory ready.
> [   24.250860] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [   24.250970] [drm] PCIE GART of 1024M enabled (table at 
> 0x000000F400040000).
> [   24.251017] amdgpu 0000:00:01.0: amdgpu: using MSI.
> [   24.251034] [drm] amdgpu: irq initialized.
> [   24.251037] amdgpu: [powerplay] amdgpu: powerplay sw initialized
> [   24.254140] [drm] Chained IB support enabled!
> [   24.257056] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 
> 0x0000000000400080, cpu addr 0xffffc9000105d080
> [   24.257196] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 
> 0x0000000000400100, cpu addr 0xffffc9000105d100
> [   24.257922] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 
> 0x0000000000400180, cpu addr 0xffffc9000105d180
> [   24.258053] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 
> 0x0000000000400200, cpu addr 0xffffc9000105d200
> [   24.258115] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 
> 0x0000000000400280, cpu addr 0xffffc9000105d280
> [   24.258146] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 
> 0x0000000000400300, cpu addr 0xffffc9000105d300
> [   24.258353] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 
> 0x0000000000400380, cpu addr 0xffffc9000105d380
> [   24.258426] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 
> 0x0000000000400400, cpu addr 0xffffc9000105d400
> [   24.258484] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 
> 0x0000000000400480, cpu addr 0xffffc9000105d480
> [   24.258528] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 
> 0x0000000000400520, cpu addr 0xffffc9000105d520
> [   24.260159] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 
> 0x00000000004005a0, cpu addr 0xffffc9000105d5a0
> [   24.260508] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 
> 0x0000000000400620, cpu addr 0xffffc9000105d620
> [   24.261591] [drm] Found UVD firmware Version: 1.91 Family ID: 11
> [   24.262451] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 
> 0x000000f400296560, cpu addr 0xffffc90003442560
> [   24.263350] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [   24.263819] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 
> 0x0000000000400720, cpu addr 0xffffc9000105d720
> [   24.263921] amdgpu 0000:00:01.0: fence driver on ring 14 use gpu addr 
> 0x00000000004007a0, cpu addr 0xffffc9000105d7a0
> [   24.264438] amdgpu: [powerplay] Fail to get clock table from SMU!
> [   24.264440] amdgpu: [powerplay] amdgpu: powerplay initialization failed
> [   24.264467] [drm] DAL is enabled
> [   24.264835] [drm] DC: create_links: connectors_num: physical:3, 
> virtual:0
> [   24.264839] [drm] Connector[0] description:signal 32
> [   24.264842] [drm] Using channel: CHANNEL_ID_DDC1 [1]
> [   24.264851] [drm] Connector[1] description:signal 4
> [   24.264853] [drm] Using channel: CHANNEL_ID_DDC2 [2]
> [   24.264860] [drm] Connector[2] description:signal 4
> [   24.264862] [drm] Using channel: CHANNEL_ID_DDC3 [3]
> [   24.564284] [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: 
> failed to blank crtc!
> [   24.564329] [drm] Display Core initialized
> [   24.564332] [drm] amdgpu: freesync_module init done ffff88021048afe0.
> [   24.564564] [drm] link=0, dc_sink_in=          (null) is now 
> Disconnected
> [   24.564565] [drm] DCHPD: connector_id=0: dc_sink didn't change.
> [   24.564624] [drm] link=1, dc_sink_in=          (null) is now 
> Disconnected
> [   24.564624] [drm] DCHPD: connector_id=1: dc_sink didn't change.
> [   24.564738] [drm] link=2, dc_sink_in=          (null) is now 
> Disconnected
> [   24.564739] [drm] DCHPD: connector_id=2: dc_sink didn't change.
> [   24.564751] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   24.564752] [drm] Driver supports precise vblank timestamp query.
> [   24.564752] [drm] KMS initialized.
> [   24.566110] [drm] ring test on 0 succeeded in 13 usecs
> [   24.755765] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable 
> failed (scratch(0xC040)=0xCAFEDEAD)
> [   24.755819] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP 
> block <gfx_v8_0> failed -22
> [   24.755839] amdgpu 0000:00:01.0: amdgpu_init failed
> [   24.756271] BUG: unable to handle kernel NULL pointer dereference at 
>           (null)
> [   24.756302] IP:           (null)
> [   24.756312] PGD 2134b3067
> [   24.756312] P4D 2134b3067
> [   24.756320] PUD 0
> 
> [   24.756340] Oops: 0010 [#1] SMP
> [   24.756349] Modules linked in: amdgpu(+) chash ttm ax88179_178a 
> usbnet xhci_pci xhci_hcd efivarfs
> [   24.756380] CPU: 3 PID: 3021 Comm: modprobe Not tainted 4.13.0-rc5+ #33
> [   24.756396] Hardware name: AMD Myrtle/Myrtle, BIOS TMY1100A 03/23/2016
> [   24.756413] task: ffff8802132744c0 task.stack: ffffc90000fd0000
> [   24.756427] RIP: 0010:          (null)
> [   24.756437] RSP: 0018:ffffc90000fd3908 EFLAGS: 00010202
> [   24.756450] RAX: ffff88021048a460 RBX: ffff8802100258a0 RCX: 
> 000000018020000d
> [   24.756466] RDX: 000000018020000e RSI: 0000000000005c02 RDI: 
> ffff88021048a5a0
> [   24.756482] RBP: ffffc90000fd3928 R08: ffff880210f9e580 R09: 
> 000000018020000d
> [   24.756499] R10: ffffc90000fd3948 R11: ffffea0008525e00 R12: 
> 0000000000005c02
> [   24.756516] R13: ffff88021365b690 R14: ffff880211db0040 R15: 
> ffff880211db2f30
> [   24.756534] FS:  00007ffa8be38700(0000) GS:ffff88021ed80000(0000) 
> knlGS:0000000000000000
> [   24.756554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   24.756569] CR2: 0000000000000000 CR3: 0000000210030000 CR4: 
> 00000000001406e0
> [   24.756586] Call Trace:
> [   24.756745]  ? destroy+0x31/0x100 [amdgpu]
> [   24.756822]  dal_i2caux_destruct+0x5d/0x90 [amdgpu]
> [   24.756875]  destroy+0x15/0x30 [amdgpu]
> [   24.756925]  dal_i2caux_destroy+0x1b/0x30 [amdgpu]
> [   24.756977]  destruct+0x90/0x140 [amdgpu]
> [   24.757028]  dc_destroy+0x10/0x30 [amdgpu]
> [   24.757083]  amdgpu_dm_fini+0x62/0x70 [amdgpu]
> [   24.757137]  dm_hw_fini+0x1d/0x30 [amdgpu]
> [   24.757183]  amdgpu_fini+0xe8/0x330 [amdgpu]
> [   24.757229]  amdgpu_device_init+0xe5a/0x1560 [amdgpu]
> [   24.757245]  ? kmalloc_order_trace+0x29/0xd0
> [   24.757290]  ? amdgpu_driver_load_kms+0x53/0x200 [amdgpu]
> [   24.757338]  amdgpu_driver_load_kms+0x78/0x200 [amdgpu]
> [   24.757353]  drm_dev_register+0x141/0x1d0
> [   24.757393]  amdgpu_pci_probe+0x113/0x140 [amdgpu]
> [   24.757406]  local_pci_probe+0x40/0xa0
> [   24.757416]  pci_device_probe+0xaa/0x130
> [   24.757426]  driver_probe_device+0x23e/0x2d0
> [   24.757437]  __driver_attach+0x96/0xa0
> [   24.757446]  ? driver_probe_device+0x2d0/0x2d0
> [   24.757457]  bus_for_each_dev+0x5b/0x90
> [   24.757467]  driver_attach+0x19/0x20
> [   24.757476]  bus_add_driver+0x11c/0x220
> [   24.757485]  driver_register+0x5b/0xd0
> [   24.757495]  __pci_register_driver+0x47/0x50
> [   24.757532]  amdgpu_init+0x88/0x9b [amdgpu]
> [   24.757544]  ? 0xffffffffa030a000
> [   24.757554]  do_one_initcall+0x3e/0x160
> [   24.757566]  ? __vunmap+0x7c/0xb0
> [   24.757577]  ? kfree+0x147/0x160
> [   24.757587]  ? kmem_cache_alloc_trace+0x33/0x150
> [   24.757602]  do_init_module+0x5a/0x1f1
> [   24.757614]  load_module+0x2329/0x28d0
> [   24.758259]  ? kernel_read_file+0x19e/0x1c0
> [   24.758898]  SYSC_finit_module+0xba/0xc0
> [   24.759524]  ? SYSC_finit_module+0xba/0xc0
> [   24.760206]  SyS_finit_module+0x9/0x10
> [   24.760835]  entry_SYSCALL_64_fastpath+0x13/0x94
> [   24.761450] RIP: 0033:0x7ffa8b310219
> [   24.762137] RSP: 002b:00007ffe64b86b18 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000139
> [   24.762851] RAX: ffffffffffffffda RBX: 00000055ee325090 RCX: 
> 00007ffa8b310219
> [   24.763487] RDX: 0000000000000000 RSI: 00000055edf2d2a6 RDI: 
> 0000000000000005
> [   24.764116] RBP: 00000055ee326f50 R08: 0000000000000000 R09: 
> 0000000000000000
> [   24.764716] R10: 0000000000000005 R11: 0000000000000246 R12: 
> 00000055ee3252f0
> [   24.765298] R13: 00007ffe64b86ad8 R14: 00007ffe64b86ae0 R15: 
> 0000000000000000
> [   24.765878] Code:  Bad RIP value.
> [   24.766464] RIP:           (null) RSP: ffffc90000fd3908
> [   24.767036] CR2: 0000000000000000
> [   24.767717] ---[ end trace 636f871b29b747e7 ]---
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


