powerplay change breaks driver

Zhu, Rex Rex.Zhu at amd.com
Tue Sep 26 02:53:21 UTC 2017


Thanks Tom.

I have found the root cause:
a copy error when initializing the SMU function table.

case AMDGPU_FAMILY_CZ:
-               hwmgr->smumgr_funcs = &ci_smu_funcs;
+               hwmgr->smumgr_funcs = &cz_smu_funcs;
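
To illustrate why this one-line mix-up makes powerplay init fail, here is a minimal standalone sketch, not the driver code. The names mirror the diff above, but the types, the `buggy` flag, and the `get_clock_table` hook are simplified assumptions made up for the example. The point is only that a per-family function table selected by a switch sends every later SMU call to whatever backend was assigned, so the CI table on CZ hardware fails at the first backend call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the powerplay types. */
enum family { FAMILY_CI, FAMILY_CZ };

struct hwmgr;

struct smu_funcs {
    const char *name;
    /* Returns 0 on success, negative on failure. */
    int (*get_clock_table)(const struct hwmgr *hwmgr);
};

struct hwmgr {
    enum family family;
    const struct smu_funcs *smumgr_funcs;
};

/* Each toy backend only succeeds on the hardware family it was written for. */
static int ci_get_clock_table(const struct hwmgr *hwmgr)
{
    return hwmgr->family == FAMILY_CI ? 0 : -1;
}

static int cz_get_clock_table(const struct hwmgr *hwmgr)
{
    return hwmgr->family == FAMILY_CZ ? 0 : -1;
}

static const struct smu_funcs ci_smu_funcs = { "ci", ci_get_clock_table };
static const struct smu_funcs cz_smu_funcs = { "cz", cz_get_clock_table };

/* Pick the backend table by family; `buggy` reproduces the copy error. */
static int hwmgr_select_smu(struct hwmgr *hwmgr, int buggy)
{
    switch (hwmgr->family) {
    case FAMILY_CI:
        hwmgr->smumgr_funcs = &ci_smu_funcs;
        return 0;
    case FAMILY_CZ:
        hwmgr->smumgr_funcs = buggy ? &ci_smu_funcs : &cz_smu_funcs;
        return 0;
    default:
        return -1;
    }
}

/* Init calls through the table, so a wrong table fails at the first hook. */
int hwmgr_init(struct hwmgr *hwmgr, int buggy)
{
    assert(hwmgr != NULL);
    if (hwmgr_select_smu(hwmgr, buggy) != 0)
        return -1;
    return hwmgr->smumgr_funcs->get_clock_table(hwmgr);
}
```

In this sketch, calling hwmgr_init() on a FAMILY_CZ device with buggy set returns an error because the CI hook runs, while the corrected assignment returns 0.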


Best Regards
Rex
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of Tom St Denis
Sent: Tuesday, September 26, 2017 2:26 AM
To: amd-gfx at lists.freedesktop.org
Subject: Re: powerplay change breaks driver

To narrow things down: it's likely something in the CZ code paths, as it still crashes with the Polaris10 removed.

Tom


On 25/09/17 01:55 PM, Tom St Denis wrote:
> This change
> 
> commit f96306921d5e346ebc82c7c51ae6e0b736e5b425
> Author: Rex Zhu <Rex.Zhu at amd.com>
> Date:   Wed Sep 20 14:44:55 2017 +0800
> 
>      drm/amd/powerplay: refine powerplay code.
> 
>      delete struct smumgr, put smu backend function table
>      in struct hwmgr
> 
>      Change-Id: I7b73ef062b147b4e7199105a3c101f6c8038cc57
>      Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>      Signed-off-by: Rex Zhu <Rex.Zhu at amd.com>
> 
> 
> results in these dmesg error messages on my Carrizo + Polaris10 setup:
> 
> [   24.237785] [drm] amdgpu kernel modesetting enabled.
> [   24.237814] checking generic (c0000000 7e9000) vs hw (e0000000 10000000)
> [   24.237864] amdgpu 0000:00:01.0: enabling device (0006 -> 0007)
> [   24.238366] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x1002:0x1E10 0xE1).
> [   24.238394] [drm] register mmio base: 0xD1300000
> [   24.238394] [drm] register mmio size: 262144
> [   24.238463] ACPI Error: [\_SB_.ALIB] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
> [   24.238497] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATC0, AE_NOT_FOUND (20170531/psparse-550)
> [   24.238528] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATCS, AE_NOT_FOUND (20170531/psparse-550)
> [   24.238558] [drm] UVD is enabled in physical mode
> [   24.238561] [drm] VCE enabled in physical mode
> [   24.250365] ATOM BIOS: 109-C95010-001
> [   24.250381] [drm] GPU post is not needed
> [   24.250407] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
> [   24.250412] amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> [   24.250413] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
> [   24.250420] [drm] Detected VRAM RAM=512M, BAR=512M
> [   24.250421] [drm] RAM width 64bits UNKNOWN
> [   24.250795] [TTM] Zone  kernel: Available graphics memory: 3846244 kiB
> [   24.250797] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
> [   24.250797] [TTM] Initializing pool allocator
> [   24.250801] [TTM] Initializing DMA pool allocator
> [   24.250844] [drm] amdgpu: 512M of VRAM memory ready
> [   24.250845] [drm] amdgpu: 3072M of GTT memory ready.
> [   24.250860] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [   24.250970] [drm] PCIE GART of 1024M enabled (table at 0x000000F400040000).
> [   24.251017] amdgpu 0000:00:01.0: amdgpu: using MSI.
> [   24.251034] [drm] amdgpu: irq initialized.
> [   24.251037] amdgpu: [powerplay] amdgpu: powerplay sw initialized
> [   24.254140] [drm] Chained IB support enabled!
> [   24.257056] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0xffffc9000105d080
> [   24.257196] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0xffffc9000105d100
> [   24.257922] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0xffffc9000105d180
> [   24.258053] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0xffffc9000105d200
> [   24.258115] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0xffffc9000105d280
> [   24.258146] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000400300, cpu addr 0xffffc9000105d300
> [   24.258353] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000000400380, cpu addr 0xffffc9000105d380
> [   24.258426] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000000400400, cpu addr 0xffffc9000105d400
> [   24.258484] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000000400480, cpu addr 0xffffc9000105d480
> [   24.258528] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x0000000000400520, cpu addr 0xffffc9000105d520
> [   24.260159] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000004005a0, cpu addr 0xffffc9000105d5a0
> [   24.260508] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x0000000000400620, cpu addr 0xffffc9000105d620
> [   24.261591] [drm] Found UVD firmware Version: 1.91 Family ID: 11
> [   24.262451] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x000000f400296560, cpu addr 0xffffc90003442560
> [   24.263350] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [   24.263819] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x0000000000400720, cpu addr 0xffffc9000105d720
> [   24.263921] amdgpu 0000:00:01.0: fence driver on ring 14 use gpu addr 0x00000000004007a0, cpu addr 0xffffc9000105d7a0
> [   24.264438] amdgpu: [powerplay] Fail to get clock table from SMU!
> [   24.264440] amdgpu: [powerplay] amdgpu: powerplay initialization failed
> [   24.264467] [drm] DAL is enabled
> [   24.264835] [drm] DC: create_links: connectors_num: physical:3, virtual:0
> [   24.264839] [drm] Connector[0] description:signal 32
> [   24.264842] [drm] Using channel: CHANNEL_ID_DDC1 [1]
> [   24.264851] [drm] Connector[1] description:signal 4
> [   24.264853] [drm] Using channel: CHANNEL_ID_DDC2 [2]
> [   24.264860] [drm] Connector[2] description:signal 4
> [   24.264862] [drm] Using channel: CHANNEL_ID_DDC3 [3]
> [   24.564284] [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
> [   24.564329] [drm] Display Core initialized
> [   24.564332] [drm] amdgpu: freesync_module init done ffff88021048afe0.
> [   24.564564] [drm] link=0, dc_sink_in=          (null) is now Disconnected
> [   24.564565] [drm] DCHPD: connector_id=0: dc_sink didn't change.
> [   24.564624] [drm] link=1, dc_sink_in=          (null) is now Disconnected
> [   24.564624] [drm] DCHPD: connector_id=1: dc_sink didn't change.
> [   24.564738] [drm] link=2, dc_sink_in=          (null) is now Disconnected
> [   24.564739] [drm] DCHPD: connector_id=2: dc_sink didn't change.
> [   24.564751] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [   24.564752] [drm] Driver supports precise vblank timestamp query.
> [   24.564752] [drm] KMS initialized.
> [   24.566110] [drm] ring test on 0 succeeded in 13 usecs
> [   24.755765] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
> [   24.755819] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
> [   24.755839] amdgpu 0000:00:01.0: amdgpu_init failed
> [   24.756271] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [   24.756302] IP:           (null)
> [   24.756312] PGD 2134b3067
> [   24.756312] P4D 2134b3067
> [   24.756320] PUD 0
> 
> [   24.756340] Oops: 0010 [#1] SMP
> [   24.756349] Modules linked in: amdgpu(+) chash ttm ax88179_178a usbnet xhci_pci xhci_hcd efivarfs
> [   24.756380] CPU: 3 PID: 3021 Comm: modprobe Not tainted 4.13.0-rc5+ #33
> [   24.756396] Hardware name: AMD Myrtle/Myrtle, BIOS TMY1100A 03/23/2016
> [   24.756413] task: ffff8802132744c0 task.stack: ffffc90000fd0000
> [   24.756427] RIP: 0010:          (null)
> [   24.756437] RSP: 0018:ffffc90000fd3908 EFLAGS: 00010202
> [   24.756450] RAX: ffff88021048a460 RBX: ffff8802100258a0 RCX: 000000018020000d
> [   24.756466] RDX: 000000018020000e RSI: 0000000000005c02 RDI: ffff88021048a5a0
> [   24.756482] RBP: ffffc90000fd3928 R08: ffff880210f9e580 R09: 000000018020000d
> [   24.756499] R10: ffffc90000fd3948 R11: ffffea0008525e00 R12: 0000000000005c02
> [   24.756516] R13: ffff88021365b690 R14: ffff880211db0040 R15: ffff880211db2f30
> [   24.756534] FS:  00007ffa8be38700(0000) GS:ffff88021ed80000(0000) knlGS:0000000000000000
> [   24.756554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   24.756569] CR2: 0000000000000000 CR3: 0000000210030000 CR4: 00000000001406e0
> [   24.756586] Call Trace:
> [   24.756745]  ? destroy+0x31/0x100 [amdgpu]
> [   24.756822]  dal_i2caux_destruct+0x5d/0x90 [amdgpu]
> [   24.756875]  destroy+0x15/0x30 [amdgpu]
> [   24.756925]  dal_i2caux_destroy+0x1b/0x30 [amdgpu]
> [   24.756977]  destruct+0x90/0x140 [amdgpu]
> [   24.757028]  dc_destroy+0x10/0x30 [amdgpu]
> [   24.757083]  amdgpu_dm_fini+0x62/0x70 [amdgpu]
> [   24.757137]  dm_hw_fini+0x1d/0x30 [amdgpu]
> [   24.757183]  amdgpu_fini+0xe8/0x330 [amdgpu]
> [   24.757229]  amdgpu_device_init+0xe5a/0x1560 [amdgpu]
> [   24.757245]  ? kmalloc_order_trace+0x29/0xd0
> [   24.757290]  ? amdgpu_driver_load_kms+0x53/0x200 [amdgpu]
> [   24.757338]  amdgpu_driver_load_kms+0x78/0x200 [amdgpu]
> [   24.757353]  drm_dev_register+0x141/0x1d0
> [   24.757393]  amdgpu_pci_probe+0x113/0x140 [amdgpu]
> [   24.757406]  local_pci_probe+0x40/0xa0
> [   24.757416]  pci_device_probe+0xaa/0x130
> [   24.757426]  driver_probe_device+0x23e/0x2d0
> [   24.757437]  __driver_attach+0x96/0xa0
> [   24.757446]  ? driver_probe_device+0x2d0/0x2d0
> [   24.757457]  bus_for_each_dev+0x5b/0x90
> [   24.757467]  driver_attach+0x19/0x20
> [   24.757476]  bus_add_driver+0x11c/0x220
> [   24.757485]  driver_register+0x5b/0xd0
> [   24.757495]  __pci_register_driver+0x47/0x50
> [   24.757532]  amdgpu_init+0x88/0x9b [amdgpu]
> [   24.757544]  ? 0xffffffffa030a000
> [   24.757554]  do_one_initcall+0x3e/0x160
> [   24.757566]  ? __vunmap+0x7c/0xb0
> [   24.757577]  ? kfree+0x147/0x160
> [   24.757587]  ? kmem_cache_alloc_trace+0x33/0x150
> [   24.757602]  do_init_module+0x5a/0x1f1
> [   24.757614]  load_module+0x2329/0x28d0
> [   24.758259]  ? kernel_read_file+0x19e/0x1c0
> [   24.758898]  SYSC_finit_module+0xba/0xc0
> [   24.759524]  ? SYSC_finit_module+0xba/0xc0
> [   24.760206]  SyS_finit_module+0x9/0x10
> [   24.760835]  entry_SYSCALL_64_fastpath+0x13/0x94
> [   24.761450] RIP: 0033:0x7ffa8b310219
> [   24.762137] RSP: 002b:00007ffe64b86b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [   24.762851] RAX: ffffffffffffffda RBX: 00000055ee325090 RCX: 00007ffa8b310219
> [   24.763487] RDX: 0000000000000000 RSI: 00000055edf2d2a6 RDI: 0000000000000005
> [   24.764116] RBP: 00000055ee326f50 R08: 0000000000000000 R09: 0000000000000000
> [   24.764716] R10: 0000000000000005 R11: 0000000000000246 R12: 00000055ee3252f0
> [   24.765298] R13: 00007ffe64b86ad8 R14: 00007ffe64b86ae0 R15: 0000000000000000
> [   24.765878] Code:  Bad RIP value.
> [   24.766464] RIP:           (null) RSP: ffffc90000fd3908
> [   24.767036] CR2: 0000000000000000
> [   24.767717] ---[ end trace 636f871b29b747e7 ]---
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
