powerplay change breaks driver
Zhu, Rex
Rex.Zhu at amd.com
Tue Sep 26 02:53:21 UTC 2017
Thanks, Tom.
I have found the root cause.
It is a copy-and-paste error when initializing the SMU function table for the CZ family:
case AMDGPU_FAMILY_CZ:
- hwmgr->smumgr_funcs = &ci_smu_funcs;
+ hwmgr->smumgr_funcs = &cz_smu_funcs;
Best Regards
Rex
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of Tom St Denis
Sent: Tuesday, September 26, 2017 2:26 AM
To: amd-gfx at lists.freedesktop.org
Subject: Re: powerplay change breaks driver
To narrow things down: it still crashes with the Polaris10 removed, so the problem is likely somewhere in the CZ code paths.
Tom
On 25/09/17 01:55 PM, Tom St Denis wrote:
> This change
>
> commit f96306921d5e346ebc82c7c51ae6e0b736e5b425
> Author: Rex Zhu <Rex.Zhu at amd.com>
> Date: Wed Sep 20 14:44:55 2017 +0800
>
> drm/amd/powerplay: refine powerplay code.
>
> delete struct smumgr, put smu backend function table
> in struct hwmgr
>
> Change-Id: I7b73ef062b147b4e7199105a3c101f6c8038cc57
> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> Signed-off-by: Rex Zhu <Rex.Zhu at amd.com>
>
>
> results in the following error messages in dmesg on my Carrizo + Polaris10 setup:
>
> [ 24.237785] [drm] amdgpu kernel modesetting enabled.
> [ 24.237814] checking generic (c0000000 7e9000) vs hw (e0000000 10000000)
> [ 24.237864] amdgpu 0000:00:01.0: enabling device (0006 -> 0007)
> [ 24.238366] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x1002:0x1E10 0xE1).
> [ 24.238394] [drm] register mmio base: 0xD1300000
> [ 24.238394] [drm] register mmio size: 262144
> [ 24.238463] ACPI Error: [\_SB_.ALIB] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
> [ 24.238497] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATC0, AE_NOT_FOUND (20170531/psparse-550)
> [ 24.238528] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATCS, AE_NOT_FOUND (20170531/psparse-550)
> [ 24.238558] [drm] UVD is enabled in physical mode
> [ 24.238561] [drm] VCE enabled in physical mode
> [ 24.250365] ATOM BIOS: 109-C95010-001
> [ 24.250381] [drm] GPU post is not needed
> [ 24.250407] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
> [ 24.250412] amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> [ 24.250413] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
> [ 24.250420] [drm] Detected VRAM RAM=512M, BAR=512M
> [ 24.250421] [drm] RAM width 64bits UNKNOWN
> [ 24.250795] [TTM] Zone kernel: Available graphics memory: 3846244 kiB
> [ 24.250797] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [ 24.250797] [TTM] Initializing pool allocator
> [ 24.250801] [TTM] Initializing DMA pool allocator
> [ 24.250844] [drm] amdgpu: 512M of VRAM memory ready
> [ 24.250845] [drm] amdgpu: 3072M of GTT memory ready.
> [ 24.250860] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [ 24.250970] [drm] PCIE GART of 1024M enabled (table at 0x000000F400040000).
> [ 24.251017] amdgpu 0000:00:01.0: amdgpu: using MSI.
> [ 24.251034] [drm] amdgpu: irq initialized.
> [ 24.251037] amdgpu: [powerplay] amdgpu: powerplay sw initialized
> [ 24.254140] [drm] Chained IB support enabled!
> [ 24.257056] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0xffffc9000105d080
> [ 24.257196] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0xffffc9000105d100
> [ 24.257922] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0xffffc9000105d180
> [ 24.258053] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0xffffc9000105d200
> [ 24.258115] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0xffffc9000105d280
> [ 24.258146] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000400300, cpu addr 0xffffc9000105d300
> [ 24.258353] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000000400380, cpu addr 0xffffc9000105d380
> [ 24.258426] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000000400400, cpu addr 0xffffc9000105d400
> [ 24.258484] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000000400480, cpu addr 0xffffc9000105d480
> [ 24.258528] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x0000000000400520, cpu addr 0xffffc9000105d520
> [ 24.260159] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000004005a0, cpu addr 0xffffc9000105d5a0
> [ 24.260508] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x0000000000400620, cpu addr 0xffffc9000105d620
> [ 24.261591] [drm] Found UVD firmware Version: 1.91 Family ID: 11
> [ 24.262451] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x000000f400296560, cpu addr 0xffffc90003442560
> [ 24.263350] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [ 24.263819] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x0000000000400720, cpu addr 0xffffc9000105d720
> [ 24.263921] amdgpu 0000:00:01.0: fence driver on ring 14 use gpu addr 0x00000000004007a0, cpu addr 0xffffc9000105d7a0
> [ 24.264438] amdgpu: [powerplay] Fail to get clock table from SMU!
> [ 24.264440] amdgpu: [powerplay] amdgpu: powerplay initialization failed
> [ 24.264467] [drm] DAL is enabled
> [ 24.264835] [drm] DC: create_links: connectors_num: physical:3, virtual:0
> [ 24.264839] [drm] Connector[0] description:signal 32
> [ 24.264842] [drm] Using channel: CHANNEL_ID_DDC1 [1]
> [ 24.264851] [drm] Connector[1] description:signal 4
> [ 24.264853] [drm] Using channel: CHANNEL_ID_DDC2 [2]
> [ 24.264860] [drm] Connector[2] description:signal 4
> [ 24.264862] [drm] Using channel: CHANNEL_ID_DDC3 [3]
> [ 24.564284] [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
> [ 24.564329] [drm] Display Core initialized
> [ 24.564332] [drm] amdgpu: freesync_module init done ffff88021048afe0.
> [ 24.564564] [drm] link=0, dc_sink_in= (null) is now Disconnected
> [ 24.564565] [drm] DCHPD: connector_id=0: dc_sink didn't change.
> [ 24.564624] [drm] link=1, dc_sink_in= (null) is now Disconnected
> [ 24.564624] [drm] DCHPD: connector_id=1: dc_sink didn't change.
> [ 24.564738] [drm] link=2, dc_sink_in= (null) is now Disconnected
> [ 24.564739] [drm] DCHPD: connector_id=2: dc_sink didn't change.
> [ 24.564751] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 24.564752] [drm] Driver supports precise vblank timestamp query.
> [ 24.564752] [drm] KMS initialized.
> [ 24.566110] [drm] ring test on 0 succeeded in 13 usecs
> [ 24.755765] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
> [ 24.755819] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
> [ 24.755839] amdgpu 0000:00:01.0: amdgpu_init failed
> [ 24.756271] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 24.756302] IP: (null)
> [ 24.756312] PGD 2134b3067
> [ 24.756312] P4D 2134b3067
> [ 24.756320] PUD 0
>
> [ 24.756340] Oops: 0010 [#1] SMP
> [ 24.756349] Modules linked in: amdgpu(+) chash ttm ax88179_178a usbnet xhci_pci xhci_hcd efivarfs
> [ 24.756380] CPU: 3 PID: 3021 Comm: modprobe Not tainted 4.13.0-rc5+ #33
> [ 24.756396] Hardware name: AMD Myrtle/Myrtle, BIOS TMY1100A 03/23/2016
> [ 24.756413] task: ffff8802132744c0 task.stack: ffffc90000fd0000
> [ 24.756427] RIP: 0010: (null)
> [ 24.756437] RSP: 0018:ffffc90000fd3908 EFLAGS: 00010202
> [ 24.756450] RAX: ffff88021048a460 RBX: ffff8802100258a0 RCX: 000000018020000d
> [ 24.756466] RDX: 000000018020000e RSI: 0000000000005c02 RDI: ffff88021048a5a0
> [ 24.756482] RBP: ffffc90000fd3928 R08: ffff880210f9e580 R09: 000000018020000d
> [ 24.756499] R10: ffffc90000fd3948 R11: ffffea0008525e00 R12: 0000000000005c02
> [ 24.756516] R13: ffff88021365b690 R14: ffff880211db0040 R15: ffff880211db2f30
> [ 24.756534] FS: 00007ffa8be38700(0000) GS:ffff88021ed80000(0000) knlGS:0000000000000000
> [ 24.756554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 24.756569] CR2: 0000000000000000 CR3: 0000000210030000 CR4: 00000000001406e0
> [ 24.756586] Call Trace:
> [ 24.756745] ? destroy+0x31/0x100 [amdgpu]
> [ 24.756822] dal_i2caux_destruct+0x5d/0x90 [amdgpu]
> [ 24.756875] destroy+0x15/0x30 [amdgpu]
> [ 24.756925] dal_i2caux_destroy+0x1b/0x30 [amdgpu]
> [ 24.756977] destruct+0x90/0x140 [amdgpu]
> [ 24.757028] dc_destroy+0x10/0x30 [amdgpu]
> [ 24.757083] amdgpu_dm_fini+0x62/0x70 [amdgpu]
> [ 24.757137] dm_hw_fini+0x1d/0x30 [amdgpu]
> [ 24.757183] amdgpu_fini+0xe8/0x330 [amdgpu]
> [ 24.757229] amdgpu_device_init+0xe5a/0x1560 [amdgpu]
> [ 24.757245] ? kmalloc_order_trace+0x29/0xd0
> [ 24.757290] ? amdgpu_driver_load_kms+0x53/0x200 [amdgpu]
> [ 24.757338] amdgpu_driver_load_kms+0x78/0x200 [amdgpu]
> [ 24.757353] drm_dev_register+0x141/0x1d0
> [ 24.757393] amdgpu_pci_probe+0x113/0x140 [amdgpu]
> [ 24.757406] local_pci_probe+0x40/0xa0
> [ 24.757416] pci_device_probe+0xaa/0x130
> [ 24.757426] driver_probe_device+0x23e/0x2d0
> [ 24.757437] __driver_attach+0x96/0xa0
> [ 24.757446] ? driver_probe_device+0x2d0/0x2d0
> [ 24.757457] bus_for_each_dev+0x5b/0x90
> [ 24.757467] driver_attach+0x19/0x20
> [ 24.757476] bus_add_driver+0x11c/0x220
> [ 24.757485] driver_register+0x5b/0xd0
> [ 24.757495] __pci_register_driver+0x47/0x50
> [ 24.757532] amdgpu_init+0x88/0x9b [amdgpu]
> [ 24.757544] ? 0xffffffffa030a000
> [ 24.757554] do_one_initcall+0x3e/0x160
> [ 24.757566] ? __vunmap+0x7c/0xb0
> [ 24.757577] ? kfree+0x147/0x160
> [ 24.757587] ? kmem_cache_alloc_trace+0x33/0x150
> [ 24.757602] do_init_module+0x5a/0x1f1
> [ 24.757614] load_module+0x2329/0x28d0
> [ 24.758259] ? kernel_read_file+0x19e/0x1c0
> [ 24.758898] SYSC_finit_module+0xba/0xc0
> [ 24.759524] ? SYSC_finit_module+0xba/0xc0
> [ 24.760206] SyS_finit_module+0x9/0x10
> [ 24.760835] entry_SYSCALL_64_fastpath+0x13/0x94
> [ 24.761450] RIP: 0033:0x7ffa8b310219
> [ 24.762137] RSP: 002b:00007ffe64b86b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 24.762851] RAX: ffffffffffffffda RBX: 00000055ee325090 RCX: 00007ffa8b310219
> [ 24.763487] RDX: 0000000000000000 RSI: 00000055edf2d2a6 RDI: 0000000000000005
> [ 24.764116] RBP: 00000055ee326f50 R08: 0000000000000000 R09: 0000000000000000
> [ 24.764716] R10: 0000000000000005 R11: 0000000000000246 R12: 00000055ee3252f0
> [ 24.765298] R13: 00007ffe64b86ad8 R14: 00007ffe64b86ae0 R15: 0000000000000000
> [ 24.765878] Code: Bad RIP value.
> [ 24.766464] RIP: (null) RSP: ffffc90000fd3908
> [ 24.767036] CR2: 0000000000000000
> [ 24.767717] ---[ end trace 636f871b29b747e7 ]---
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx