[Bug 210321] /display/dc/dcn20/dcn20_resource.c:3240 dcn20_validate_bandwidth_fp+0x8b/0xd0 [amdgpu]

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Fri Mar 12 12:31:15 UTC 2021


https://bugzilla.kernel.org/show_bug.cgi?id=210321

Tristen Hayfield (tristen.hayfield at gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tristen.hayfield at gmail.com

--- Comment #4 from Tristen Hayfield (tristen.hayfield at gmail.com) ---
I'm seeing this on the 5.10.* series as well. Currently 5.10.23, Gentoo. Radeon
RX 5500 XT.

Looking at the offending section of code, it seems an assertion is being
triggered:

        // Fallback: Try to only support G6 temperature read latency
        context->bw_ctx.dml.soc.dram_clock_change_latency_us =
                        context->bw_ctx.dml.soc.dummy_pstate_latency_us;

        voltage_supported = dcn20_validate_bandwidth_internal(dc, context, false);
        dummy_pstate_supported = context->bw_ctx.bw.dcn.clk.p_state_change_support;

        if (voltage_supported && dummy_pstate_supported) {
                context->bw_ctx.bw.dcn.clk.p_state_change_support = false;
                goto restore_dml_state;
        }

        // ERROR: fallback is supposed to always work.
        ASSERT(false);

So at least one of voltage_supported and dummy_pstate_supported evaluates to
false here, and execution falls through to the assertion.
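
To make the branch condition explicit, here is a minimal user-space sketch of
the check quoted above. It is not driver code: fallback_hits_assert() is a
hypothetical name, and it models only the if statement, not the bandwidth
validation itself.

```c
#include <stdbool.h>

/*
 * User-space model of the fallback branch quoted above (not driver code).
 * Returns true when execution would fall through to ASSERT(false).
 */
static bool fallback_hits_assert(bool voltage_supported,
                                 bool dummy_pstate_supported)
{
        /*
         * The driver leaves via "goto restore_dml_state" only when both
         * flags are true; every other combination reaches the assertion.
         */
        return !(voltage_supported && dummy_pstate_supported);
}
```

Of the four possible flag combinations, only (true, true) avoids the
assertion, so the warning alone does not say which flag failed.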

Stack trace attached for completeness' sake. Hopefully a dev that understands
the hardware will take a look at this one day and find it helpful.

[  642.193449] WARNING: CPU: 22 PID: 3546 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:3242
dcn20_validate_bandwidth_fp+0xd3/0xf0 [amdgpu]
[  642.193450] Modules linked in: fuse nfs lockd grace nfs_ssc sunrpc k10temp
amdgpu backlight gpu_sched snd_hda_codec_hdmi ttm iwlmvm iwlwifi acpi_cpufreq
efivarfs
[  642.193457] CPU: 22 PID: 3546 Comm: X Not tainted 5.10.23-gentoo #1
[  642.193457] Hardware name: System manufacturer System Product Name/TUF
GAMING X570-PLUS (WI-FI), BIOS 3402 01/13/2021
[  642.193487] RIP: 0010:dcn20_validate_bandwidth_fp+0xd3/0xf0 [amdgpu]
[  642.193488] Code: 5d 41 5c c3 5b 48 89 ee 4c 89 e7 5d ba 01 00 00 00 41 5c
e9 2f f6 ff ff 41 0f b6 f4 48 c7 c7 a0 a8 8c c0 31 c0 e8 8d 09 14 d9 <0f> 0b 48
89 9d 50 26 00 00 44 89 e0 5b 5d 41 5c c3 0f 0b e9 53 ff
[  642.193489] RSP: 0018:ffffc18284b37b40 EFLAGS: 00010246
[  642.193490] RAX: 0000000000000000 RBX: 4079400000000000 RCX:
0000000000000000
[  642.193490] RDX: 0000000000000000 RSI: ffff9ea22f197380 RDI:
ffff9ea22f197380
[  642.193491] RBP: ffff9e93ab0e0000 R08: 0000000000000000 R09:
ffffc18284b37910
[  642.193492] R10: ffffc18284b37908 R11: ffffffff9a722228 R12:
0000000000000001
[  642.193492] R13: 0000000000000000 R14: ffff9e93ab0e0000 R15:
ffff9e9344e5b560
[  642.193493] FS:  00007fc5f22978c0(0000) GS:ffff9ea22f180000(0000)
knlGS:0000000000000000
[  642.193494] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  642.193494] CR2: 000055fa36a96628 CR3: 00000001155b2000 CR4:
0000000000350ee0
[  642.193495] Call Trace:
[  642.193526]  dcn20_validate_bandwidth+0x24/0x40 [amdgpu]
[  642.193548]  dc_validate_global_state+0x284/0x300 [amdgpu]
[  642.193580]  amdgpu_dm_atomic_check+0xb09/0xc00 [amdgpu]
[  642.193584]  drm_atomic_check_only+0x555/0x7d0
[  642.193585]  drm_atomic_commit+0xe/0x50
[  642.193586]  drm_atomic_connector_commit_dpms+0xd5/0xf0
[  642.193588]  drm_mode_obj_set_property_ioctl+0x184/0x3a0
[  642.193589]  ? drm_connector_set_obj_prop+0x80/0x80
[  642.193590]  drm_connector_property_set_ioctl+0x32/0x50
[  642.193592]  drm_ioctl_kernel+0xa5/0xf0
[  642.193593]  drm_ioctl+0x20a/0x3a0
[  642.193594]  ? drm_connector_set_obj_prop+0x80/0x80
[  642.193614]  amdgpu_drm_ioctl+0x44/0x80 [amdgpu]
[  642.193616]  __x64_sys_ioctl+0x81/0xa0
[  642.193618]  do_syscall_64+0x33/0x80
[  642.193620]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  642.193621] RIP: 0033:0x7fc5f24cb227
[  642.193622] Code: 1f 40 00 48 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b1 e8
0c ff ff ff 85 c0 78 b6 5b 4c 89 e0 5d 41 5c c3 b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 11 6c 0c 00 f7 d8 64 89 01 48
[  642.193622] RSP: 002b:00007fff122b98c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  642.193623] RAX: ffffffffffffffda RBX: 00007fff122b9900 RCX:
00007fc5f24cb227
[  642.193624] RDX: 00007fff122b9900 RSI: 00000000c01064ab RDI:
000000000000000c
[  642.193624] RBP: 00000000c01064ab R08: 0000000000000000 R09:
00007fc5f2b97d10
[  642.193625] R10: 00007fc5f2b97d20 R11: 0000000000000246 R12:
000055fa38755350
[  642.193625] R13: 000000000000000c R14: 0000000000000000 R15:
000055fa36abf540
[  642.193626] ---[ end trace b1edc8bf2eac897c ]---


I added the following line before the assertion and recompiled the kernel:

        DC_LOG_ERROR("voltage_supported: %d, dummy_pstate_supported: %d\n",
                     voltage_supported, dummy_pstate_supported);

When the issue triggered again, it logged:

[drm:dcn20_validate_bandwidth_fp [amdgpu]] *ERROR* voltage_supported: 1, dummy_pstate_supported: 0

So in my case the voltage validation passes, but dummy_pstate_supported is
false. That is what triggers the assertion: the fallback is not working as
intended.
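
For anyone who adds the same debug line and wants to tally results from a
saved dmesg log, the following is a rough user-space sketch that extracts the
two flags from one logged line. parse_fallback_flags() is a hypothetical
helper, and it assumes the exact message format shown above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull both flags out of one log line produced by the
 * added DC_LOG_ERROR() call. Returns true only if both values were parsed.
 */
static bool parse_fallback_flags(const char *line, int *voltage,
                                 int *dummy_pstate)
{
        /* Skip the "[drm:...] *ERROR*" prefix, whatever its length. */
        const char *p = strstr(line, "voltage_supported:");

        if (!p)
                return false;

        /* Whitespace in the format also matches a wrapped (newline) log. */
        return sscanf(p, "voltage_supported: %d, dummy_pstate_supported: %d",
                      voltage, dummy_pstate) == 2;
}
```

Fed the line logged above, this yields voltage = 1 and dummy_pstate = 0, the
case where only the dummy p-state check fails.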
