<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left">
[AMD Official Use Only - AMD Internal Distribution Only]<br>
</p>
<br>
<div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Mike,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt;">
<span style="color: rgb(0, 0, 0);">Thanks for the details. We tried to reproduce the issue on our end with 9000 and 7000 series dGPUs, but we are not seeing the DMUB errors. We were testing on Ubuntu, so we'll try Fedora next.</span></div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature" class="elementToProof" style="color: inherit;">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
--</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Regards,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Jay<br>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com><br>
<b>Sent:</b> Wednesday, May 21, 2025 12:40 PM<br>
<b>To:</b> Pillai, Aurabindo <Aurabindo.Pillai@amd.com><br>
<b>Cc:</b> Chung, ChiaHsuan (Tom) <ChiaHsuan.Chung@amd.com>; Wu, Ray <Ray.Wu@amd.com>; Wheeler, Daniel <Daniel.Wheeler@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>; dri-devel <dri-devel@lists.freedesktop.org>;
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>; Linux regressions mailing list <regressions@lists.linux.dev><br>
<b>Subject:</b> Re: 6.15-rc6/regression/bisected - after commit f1c6be3999d2 error appeared: *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On Tue, May 20, 2025 at 9:22 PM Mikhail Gavrilov<br>
<mikhail.v.gavrilov@gmail.com> wrote:<br>
><br>
> > Could you share more details about your setup, and how you were able to repro it?<br>
> ><br>
<br>
Hi,<br>
Were you able to reproduce the issue?<br>
<br>
I’ve prepared a step-by-step guide that may help:<br>
1. Set up a system with a Radeon 6900XT and an LG TV connected via HDMI.<br>
2. Install Fedora Rawhide.<br>
3. Build and install kernel 6.15-rc7 using my .config (attached in the<br>
first message).<br>
4. Boot into the custom-built kernel.<br>
5. Set the display resolution to 3840×2160 @ 120 Hz.<br>
(This step is optional but may help trigger the issue faster.)<br>
6. Generate heavy system load. I use a long-running kernel rebuild loop (400000 iterations, so effectively endless):<br>
<fish shell><br>
> for i in (seq 1 400000); make clean && make -j32 bzImage && make -j32 modules; end<br>
</fish shell><br>
<br>
Expected behavior:<br>
System remains stable during heavy load.<br>
<br>
Actual behavior:<br>
1. First, the kernel log is filled with repeated messages:<br>
amdgpu 0000:03:00.0: amdgpu: [drm] DP AUX transfer fail:4<br>
2. After a short while under load, more severe errors appear:<br>
amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data:<br>
DMCUB error - collecting diagnostic data<br>
3. Finally, the system completely freezes with a hard lockup:<br>
watchdog: CPU28: Watchdog detected hard LOCKUP on cpu 28<br>
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_queue<br>
nfnetlink_queue nf_conntrack_netbios_ns nf_conntrack_broadcast<br>
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet<br>
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat<br>
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep<br>
sunrpc binfmt_misc amd_atl intel_rapl_msr intel_rapl_common<br>
edac_mce_amd btusb snd_hda_codec_realtek btrtl mt7921e btintel<br>
mt7921_common snd_hda_codec_generic btbcm mt792x_lib btmtk<br>
snd_hda_scodec_component snd_hda_codec_hdmi snd_usb_audio<br>
mt76_connac_lib kvm_amd snd_hda_intel bluetooth mt76 snd_intel_dspcfg<br>
snd_usbmidi_lib snd_intel_sdw_acpi snd_hda_codec mc kvm spd5118<br>
mac80211 snd_ump snd_hda_core snd_rawmidi snd_hwdep vfat irqbypass fat<br>
snd_seq snd_seq_device wmi_bmof libarc4 rapl r8169 pcspkr snd_pcm<br>
cfg80211 i2c_piix4 snd_timer k10temp i2c_smbus realtek snd rfkill<br>
joydev soundcore gpio_amdpt gpio_generic loop nfnetlink zram<br>
lz4hc_compress lz4_compress amdgpu amdxcp<br>
i2c_algo_bit drm_ttm_helper ttm drm_exec nvme polyval_clmulni<br>
gpu_sched polyval_generic ghash_clmulni_intel drm_suballoc_helper<br>
ucsi_ccg nvme_core sha512_ssse3 typec_ucsi drm_panel_backlight_quirks<br>
sha256_ssse3 drm_buddy nvme_keyring typec sha1_ssse3 sp5100_tco<br>
nvme_auth drm_display_helper cec video wmi fuse<br>
irq event stamp: 117172<br>
hardirqs last enabled at (117171): [<ffffffff9e001566>]<br>
asm_common_interrupt+0x26/0x40<br>
hardirqs last disabled at (117172): [<ffffffffa1c00f97>]<br>
irqentry_enter+0x57/0x60<br>
softirqs last enabled at (117144): [<ffffffff9e614919>]<br>
handle_softirqs+0x579/0x840<br>
softirqs last disabled at (117137): [<ffffffff9e614d16>]<br>
__irq_exit_rcu+0x126/0x240<br>
CPU: 28 UID: 1000 PID: 1737394 Comm: as Tainted: G W L<br>
------ --- 6.15.0-0.rc6.250515g088d13246a46.54.fc43.x86_64+debug #1<br>
PREEMPT(lazy)<br>
Tainted: [W]=WARN, [L]=SOFTLOCKUP<br>
Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS<br>
3.08 09/18/2024<br>
RIP: 0010:delay_halt_mwaitx+0x20/0x50<br>
Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 65<br>
48 8b 05 56 13 0d 04 31 d2 48 89 d1 48 05 00 00 ca a5 0f 01 fa <b8> ff<br>
ff ff ff b9 02 00 00 00 48 39 c6 48 0f 47 f0 b8 f0 00 00 00<br>
RSP: 0000:ffffc9003b68f820 EFLAGS: 00000087<br>
RAX: ffff888fda610000 RBX: 000000000000118c RCX: 0000000000000000<br>
RDX: 0000000000000000 RSI: 000000000000118c RDI: 000023b4c02956f6<br>
RBP: 000023b4c02956f6 R08: ffffffffc14b01a9 R09: fffffbfff49570d4<br>
R10: 000000000000001c R11: 0000000000002000 R12: ffffed1040583d43<br>
R13: ffffed1040583d17 R14: 00000000000186a0 R15: ffff888202c1e800<br>
FS: 00007f1da07bcd00(0000) GS:ffff889034970000(0000) knlGS:0000000000000000<br>
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033<br>
CR2: 00007f1da0a43930 CR3: 00000003e594a000 CR4: 0000000000f50ef0<br>
PKRU: 55555554<br>
Call Trace:<br>
<TASK><br>
delay_halt.part.0+0x33/0x60<br>
dmub_srv_wait_for_idle+0x12f/0x1d0 [amdgpu]<br>
dc_dmub_srv_cmd_run_list+0x99/0x2a0 [amdgpu]<br>
dc_dmub_srv_drr_update_cmd+0x158/0x340 [amdgpu]<br>
? __lock_acquire+0x40f/0x1160<br>
? __pfx_dc_dmub_srv_drr_update_cmd+0x10/0x10 [amdgpu]<br>
? lock_acquire.part.0+0xc8/0x270<br>
? local_clock_noinstr+0xf/0x130<br>
optc1_set_drr+0x18b/0xf20 [amdgpu]<br>
? rcu_is_watching+0x15/0xe0<br>
set_drr_and_clear_adjust_pending+0xa6/0x180 [amdgpu]<br>
? __lock_acquire+0x40f/0x1160<br>
dcn10_set_drr+0x224/0x390 [amdgpu]<br>
? __pfx_dcn10_set_drr+0x10/0x10 [amdgpu]<br>
? local_clock+0x15/0x30<br>
? __lock_release.isra.0+0x1cb/0x340<br>
? rcu_is_watching+0x15/0xe0<br>
dc_stream_adjust_vmin_vmax+0x4d9/0xd60 [amdgpu]<br>
? __pfx_dc_stream_adjust_vmin_vmax+0x10/0x10 [amdgpu]<br>
? dm_crtc_high_irq+0x4c8/0xb70 [amdgpu]<br>
? __raw_spin_lock_irqsave+0x60/0x90<br>
dm_crtc_high_irq+0x7b5/0xb70 [amdgpu]<br>
? amdgpu_dm_irq_handler+0xf3/0x2a0 [amdgpu]<br>
amdgpu_dm_irq_handler+0x19a/0x2a0 [amdgpu]<br>
amdgpu_irq_dispatch+0x286/0x670 [amdgpu]<br>
? find_held_lock+0x2b/0x80<br>
? __pfx_amdgpu_irq_dispatch+0x10/0x10 [amdgpu]<br>
? __pfx___drm_dev_dbg+0x10/0x10<br>
? do_raw_spin_unlock+0x59/0x230<br>
? __wake_up+0x44/0x60<br>
amdgpu_ih_process+0x1c4/0x3a0 [amdgpu]<br>
? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]<br>
amdgpu_irq_handler+0x27/0xb0 [amdgpu]<br>
? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]<br>
__handle_irq_event_percpu+0x1b5/0x510<br>
handle_irq_event+0xab/0x1c0<br>
handle_edge_irq+0x213/0xb50<br>
__common_interrupt+0xad/0x1d0<br>
? irq_enter_rcu+0x26/0x190<br>
common_interrupt+0x5a/0xe0<br>
asm_common_interrupt+0x26/0x40<br>
RIP: 0033:0x5639d9e740ee<br>
Code: 45 c8 85 d2 74 04 41 80 08 04 48 83 c4 58 4c 89 c8 5b 41 5c 41<br>
5d 41 5e 41 5f 5d c3 48 8b 57 10 44 8b 15 fd bd 08 00 4c 03 0a <45> 85<br>
d2 0f 84 33 ff ff ff 83 c8 04 4c 89 4f 20 88 07 4c 89 c8 c3<br>
RSP: 002b:00007ffce2334e38 EFLAGS: 00000202<br>
RAX: 0000000000000001 RBX: 00007f1d91572388 RCX: 0000000000000002<br>
RDX: 00007f1d90e9c750 RSI: 0000000000000000 RDI: 00007f1d912e5d20<br>
RBP: 00007ffce2334e40 R08: 00007f1d912e5d20 R09: 000000000000e119<br>
R10: 0000000000000001 R11: 0000000000000002 R12: 00007f1d9157f008<br>
R13: 0000000000000000 R14: 00005639d9e74f90 R15: 00007f1da06fe730<br>
</TASK><br>
INFO: NMI handler (perf_event_nmi_handler) took too long to run: 5.441 msecs<br>
<br>
Environment:<br>
GPU: AMD Radeon 6900XT<br>
Display: LG TV via HDMI<br>
Kernel: 6.15-rc7, built from source using provided config<br>
Distro: Fedora Rawhide<br>
Motherboard: ASRock B650I Lightning WiFi<br>
BIOS: 3.08 (2024-09-18)<br>
<br>
Additional diagnostic info:<br>
Full kernel log ending with stack trace from delay_halt_mwaitx()<br>
Series of dc_dmub_srv_drr_update_cmd() and<br>
dc_stream_adjust_vmin_vmax() calls in call trace<br>
System enters an unrecoverable locked-up state after a few minutes of heavy compilation<br>
<br>
-- <br>
Best Regards,<br>
Mike Gavrilov.<br>
</div>
</span></font></div>
</div>
</body>
</html>