[PATCH 1/2] drm/amd/display: Protect dml2_create()/dml2_copy()/dml2_create_copy()
Huacai Chen
chenhuacai at kernel.org
Sat Mar 29 08:47:09 UTC 2025
Hi, Aurabindo,
On Sat, Mar 29, 2025 at 2:27 AM Aurabindo Pillai
<aurabindo.pillai at amd.com> wrote:
>
>
>
> On 2025-03-26 21:40, Huacai Chen wrote:
> > Hi, Alex,
> >
> > On Thu, Mar 27, 2025 at 8:10 AM Alex Hung <alex.hung at amd.com> wrote:
> >>
> >> The following error messages showed up on an APU and a dGPU during testing.
> >>
> >> <3> [100.231411] BUG: sleeping function called from invalid context at
> >> include/linux/sched/mm.h:321
> >> <3> [100.231414] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
> >> 1711, name: kms_color
> >> <3> [100.231416] preempt_count: 2, expected: 0
> >> <3> [100.231417] RCU nest depth: 0, expected: 0
> >> <3> [100.231418] Preemption disabled at:
> >> <3> [100.231419] [<ffffffffc0c2843b>] dc_fpu_begin+0x2b/0xc0 [amdgpu]
> >> <4> [100.231626] CPU: 4 UID: 0 PID: 1711 Comm: kms_color Tainted: G
> >> W 6.12.0+ #1
> >> <4> [100.231629] Tainted: [W]=WARN
> >> <4> [100.231631] Call Trace:
> >> <4> [100.231632] <TASK>
> >> <4> [100.231633] dump_stack_lvl+0x5b/0x70
> >> <4> [100.231638] dump_stack+0x10/0x20
> >> <4> [100.231639] __might_resched+0x170/0x1d0
> >> <4> [100.231643] __might_sleep+0x44/0x70
> >> <4> [100.231645] __alloc_pages_noprof+0x22f/0x370
> >> <4> [100.231649] ___kmalloc_large_node+0x95/0x150
> >> <4> [100.231651] ? preempt_count_add+0x4e/0xc0
> >> <4> [100.231653] __kmalloc_large_noprof+0x1d/0xb0
> >> <4> [100.231655] dml2_create_copy+0x27/0x60 [amdgpu]
> >> <4> [100.231827] dc_state_create_copy+0x7e/0x170 [amdgpu]
> >> <4> [100.231995] update_planes_and_stream_state+0x23c/0x600 [amdgpu]
> >> <4> [100.232189] update_planes_and_stream_v2+0x22b/0x530 [amdgpu]
> >> <4> [100.232366] ? amdgpu_dm_atomic_commit_tail+0x1310/0x4100 [amdgpu]
> >> <4> [100.232569] ? commit_tail+0x96/0x140 [drm_kms_helper]
> >> <4> [100.232577] dc_update_planes_and_stream+0x5b/0xe0 [amdgpu]
> >> <4> [100.232730] amdgpu_dm_atomic_commit_tail+0x1fa7/0x4100 [amdgpu]
> >> <4> [100.232908] ? stack_depot_save_flags+0x2c/0x730
> >> <4> [100.232915] ? wait_for_completion_timeout+0x1d/0x30
> >> <4> [100.232917] commit_tail+0x96/0x140 [drm_kms_helper]
> >> <4> [100.232923] drm_atomic_helper_commit+0x12b/0x150 [drm_kms_helper]
> >> <4> [100.232927] drm_atomic_commit+0xad/0xe0 [drm]
> >> <4> [100.232939] ? __pfx___drm_printfn_info+0x10/0x10 [drm]
> >> <4> [100.232956] drm_atomic_helper_set_config+0x80/0xc0 [drm_kms_helper]
> >> <4> [100.232961] drm_mode_setcrtc+0x22e/0x910 [drm]
> >> <4> [100.232975] ? kfree+0x18f/0x350
> >> <4> [100.232977] ? __pfx_drm_mode_setcrtc+0x10/0x10 [drm]
> >> <4> [100.232987] drm_ioctl_kernel+0xa7/0x100 [drm]
> >> <4> [100.233004] drm_ioctl+0x29d/0x500 [drm]
> >> <4> [100.233015] ? __pfx_drm_mode_setcrtc+0x10/0x10 [drm]
> >> <4> [100.233026] ? _raw_spin_unlock_irqrestore+0x1f/0x40
> >> <4> [100.233029] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
> >> <4> [100.233131] __x64_sys_ioctl+0x92/0xd0
> >> <4> [100.233133] x64_sys_call+0x1205/0x20d0
> >> <4> [100.233136] do_syscall_64+0x50/0x110
> >> <4> [100.233138] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >> <4> [100.233142] RIP: 0033:0x7fb21e71a94f
> >> <4> [100.233144] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> >> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
> >> 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> >> <4> [100.233145] RSP: 002b:00007ffdd9a52e50 EFLAGS: 00000246 ORIG_RAX:
> >> 0000000000000010
> >> <4> [100.233148] RAX: ffffffffffffffda RBX: 00007ffdd9a52ee0 RCX:
> >> 00007fb21e71a94f
> >> <4> [100.233149] RDX: 00007ffdd9a52ee0 RSI: 00000000c06864a2 RDI:
> >> 0000000000000005
> >> <4> [100.233149] RBP: 00000000c06864a2 R08: 0000000000000000 R09:
> >> 00005609537f7b08
> >> <4> [100.233150] R10: 0000000000000000 R11: 0000000000000246 R12:
> >> 0000000000000000
> >> <4> [100.233151] R13: 0000000000000005 R14: 0000000000000000 R15:
> >> 00005609537e2848
> >> <4> [100.233152] </TASK>
> > This seems to be caused by dml2_allocate_memory(). To fix it, we can
> > only protect the FPU inside DML2. I can do that in the new version, but
> > I would like to hear Aurabindo's opinion first.
> >
> >
>
> It looks like dml21_apply_soc_bb_overrides() does have some division on
> double variables. I'm curious why we don't see this on our side. Was this
> seen on x86 or Loongson?
It is seen on Loongson.
>
> I think your approach is correct. Thanks for taking time to fix this. We
> can add it to weekly testing if you send us a patch.
V2 has been sent; please take a look.
https://lore.kernel.org/dri-devel/20250327095334.3327111-1-chenhuacai@loongson.cn/T/#t
Huacai
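
For readers following the thread: the trace quoted above shows a sleeping
allocation (dml2_create_copy() -> __kmalloc_large_noprof()) reached after
dc_fpu_begin() had disabled preemption. The following is a minimal userspace
sketch of that pattern and of narrowing the FPU-protected region so the
allocation happens outside it. Every name in it (fp_begin, fp_end,
sleeping_alloc, create_wide, create_narrow, struct ctx) is a hypothetical
stand-in, not amdgpu or kernel API, and it is not the V2 patch.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kernel state. */
static int preempt_count;    /* > 0 models "atomic context" */
static int sleep_violations; /* sleeping allocations made while atomic */

/* Models entering a kernel FPU region, which disables preemption. */
static void fp_begin(void) { preempt_count++; }

/* Models leaving the FPU region. */
static void fp_end(void) { preempt_count--; }

/* Models an allocation that may sleep; records a violation if atomic. */
static void *sleeping_alloc(size_t size)
{
	if (preempt_count > 0)
		sleep_violations++; /* the kernel would warn here */
	return malloc(size);
}

struct ctx {
	double ratio;
};

/* The only part that really needs floating point. */
static void fp_compute(struct ctx *c, double num, double den)
{
	c->ratio = num / den;
}

/* Wide protection: the allocation ends up inside the FPU region. */
static struct ctx *create_wide(double num, double den)
{
	struct ctx *c;

	fp_begin();
	c = sleeping_alloc(sizeof(*c));
	if (c)
		fp_compute(c, num, den);
	fp_end();
	return c;
}

/* Narrow protection: allocate first, then guard only the FP math. */
static struct ctx *create_narrow(double num, double den)
{
	struct ctx *c = sleeping_alloc(sizeof(*c));

	if (!c)
		return NULL;
	fp_begin();
	fp_compute(c, num, den);
	fp_end();
	return c;
}
```

In this model create_wide() records one violation per call, while
create_narrow() records none, which is the property the narrower protection
is meant to provide.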