[PATCH] drm/atmel-hlcdc: Release CRTC commit when destroying plane state
Ludovic.Desroches at microchip.com
Ludovic.Desroches at microchip.com
Fri Mar 8 08:00:03 UTC 2024
On 3/6/24 20:49, Pierre-Louis Dourneau wrote:
>
> From: Arnaud Lahache <a.lahache at klervi.com>
>
> Fixes a memory leak occurring on each modeset update.
>
> Running a program such as libdrm's modetest[0] with this driver exhausts
> all available memory after a few minutes. Enabling kmemleak yields a series
> of such leak reports:
>
> unreferenced object 0xc21acf40 (size 64):
> comm "modetest", pid 210, jiffies 4294942460 (age 331.240s)
> hex dump (first 32 bytes):
> 00 a0 a1 c1 01 00 00 00 ff ff ff ff 4c cf 1a c2 ............L...
> 4c cf 1a c2 ff ff ff ff 58 cf 1a c2 58 cf 1a c2 L.......X...X...
> backtrace:
> [<d68b3e09>] kmalloc_trace+0x18/0x24
> [<f858a020>] drm_atomic_helper_setup_commit+0x1e0/0x7e0
> [<26e8ab04>] drm_atomic_helper_commit+0x40/0x160
> [<49708b0c>] drm_atomic_commit+0xa8/0xf0
> [<e58c2942>] drm_mode_obj_set_property_ioctl+0x154/0x3d8
> [<5e97e57d>] drm_ioctl+0x200/0x3c4
> [<ed514ba1>] sys_ioctl+0x240/0xb48
> [<26aab344>] ret_fast_syscall+0x0/0x44
>
> drm_atomic_helper_setup_commit() acquires a reference to a drm_crtc_commit
> for each CRTC and associated connectors and planes involved in a modeset.
> 64-byte leaks map well to the size of a drm_crtc_commit on 32-bit
> architectures, and the second 4-byte chunk in the hex dump above awfully
> looks like a reference count.
>
> We tracked this missing reference decrement down to the driver's
> atmel_hlcdc_plane_atomic_destroy_state() callback. Its CRTC counterpart,
> atmel_hlcdc_crtc_destroy_state(), calls into the drm_atomic helpers and
> properly releases its references to the commit. Planes didn't. Using the
> default helper for that purpose, __drm_atomic_helper_plane_destroy_state(),
> fixes the leak and avoids reimplementing the same logic in the driver.
>
> [0]: https://gitlab.freedesktop.org/mesa/drm/-/tree/main/tests/modetest
> Invoke with `modetest -M atmel-hlcdc -s 32:#0 -v`, assuming 32 is the
> ID of a connector.
>
> Signed-off-by: Arnaud Lahache <a.lahache at klervi.com>
> Co-developed-by: Pierre-Louis Dourneau <pl.dourneau at klervi.com>
> Signed-off-by: Pierre-Louis Dourneau <pl.dourneau at klervi.com>
> Co-developed-by: Benoît Alcaina <b.alcaina at klervi.com>
> Signed-off-by: Benoît Alcaina <b.alcaina at klervi.com>
> ---
> As far as our testing goes, we've been running 6 of our production units
> with this patch for more than 2 weeks as per the date this patch is sent
> out. We can report stable memory usage.
Hello Arnaud,
This patch fixes the memory leak but introduces a crash on my side when
exiting a graphics app using the Microchip graphics library.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 201 at lib/refcount.c:28
refcount_warn_saturate+0x58/0x130
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 201 Comm: basic Not tainted 6.1.55-linux4microchip-2023.10+
#65
Hardware name: Microchip SAM9X60
unwind_backtrace from show_stack+0x10/0x18
show_stack from dump_stack_lvl+0x28/0x34
dump_stack_lvl from __warn+0x70/0xb8
__warn from warn_slowpath_fmt+0x78/0xac
warn_slowpath_fmt from refcount_warn_saturate+0x58/0x130
refcount_warn_saturate from kref_put+0x54/0x5c
kref_put from drm_fb_release+0x100/0x11c
drm_fb_release from drm_file_free+0xcc/0x1bc
drm_file_free from drm_release+0x44/0x94
drm_release from __fput+0xf0/0x20c
__fput from task_work_run+0x8c/0xb0
task_work_run from do_exit+0x310/0x760
do_exit from sys_exit_group+0x0/0x14
sys_exit_group from get_signal+0x524/0x64c
get_signal from do_work_pending+0xf4/0x398
do_work_pending from slow_work_pending+0xc/0x24
Exception stack(0xc9991fb0 to 0xc9991ff8)
1fa0: 0054dd48 00000000 005331f8
00000001
1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
0051b424
1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff
---[ end trace 0000000000000000 ]---
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address
00000004
[00000004] *pgd=00000000
Internal error: Oops: 805 [#1] ARM
Modules linked in:
CPU: 0 PID: 201 Comm: basic Tainted: G W
6.1.55-linux4microchip-2023.10+ #65
Hardware name: Microchip SAM9X60
PC is at drm_fb_release+0xc0/0x11c
LR is at drm_fb_release+0x100/0x11c
pc : [<c0329b10>] lr : [<c0329b50>] psr: 80000013
sp : c9991e48 ip : 00000000 fp : 0051b424
r10: c174a5f0 r9 : 00000000 r8 : c2188074
r7 : 60000013 r6 : c9991e5c r5 : ffffff8c r4 : c2188048
r3 : c1590994 r2 : c2188048 r1 : 00000000 r0 : c1590920
Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 0005317f Table: 2384c000 DAC: 00000051
Register r0 information: slab kmalloc-192 start c1590920 pointer offset
0 size 192
Register r1 information: NULL pointer
Register r2 information: slab kmalloc-192 start c2188000 pointer offset
72 size 192
Register r3 information: slab kmalloc-192 start c1590920 pointer offset
116 size 192
Register r4 information: slab kmalloc-192 start c2188000 pointer offset
72 size 192
Register r5 information: non-paged memory
Register r6 information: 2-page vmalloc region starting at 0xc9990000
allocated at kernel_clone+0xb4/0x204
Register r7 information: non-paged memory
Register r8 information: slab kmalloc-192 start c2188000 pointer offset
116 size 192
Register r9 information: NULL pointer
Register r10 information: slab task_struct start c174a140 pointer offset
1200
Register r11 information: non-paged memory
Register r12 information: NULL pointer
Process basic (pid: 201, stack limit = 0x41541c7b)
Stack: (0xc9991e48 to 0xc9992000)
1e40: 00000001 00000000 00000000 00000000 00000000
c9991e5c
1e60: c9991e5c 5c44a9dd c2188000 c1621800 c2188060 c03146e4 000000c9
0000e200
1e80: 00000001 00000000 00000000 c1621800 c38f2cc0 c0fb63e0 c0c1ce70
c0f99560
1ea0: 00000000 c0314a5c c38f2cc0 c10ee6e8 000a201f c00c66a0 c38f2cc0
00000001
1ec0: c09f1a4c c00c67fc c38f2e00 c174a140 c0a20b1c c2232a50 c9991ef4
c002f000
1ee0: c174a140 c2232a20 c174a140 c001a094 00000002 00000008 c383c880
5c44a9dd
1f00: 00000002 c383c880 0051b424 c001a6a8 00000009 c0024af0 beb53c50
00000000
1f20: 00000000 5c44a9dd 00000000 00000000 c9991fb0 c174a140 00000000
00000000
1f40: 00000000 00000000 0051b424 c000c244 00000000 00000000 00000000
00000000
1f60: 00000000 00000000 00000009 00000000 00000000 00000000 00000000
00000000
1f80: 00000000 00000000 00000000 5c44a9dd b68357dc 20000010 ffffffff
0005317f
1fa0: 00000000 c174a140 43ab8000 c00084dc 0054dd48 00000000 005331f8
00000001
1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
0051b424
1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff 00000000
00000000
drm_fb_release from drm_file_free+0xcc/0x1bc
drm_file_free from drm_release+0x44/0x94
drm_release from __fput+0xf0/0x20c
__fput from task_work_run+0x8c/0xb0
task_work_run from do_exit+0x310/0x760
do_exit from sys_exit_group+0x0/0x14
sys_exit_group from get_signal+0x524/0x64c
get_signal from do_work_pending+0xf4/0x398
do_work_pending from slow_work_pending+0xc/0x24
Exception stack(0xc9991fb0 to 0xc9991ff8)
1fa0: 0054dd48 00000000 005331f8
00000001
1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000
0051b424
1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff
Code: e590c018 e5902078 e5901074 e35c0001 (e5812004)
---[ end trace 00000000c0a20288 ]---
Fixing recursive fault but reboot is needed!
The memory leak had been introduced with this commit:
commit eec44d44a3d2d00c8f663f13555d3dd280b1ea3f
Author: Daniel Vetter <daniel.vetter at ffwll.ch>
Date: Thu Jan 21 16:29:54 2021 +0100
drm/atmel: Use drm_atomic_helper_commit
One of these drivers that predates the nonblocking support in helpers,
and hand-rolled its own thing. Entirely not anything specific here, we
can just delete it all and replace it with the helper version.
Could also perhaps use the drm_mode_config_helper_suspend/resume
stuff, for another few lines deleted. But I'm not looking at that
stuff, I'm just going through all the atomic commit functions and make
sure they have properly annotated dma-fence critical sections
everywhere.
I think this move to the drm atomic helper should have gone with an
update on the release side as well. There is probably something wrong
with the atomic_destroy_state callbacks provided by the atmel-hlcdc driver.
Regards,
Ludovic
>
> Admittedly, our usage of the DRM uAPI is rather simple: create 2 dumb
> buffers, do an initial MODE_SETCRTC, and then MODE_PAGE_FLIP between the
> two dumb buffers at the rate of once per second on average. We haven't
> evaluated more complex workloads.
>
> drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> index daa508504f47..390c4fc62af7 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c
> @@ -934,8 +934,7 @@ static void atmel_hlcdc_plane_atomic_destroy_state(struct drm_plane *p,
> state->dscrs[i]->self);
> }
>
> - if (s->fb)
> - drm_framebuffer_put(s->fb);
> + __drm_atomic_helper_plane_destroy_state(s);
>
> kfree(state);
> }
>
> base-commit: 9dfc46c87cdc8f5a42a71de247a744a6b8188980
> --
> 2.34.1
>
More information about the dri-devel
mailing list