[Intel-gfx] [PATCH] drm/i915: Fix same object multiple mmap memory leak

Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Thu Dec 22 11:06:20 UTC 2022


From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This is the fix proposed by Chuansheng Liu <chuansheng.liu@intel.com> to
close a memory leak caused by the refactoring done in 786555987207
("drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list").

The original commit text from Liu was:

>
> The below memory leak information is caught:
>
> unreferenced object 0xffff997dd4e3b240 (size 64):
>   comm "gem_tiled_fence", pid 10332, jiffies 4294959326 (age 220778.420s)
>   hex dump (first 32 bytes):
>     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 be f2 d4 7d 99 ff ff  ............}...
>   backtrace:
>     [<ffffffffa0f04365>] kmem_cache_alloc_trace+0x2e5/0x450
>     [<ffffffffc062f3ac>] drm_vma_node_allow+0x2c/0xe0 [drm]
>     [<ffffffffc13149ea>] __assign_mmap_offset_handle+0x1da/0x4a0 [i915]
>     [<ffffffffc1315235>] i915_gem_mmap_offset_ioctl+0x55/0xb0 [i915]
>     [<ffffffffc06207e4>] drm_ioctl_kernel+0xb4/0x140 [drm]
>     [<ffffffffc0620ac7>] drm_ioctl+0x257/0x410 [drm]
>     [<ffffffffa0f553ae>] __x64_sys_ioctl+0x8e/0xc0
>     [<ffffffffa1821128>] do_syscall_64+0x38/0xc0
>     [<ffffffffa1a0007c>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The issue is always reproduced with the test:
> gem_tiled_fence_blits --run-subtest basic
>
> It tries to mmap_gtt the same object several times, like this:
> create BO
> mmap_gtt BO
> unmap BO
> mmap_gtt BO <== second time mmap_gtt
> unmap
> close BO
>
> The leak happens on the second mmap_gtt, in mmap_offset_attach():
> it simply increases the reference count to 2 by calling
> drm_vma_node_allow() directly, since the mmo was already created
> the first time.
>
> However, the driver revokes the vma_node only once when closing the
> object, which easily leads to a memory leak.
>
> This patch fixes the memory leak by likewise calling
> drm_vma_node_allow() only once.
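
In userspace terms the quoted sequence corresponds roughly to the sketch
below. This is only an illustration, not the IGT test itself; the device
node path and the 4 KiB BO size are assumptions and error handling is
minimal:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <drm/i915_drm.h>	/* or <libdrm/i915_drm.h>, depending on setup */

static void mmap_gtt_once(int fd, __u32 handle)
{
	struct drm_i915_gem_mmap_offset arg = {
		.handle = handle,
		.flags = I915_MMAP_OFFSET_GTT,
	};
	void *ptr;

	/* Repeated calls reuse the same cached mmo and take another vma_node ref. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &arg))
		return;

	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, arg.offset);
	if (ptr != MAP_FAILED)
		munmap(ptr, 4096);			/* unmap BO */
}

int main(void)
{
	struct drm_i915_gem_create create = { .size = 4096 };	/* create BO */
	struct drm_gem_close close_bo = {};
	int fd = open("/dev/dri/card0", O_RDWR);	/* device node assumed */

	if (fd < 0 || ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
		return 1;

	mmap_gtt_once(fd, create.handle);	/* first mmap_gtt */
	mmap_gtt_once(fd, create.handle);	/* second mmap_gtt - extra ref leaks */

	close_bo.handle = create.handle;
	ioctl(fd, DRM_IOCTL_GEM_CLOSE, &close_bo);	/* close BO: single revoke */

	close(fd);
	return 0;
}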

The issue was later also reported by Mirsad:

>
> The problem is a kernel memory leak that is repeatedly triggered
> during the execution of the Chrome browser under the latest 6.1.0+
> kernel of this morning and AlmaLinux 8.6, on a Lenovo desktop box
> with an Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz.
>
> The kernel is built with KMEMLEAK, KASAN and MGLRU turned on, on a
> vanilla mainline kernel from Mr. Torvalds' tree.
>
> The leaks look like this one:
>
> unreferenced object 0xffff888131754880 (size 64):
>    comm "chrome", pid 13058, jiffies 4298568878 (age 3708.084s)
>    hex dump (first 32 bytes):
>      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>      00 00 00 00 00 00 00 00 00 80 1e 3e 83 88 ff ff ...........>....
>    backtrace:
>      [<ffffffff9e9b5542>] slab_post_alloc_hook+0xb2/0x340
>      [<ffffffff9e9bbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>      [<ffffffff9e8f767a>] kmalloc_trace+0x2a/0xb0
>      [<ffffffffc08dfde5>] drm_vma_node_allow+0x45/0x150 [drm]
>      [<ffffffffc0b33315>] __assign_mmap_offset_handle+0x615/0x820 [i915]
>      [<ffffffffc0b34057>] i915_gem_mmap_offset_ioctl+0x77/0x110 [i915]
>      [<ffffffffc08bc5e1>] drm_ioctl_kernel+0x181/0x280 [drm]
>      [<ffffffffc08bc9cd>] drm_ioctl+0x2dd/0x6a0 [drm]
>      [<ffffffff9ea54744>] __x64_sys_ioctl+0xc4/0x100
>      [<ffffffff9fbc0178>] do_syscall_64+0x58/0x80
>      [<ffffffff9fc000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>

The root cause is that 786555987207 started caching (and sharing) the
i915_mmap_offset object per GEM object and mmap type. This means the
reference count taken by drm_vma_node_allow() can grow beyond one, while
the object close path calls drm_vma_node_revoke() only once, so the
structure leaks.
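
For reference, a simplified sketch of the pre-patch flow in
mmap_offset_attach() (allocation, locking and error paths elided; the early
lookup is paraphrased rather than quoted verbatim):

	mmo = lookup_mmo(obj, mmap_type);
	if (mmo)
		goto out;		/* cached mmo found, reuse it */

	/* ... otherwise allocate a new mmo and add its vma offset node ... */

insert:
	mmo = insert_mmo(obj, mmo);
out:
	if (file)
		drm_vma_node_allow(&mmo->vma_node, file);	/* another ref on every call */
	return mmo;

Since the close path revokes the node only once, every repeated mmap of the
same object and type leaves one reference (and the allocation kmemleak
reports above) behind.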

A secondary effect, also different from the behaviour before 786555987207,
is that it is now possible to mmap an offset belonging to a closed object.
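
From userspace that closed-object scenario looks roughly like the fragment
below (illustrative only, not the IGT code; it continues the sketch earlier
in this message, reusing the same includes and fd). With this patch applied
the final mmap() is expected to fail instead of succeeding:

	struct drm_i915_gem_create create = { .size = 4096 };
	struct drm_i915_gem_mmap_offset mmo_arg = { .flags = I915_MMAP_OFFSET_GTT };
	struct drm_gem_close close_bo = {};
	void *ptr;

	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);
	mmo_arg.handle = create.handle;
	ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &mmo_arg);	/* offset while BO is alive */

	close_bo.handle = create.handle;
	ioctl(fd, DRM_IOCTL_GEM_CLOSE, &close_bo);		/* close the BO first */

	/* Expected to fail with this fix; previously it could succeed. */
	ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, mmo_arg.offset);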

The fix is to partially revert to the behaviour before 786555987207, that
is, to disallow mmap of closed objects and to only increment the mmap
offset reference count once per object and mmap type.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Co-developed-by: Chuansheng Liu <chuansheng.liu@intel.com>
Fixes: 786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list")
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Testcase: igt@gem_mmap_gtt@mmap-closed-bo
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
Test-with: 20221222100403.256775-1-tvrtko.ursulin@linux.intel.com
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index d73ba0f5c4c5..1ceff19a0ac0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -695,9 +695,10 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 insert:
 	mmo = insert_mmo(obj, mmo);
 	GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
-out:
+
 	if (file)
 		drm_vma_node_allow(&mmo->vma_node, file);
+out:
 	return mmo;
 
 err:
-- 
2.34.1


