[Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak
Mirsad Goran Todorovac
mirsad.todorovac at alu.unizg.hr
Thu Dec 22 00:12:24 UTC 2022
On 20. 12. 2022. 20:34, Mirsad Todorovac wrote:
> On 12/20/22 16:52, Tvrtko Ursulin wrote:
>
>> On 20/12/2022 15:22, srinivas pandruvada wrote:
>>> +Added DRM mailing list and maintainers
>>>
>>> On Tue, 2022-12-20 at 15:33 +0100, Mirsad Todorovac wrote:
>>>> Hi all,
>>>>
>>>> I have been unsuccessful to find any particular Intel i915 maintainer
>>>> emails, so my best bet is to post here, as you will must assuredly
>>>> already know them.
>>
>> For future reference you can use ${kernel_dir}/scripts/get_maintainer.pl -f ...
>>
>>>> The problem is a kernel memory leak that is repeatedly occurring
>>>> triggered during the execution of Chrome browser under the latest
>>>> 6.1.0+
>>>> kernel of this morning and Almalinux 8.6 on a Lenovo desktop box
>>>> with Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz CPU.
>>>>
>>>> The build is with KMEMLEAK, KASAN and MGLRU turned on during the
>>>> build,
>>>> on a vanilla mainline kernel from Mr. Torvalds' tree.
>>>>
>>>> The leaks look like this one:
>>>>
>>>> unreferenced object 0xffff888131754880 (size 64):
>>>> comm "chrome", pid 13058, jiffies 4298568878 (age 3708.084s)
>>>> hex dump (first 32 bytes):
>>>> 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> ................
>>>> 00 00 00 00 00 00 00 00 00 80 1e 3e 83 88 ff ff
>>>> ...........>....
>>>> backtrace:
>>>> [<ffffffff9e9b5542>] slab_post_alloc_hook+0xb2/0x340
>>>> [<ffffffff9e9bbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>>>> [<ffffffff9e8f767a>] kmalloc_trace+0x2a/0xb0
>>>> [<ffffffffc08dfde5>] drm_vma_node_allow+0x45/0x150 [drm]
>>>> [<ffffffffc0b33315>] __assign_mmap_offset_handle+0x615/0x820
>>>> [i915]
>>>> [<ffffffffc0b34057>] i915_gem_mmap_offset_ioctl+0x77/0x110
>>>> [i915]
>>>> [<ffffffffc08bc5e1>] drm_ioctl_kernel+0x181/0x280 [drm]
>>>> [<ffffffffc08bc9cd>] drm_ioctl+0x2dd/0x6a0 [drm]
>>>> [<ffffffff9ea54744>] __x64_sys_ioctl+0xc4/0x100
>>>> [<ffffffff9fbc0178>] do_syscall_64+0x58/0x80
>>>> [<ffffffff9fc000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>
>>>> The complete list of leaks in attachment, but they seem similar or
>>>> the same.
>>>>
>>>> Please find attached lshw and kernel build config file.
>>>>
>>>> I will probably check the same parms on my laptop at home, which is
>>>> also
>>>> Lenovo, but a different hw config and Ubuntu 22.10.
>>
>> Could you try the below patch?
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index c3ea243d414d..0b07534c203a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -679,9 +679,10 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>> insert:
>> mmo = insert_mmo(obj, mmo);
>> GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
>> -out:
>> +
>> if (file)
>> drm_vma_node_allow(&mmo->vma_node, file);
>> +out:
>> return mmo;
>>
>> err:
>>
>> Maybe it is not the best fix but curious to know if it will make the leak go away.
>
> Hi,
>
> After 27 minutes uptime with the patched kernel it looks promising.
> It is much longer than it took for the buggy kernel to leak slabs.
>
> Here is the output:
>
> [root at pc-mtodorov marvin]# echo scan > /sys/kernel/debug/kmemleak
> [root at pc-mtodorov marvin]# cat !$
> cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff888105028d80 (size 16):
> comm "kworker/u12:5", pid 359, jiffies 4294902898 (age 1620.144s)
> hex dump (first 16 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> backtrace:
> [<ffffffffb6bb5542>] slab_post_alloc_hook+0xb2/0x340
> [<ffffffffb6bbbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
> [<ffffffffb6af8175>] __kmalloc_node_track_caller+0x55/0x160
> [<ffffffffb6ae34a6>] kstrdup+0x36/0x60
> [<ffffffffb6ae3508>] kstrdup_const+0x28/0x30
> [<ffffffffb70d0757>] kvasprintf_const+0x97/0xd0
> [<ffffffffb7c9cdf4>] kobject_set_name_vargs+0x34/0xc0
> [<ffffffffb750289b>] dev_set_name+0x9b/0xd0
> [<ffffffffc12d9201>] memstick_check+0x181/0x639 [memstick]
> [<ffffffffb676e1d6>] process_one_work+0x4e6/0x7e0
> [<ffffffffb676e556>] worker_thread+0x76/0x770
> [<ffffffffb677b468>] kthread+0x168/0x1a0
> [<ffffffffb6604c99>] ret_from_fork+0x29/0x50
> [root at pc-mtodorov marvin]# w
> 20:27:35 up 27 min, 2 users, load average: 0.83, 1.15, 1.19
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> marvin tty2 tty2 20:01 27:10 10:12 2.09s /opt/google/chrome/chrome --type=utility --utility-sub-type=audio.m
> marvin pts/1 - 20:01 0.00s 2:00 0.38s sudo bash
> [root at pc-mtodorov marvin]# uname -rms
> Linux 6.1.0-b6bb9676f216-mglru-kmemlk-kasan+ x86_64
> [root at pc-mtodorov marvin]#
As I hear no reply from Tvrtko, and there is already 1d5h uptime with no leaks (but
the kworker with memstick_check nag I couldn't bisect on the only box that reproduced it,
because something in hw was not supported in pre 4.16 kernels on the Lenovo V530S-07ICB.
Or I am doing something wrong.)
However, now I can find the memstick maintainers thanks to Tvrtko's hint.
If you no longer require my service, I would close this on my behalf.
I hope I did not cause too much trouble. The knowledgeable knew that this was not a security
risk, but only a bug. (30 leaks of 64 bytes each were hardly to exhaust memory in any realistic
time.)
However, having some experience with software development, I always preferred bugs reported
and fixed rather than concealed and lying in wait (or worse, found first by a motivated
adversary.) Forgive me this rant, I do not live from writing kernel drivers, this is just a
pet project as of time being ...
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
More information about the Intel-gfx
mailing list