[Intel-gfx] LOOKS GOOD: Possible regression in drm/i915 driver: memleak

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Dec 22 08:04:28 UTC 2022


On 22/12/2022 00:12, Mirsad Goran Todorovac wrote:
> On 20. 12. 2022. 20:34, Mirsad Todorovac wrote:
>> On 12/20/22 16:52, Tvrtko Ursulin wrote:
>>
>>> On 20/12/2022 15:22, srinivas pandruvada wrote:
>>>> +Added DRM mailing list and maintainers
>>>>
>>>> On Tue, 2022-12-20 at 15:33 +0100, Mirsad Todorovac wrote:
>>>>> Hi all,
>>>>>
>>>>> I have been unable to find any particular Intel i915 maintainer
>>>>> emails, so my best bet is to post here, as you will most assuredly
>>>>> already know them.
>>>
>>> For future reference you can use 
>>> ${kernel_dir}/scripts/get_maintainer.pl -f ...
>>>
>>>>> The problem is a kernel memory leak that is repeatedly triggered
>>>>> while running the Chrome browser under this morning's 6.1.0+ kernel
>>>>> on AlmaLinux 8.6, on a Lenovo desktop box with an
>>>>> Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz.
>>>>>
>>>>> The kernel is a vanilla mainline build from Mr. Torvalds' tree,
>>>>> with KMEMLEAK, KASAN and MGLRU enabled.
>>>>>
>>>>> The leaks look like this one:
>>>>>
>>>>> unreferenced object 0xffff888131754880 (size 64):
>>>>>     comm "chrome", pid 13058, jiffies 4298568878 (age 3708.084s)
>>>>>     hex dump (first 32 bytes):
>>>>>       01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>       00 00 00 00 00 00 00 00 00 80 1e 3e 83 88 ff ff  ...........>....
>>>>>     backtrace:
>>>>>       [<ffffffff9e9b5542>] slab_post_alloc_hook+0xb2/0x340
>>>>>       [<ffffffff9e9bbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>>>>>       [<ffffffff9e8f767a>] kmalloc_trace+0x2a/0xb0
>>>>>       [<ffffffffc08dfde5>] drm_vma_node_allow+0x45/0x150 [drm]
>>>>>       [<ffffffffc0b33315>] __assign_mmap_offset_handle+0x615/0x820 [i915]
>>>>>       [<ffffffffc0b34057>] i915_gem_mmap_offset_ioctl+0x77/0x110 [i915]
>>>>>       [<ffffffffc08bc5e1>] drm_ioctl_kernel+0x181/0x280 [drm]
>>>>>       [<ffffffffc08bc9cd>] drm_ioctl+0x2dd/0x6a0 [drm]
>>>>>       [<ffffffff9ea54744>] __x64_sys_ioctl+0xc4/0x100
>>>>>       [<ffffffff9fbc0178>] do_syscall_64+0x58/0x80
>>>>>       [<ffffffff9fc000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>
>>>>> The complete list of leaks is in the attachment, but they all seem
>>>>> similar or identical.
>>>>>
>>>>> Please find attached lshw and kernel build config file.
>>>>>
>>>>> I will probably check the same parameters on my laptop at home,
>>>>> which is also a Lenovo, but with a different hardware configuration
>>>>> and Ubuntu 22.10.
>>>
>>> Could you try the below patch?
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index c3ea243d414d..0b07534c203a 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -679,9 +679,10 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>   insert:
>>>          mmo = insert_mmo(obj, mmo);
>>>          GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
>>> -out:
>>> +
>>>          if (file)
>>>                  drm_vma_node_allow(&mmo->vma_node, file);
>>> +out:
>>>          return mmo;
>>>
>>>   err:
>>>
>>> Maybe it is not the best fix but curious to know if it will make the 
>>> leak go away.
>>
>> Hi,
>>
>> After 27 minutes uptime with the patched kernel it looks promising.
>> It is much longer than it took for the buggy kernel to leak slabs.
>>
>> Here is the output:
>>
>> [root@pc-mtodorov marvin]# echo scan > /sys/kernel/debug/kmemleak
>> [root@pc-mtodorov marvin]# cat !$
>> cat /sys/kernel/debug/kmemleak
>> unreferenced object 0xffff888105028d80 (size 16):
>>    comm "kworker/u12:5", pid 359, jiffies 4294902898 (age 1620.144s)
>>    hex dump (first 16 bytes):
>>      6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00  memstick0.......
>>    backtrace:
>>      [<ffffffffb6bb5542>] slab_post_alloc_hook+0xb2/0x340
>>      [<ffffffffb6bbbf5f>] __kmem_cache_alloc_node+0x1bf/0x2c0
>>      [<ffffffffb6af8175>] __kmalloc_node_track_caller+0x55/0x160
>>      [<ffffffffb6ae34a6>] kstrdup+0x36/0x60
>>      [<ffffffffb6ae3508>] kstrdup_const+0x28/0x30
>>      [<ffffffffb70d0757>] kvasprintf_const+0x97/0xd0
>>      [<ffffffffb7c9cdf4>] kobject_set_name_vargs+0x34/0xc0
>>      [<ffffffffb750289b>] dev_set_name+0x9b/0xd0
>>      [<ffffffffc12d9201>] memstick_check+0x181/0x639 [memstick]
>>      [<ffffffffb676e1d6>] process_one_work+0x4e6/0x7e0
>>      [<ffffffffb676e556>] worker_thread+0x76/0x770
>>      [<ffffffffb677b468>] kthread+0x168/0x1a0
>>      [<ffffffffb6604c99>] ret_from_fork+0x29/0x50
>> [root@pc-mtodorov marvin]# w
>>   20:27:35 up 27 min,  2 users,  load average: 0.83, 1.15, 1.19
>> USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
>> marvin   tty2     tty2             20:01   27:10  10:12   2.09s /opt/google/chrome/chrome --type=utility --utility-sub-type=audio.m
>> marvin   pts/1    -                20:01    0.00s  2:00   0.38s sudo bash
>> [root@pc-mtodorov marvin]# uname -rms
>> Linux 6.1.0-b6bb9676f216-mglru-kmemlk-kasan+ x86_64
>> [root@pc-mtodorov marvin]#
> 
> I have heard no reply from Tvrtko, and there is already 1d5h of uptime with
> no leaks (apart from the kworker memstick_check nag, which I couldn't bisect
> on the only box that reproduced it, because something in the hardware was
> not supported in pre-4.16 kernels on the Lenovo V530S-07ICB. Or I am doing
> something wrong.)
> 
> However, now I can find the memstick maintainers thanks to Tvrtko's hint.
> 
> If you no longer require my help, I would consider this closed on my side.
> 
> I hope I did not cause too much trouble. The knowledgeable knew that this
> was not a security risk, but only a bug. (30 leaks of 64 bytes each were
> hardly going to exhaust memory in any realistic time.)
> 
> However, having some experience with software development, I have always
> preferred bugs reported and fixed rather than concealed and lying in wait
> (or worse, found first by a motivated adversary). Forgive me this rant, I do
> not make a living writing kernel drivers; this is just a pet project for
> the time being ...

It is not forgotten - I was trying to reach out to the original author
of the fixlet which worked for you. If that fails I will take it upon
myself, but I need to set aside some time to get into the exact problem
space before I can vouch for the fix and send it on my own.

In the meantime definitely thanks a lot for testing this quickly and 
reporting back!

What will happen next is that when either the original author or I am
ready to send out the fix as a proper patch, you will be copied on it
via the "Reported-by" and possibly "Tested-by" tags. The latter applies
if the patch remains identical. If it changes we might kindly ask you to
re-test if possible.
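
For a re-test, comparing kmemleak reports by eye gets tedious once there
are dozens of records. Below is a minimal sketch of a helper that counts
leak records per process name. The function name kmemleak_summary and its
optional file argument are made up for illustration, and it assumes the
report format quoted earlier in this thread (one 'comm "name", pid ...'
line per record).

```shell
#!/bin/sh
# Sketch only: count kmemleak records per process name ("comm").
# kmemleak_summary is a hypothetical helper, not an existing tool.
# With no argument it reads the debugfs file used earlier in the thread.
kmemleak_summary() {
    report="${1:-/sys/kernel/debug/kmemleak}"
    awk '
        # Record lines look like:  comm "chrome", pid 13058, jiffies ...
        /^ *comm "/ {
            split($0, parts, "\"")   # parts[2] holds the process name
            by_comm[parts[2]]++
        }
        END {
            for (name in by_comm)
                printf "%s %d\n", name, by_comm[name]
        }
    ' "$report" | sort
}
```

Run it as root after 'echo scan > /sys/kernel/debug/kmemleak', or pass a
saved copy of the report as the first argument. If the original leak were
still present, a "chrome" line would be expected to show up in the output.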

Regards,

Tvrtko

