<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Memory leak in drm_atomic.c eventually (few days) consuming all RAM (on at least one system configuration)"
href="https://bugs.freedesktop.org/show_bug.cgi?id=98420">98420</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Memory leak in drm_atomic.c eventually (few days) consuming all RAM (on at least one system configuration)
</td>
</tr>
<tr>
<th>Product</th>
<td>DRI
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>x86-64 (AMD64)
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux (All)
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>major
</td>
</tr>
<tr>
<th>Priority</th>
<td>medium
</td>
</tr>
<tr>
<th>Component</th>
<td>DRM/Intel
</td>
</tr>
<tr>
<th>Assignee</th>
<td>intel-gfx-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>felix.monninger@gmail.com
</td>
</tr>
<tr>
<th>QA Contact</th>
<td>intel-gfx-bugs@lists.freedesktop.org
</td>
</tr>
<tr>
<th>CC</th>
<td>intel-gfx-bugs@lists.freedesktop.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=127524" name="attach_127524" title="Patch to drm_atomic.c calling the required drm_property_unreference_blob's">attachment 127524</a> <a href="attachment.cgi?id=127524&action=edit" title="Patch to drm_atomic.c calling the required drm_property_unreference_blob's">[details]</a></span> <a href='page.cgi?id=splinter.html&bug=98420&attachment=127524'>[review]</a>
Patch to drm_atomic.c calling the required drm_property_unreference_blob's
Problem:
RAM is irretrievably lost over the course of a few days. The SUnreclaim value
reported by "cat /proc/meminfo | grep SUnreclaim" repeatedly grew (after 2 or 3
days of uptime) to more than 2 GB of unreclaimable slab memory, ending in an
out-of-memory failure.
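The same check can be scripted to log the value over time. The following is a
minimal sketch only: sunreclaim_kb and read_sunreclaim_kb are hypothetical
helper names, and the code assumes the usual "SUnreclaim:   N kB" line format
of /proc/meminfo.

```c
/* Headers use the quoted include form so the listing needs no escaping in this HTML page. */
#include "assert.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

/*
 * Hypothetical helper: return the SUnreclaim value (in kB) found in
 * meminfo-formatted text, or -1 if no such line is present.
 */
static long sunreclaim_kb(const char *meminfo)
{
    static const char key[] = "SUnreclaim:";
    const char *p;

    assert(meminfo != NULL);
    p = strstr(meminfo, key);
    if (p == NULL)
        return -1;
    /* strtol skips the padding spaces and stops at the " kB" suffix. */
    return strtol(p + strlen(key), NULL, 10);
}

/*
 * Hypothetical helper: scan a meminfo-style file line by line and return
 * its SUnreclaim value in kB, or -1 if the file or the line is missing.
 */
static long read_sunreclaim_kb(const char *path)
{
    char line[256];
    long kb = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        kb = sunreclaim_kb(line);
        if (kb != -1)
            break;
    }
    fclose(f);
    return kb;
}
```

Calling read_sunreclaim_kb("/proc/meminfo") at a fixed interval and logging the
result should make the steady growth described above visible.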
System:
Linux 4.7.4+; 4.8.* (the problem has been in the kernel since at least Feb 16)
Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
i915
Investigation:
1. /proc/slabinfo showed that a large number of 4K memory blocks
("kmalloc-4096") had been allocated (visible only after modifying slab.c to
include those of size>4096 in the output). Example line from /proc/slabinfo:
kmalloc-4096 11574 11578 4432 7 8 : tunables 0 0 0 : slabdata 1654 1654 0
2. /proc/slab_allocators showed (again after modifying slab.c):
kmalloc-4096: 1954 drm_property_create_blob.part.19+0x27/0xe0 [drm]
(The numbers in steps 1 and 2 were taken at different times; in both cases
they grow without bound until reboot.)
3. ftrace revealed the following call stack being executed precisely every
0.5 seconds:
drm_property_create_blob <- drm_atomic_helper_legacy_gamma_set
<- drm_mode_gamma_set_ioctl <- drm_ioctl
4. The reference count of the "blob" allocated in
drm_atomic_helper.c:drm_atomic_helper_legacy_gamma_set has increased by two
after the blob is passed to the line "ret = drm_atomic_crtc_set_property(crtc,
crtc_state, config->gamma_lut_property, blob->base.id);". (It should be
incremented by only 1, since ownership of this blob passes to the crtc_state,
which then keeps the blob as the updated property value.)
5. Because the blob is passed by id ("..., blob->base.id)"),
drm_atomic.c:drm_atomic_replace_property_blob_from_id calls
drm_property_lookup_blob(dev, blob_id);. Note that the documentation of that
function says "If successful, this takes an additional reference to the blob
property. callers need to make sure to eventually unreference the returned
property again, using @drm_property_unreference_blob.", which is not done in
this case.
6. As a result, the old state->degamma_lut that is replaced by the updated
blob is never freed (even though its refcount is properly decremented by 1 at
the remaining places), because its refcount was incremented once too often.
Thus a 4K block is leaked every half second.
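
The reference counting in steps 4 to 6 can be replayed with a small user-space
model. This is only a sketch under stated assumptions: struct blob, blob_get,
blob_put, replace_from_lookup, gamma_set and blobs_alive_after are hypothetical
stand-ins, not the kernel functions. The model assumes that the gamma helper
drops its own creation reference once per call and that the state takes one
reference for the blob it stores.

```c
/* Headers use the quoted include form so the listing needs no escaping in this HTML page. */
#include "assert.h"
#include "stdlib.h"

/* Hypothetical stand-in for a property blob: only the refcount is modelled. */
struct blob {
    int refcount;
};

/* Hypothetical stand-in for the CRTC state that stores the current blob. */
struct crtc_state {
    struct blob *gamma_lut;
};

static int live_blobs; /* blobs allocated and not yet freed */

static struct blob *blob_create(void)
{
    struct blob *b = malloc(sizeof(*b));

    assert(b != NULL);
    b->refcount = 1; /* creation reference, owned by the caller */
    live_blobs++;
    return b;
}

static struct blob *blob_get(struct blob *b)
{
    b->refcount++;
    return b;
}

static void blob_put(struct blob *b)
{
    if (b == NULL)
        return;
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        free(b);
        live_blobs--;
    }
}

/* Models the replace-by-id path: the lookup takes one reference, the state another. */
static void replace_from_lookup(struct crtc_state *st, struct blob *b,
                                int drop_lookup_ref)
{
    struct blob *looked_up = blob_get(b); /* lookup: +1 */

    blob_put(st->gamma_lut);              /* state releases the old blob */
    st->gamma_lut = blob_get(looked_up);  /* state keeps the new blob: +1 */
    if (drop_lookup_ref)
        blob_put(looked_up);              /* the unreference step 5 says is missing */
}

/* Models one legacy gamma update, as triggered every 0.5 seconds. */
static void gamma_set(struct crtc_state *st, int drop_lookup_ref)
{
    struct blob *b = blob_create();

    replace_from_lookup(st, b, drop_lookup_ref);
    blob_put(b); /* caller drops its creation reference */
}

/* Runs n updates and returns how many blobs are still allocated afterwards. */
static int blobs_alive_after(int n, int drop_lookup_ref)
{
    struct crtc_state *st = calloc(1, sizeof(*st));
    int i;

    assert(st != NULL);
    live_blobs = 0;
    for (i = n; i > 0; i--)
        gamma_set(st, drop_lookup_ref);
    free(st);
    return live_blobs;
}
```

In this model, blobs_alive_after(n, 0) leaves one blob allocated per update,
while blobs_alive_after(n, 1) leaves only the blob currently held by the state.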
Fix:
Call drm_property_unreference_blob(new_blob) at the appropriate spots in
drm_atomic.c:drm_atomic_replace_property_blob_from_id. Please see the attached
patch.
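
The shape of such a fix can be sketched in a user-space model. The attached
patch is not reproduced here, so the placement below is an assumption: the
lookup reference is dropped on every exit path, including an early return when
the blob has an unexpected size. All names (struct blob, blob_lookup, blob_put,
replace_from_lookup_fixed, refcount_after_attempt) are hypothetical stand-ins,
not the kernel functions.

```c
/* Headers use the quoted include form so the listing needs no escaping in this HTML page. */
#include "assert.h"
#include "stddef.h"

/* Hypothetical stand-in for a property blob. */
struct blob {
    int refcount;
    size_t length;
};

/* Hypothetical stand-in for the CRTC state that stores the current blob. */
struct crtc_state {
    struct blob *gamma_lut;
};

/* Models a successful lookup: returns the blob with one extra reference. */
static struct blob *blob_lookup(struct blob *registered)
{
    if (registered != NULL)
        registered->refcount++;
    return registered;
}

/* Only decrements; a real implementation would free the blob at zero. */
static void blob_put(struct blob *b)
{
    if (b == NULL)
        return;
    assert(b->refcount > 0);
    b->refcount--;
}

/* Returns 0 on success, -1 if the blob is unknown or has the wrong size. */
static int replace_from_lookup_fixed(struct crtc_state *st,
                                     struct blob *registered,
                                     size_t expected_size)
{
    struct blob *new_blob = blob_lookup(registered); /* lookup: +1 */
    int ret = 0;

    if (new_blob == NULL)
        return -1; /* nothing was referenced, nothing to drop */
    if (new_blob->length != expected_size) {
        ret = -1;
        goto out;
    }
    if (st->gamma_lut != new_blob) {
        blob_put(st->gamma_lut); /* state releases the old blob */
        new_blob->refcount++;    /* state keeps the new blob: +1 */
        st->gamma_lut = new_blob;
    }
out:
    blob_put(new_blob); /* drop the lookup reference on every exit path */
    return ret;
}

/*
 * Creates one blob of the given length (refcount 1, held by the caller),
 * tries to install it, and returns the refcount the blob is left with.
 */
static int refcount_after_attempt(size_t length, size_t expected_size)
{
    struct blob b[1] = { { 1, length } };
    struct crtc_state st[1] = { { NULL } };

    replace_from_lookup_fixed(st, b, expected_size);
    return b->refcount;
}
```

With the lookup reference dropped on both paths, a successful replace leaves
the blob with two references (caller and state), and a rejected one leaves only
the reference held by the caller.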
Which hardware is affected by this "legacy" code (i.e.
drm_atomic_helper_legacy_gamma_set)? Only older Intel HD Graphics devices
(below 4000?), or is it actually more widely used despite the legacy naming?</pre>
</div>
</p>
</body>
</html>