[Bug 106136] per-process/context memory usage accounting for i915

Mon May 7 11:47:07 UTC 2018

https://bugs.freedesktop.org/show_bug.cgi?id=106136

--- Comment #7 from Eero Tamminen <eero.t.tamminen at intel.com> ---
(In reply to Abdiel Janulgue from comment #6)
> (In reply to Eero Tamminen from comment #5)
> > What of this do you think could be implemented on user-space???
> 
> Parsing these sysfs files (which contain gem object allocation and free
> space in the gtt):
> 
> /sys/kernel/debug/dri/0/i915_gem_gtt
> /sys/kernel/debug/dri/0/i915_gem_objects
> 
> Could be enough for a userspace client to basically "guesstimate" and manage
> how much GPU memory it requests for itself?

In some cases, but not all, and it's clearly not the correct place.

While many popular distros mount debugfs, not all do, and *accessing it requires
root privileges* (as would mounting it when it's not present).  However, users
and memory debugging tools should be able to access resource usage information
for the user's own processes without needing root.  How else is a user without
root access supposed to detect and handle leaks in their own programs?
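
A minimal sketch of what a tool would have to check before relying on debugfs,
assuming the usual /proc/mounts layout (device, mount point, fs type, ...).
The function names are made up for illustration; on a typical system the
result is False for a non-root user even when debugfs is mounted:

```python
import os


def find_debugfs_mounts(mounts_text):
    """Return mount points whose filesystem type is debugfs."""
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            mount_points.append(fields[1])
    return mount_points


def debugfs_usable(mounts_path="/proc/mounts"):
    """True if debugfs is mounted and the current user can read into it."""
    try:
        with open(mounts_path) as f:
            mounts = find_debugfs_mounts(f.read())
    except OSError:
        return False  # no /proc/mounts (e.g. not Linux)
    return any(os.access(m, os.R_OK | os.X_OK) for m in mounts)


if __name__ == "__main__":
    print("debugfs usable by this user:", debugfs_usable())
```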

Daniel Vetter commented on another matter (a few weeks ago; I don't know
whether he still thinks the same):
"If you expect your tool to run on redhat enterprise linux (and similar
places), no debugfs for you. It's simply not available (and really shouldn't
be, because a bunch of stuff in there are direct kernel exploits)"

-> Memory usage information should be available in /proc/ where all the other
(kernel internal & user-space) memory usage information is available, and
*where people know to search for it*, not hidden in /debugfs/.
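
For comparison, this is roughly how the normal (non-GPU) memory usage of one's
own process can already be read from /proc without root.  It is only a sketch:
it parses the Vm* lines of /proc/<pid>/status, and the helper names are
invented for this example.  There is no equivalent entry for GEM objects,
which is the point of this bug:

```python
import os


def parse_vm_fields(status_text):
    """Extract the Vm* fields (reported in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Vm") and value.split():
            fields[key] = int(value.split()[0])  # "  1234 kB" -> 1234
    return fields


def read_vm_fields(pid="self"):
    """Read memory usage for a process the user owns; no root required."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    for name, kib in read_vm_fields().items():
        print(f"{name}: {kib} kB")
```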

> Note that from a GPU context POV, it is allowed access to entire 2 tebibytes
> of virtual address space. I guess by design kernel won't get in the way and
> give it as much as it wants until physical backing pages are no more.

Yep.  If a GEM object is requested, it's also likely to be written to, i.e. it's
dirty.  If a process leaks such an object, it's no longer used, and the kernel
will just swap it out when more RAM is needed.  When swap runs out, the device
is in an OOM crash-fest until the process filling the swap happens to terminate
or gets killed.

As to common swap sizes...  I checked the 10 most popular devices currently sold
at verkkokauppa.com; 6 had 4GB, 3 had 8GB and 1 had 16GB of RAM.  I think most
distros still use 2x RAM for swap size by default.  I.e. with the bug 106106
use-case, this would be 6-9 hours of constant use on most currently sold
devices, after which swap would be full and the system unusable [1], even when
nothing else is running on the system besides the desktop itself.

[1] bug 106106 is a somewhat unusual case because the X server is a long-running
process with lowish (normal) memory usage, which runs at elevated privileges,
so it has a low OOM score (see /proc/$PID/oom_score).  As a result, all other
apps get OOM-killed instead of the leaking X server (this may, or may not, be a
good thing, as with X, all other GUI apps would go down too).
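
A small sketch of how to look at this, assuming /proc/$PID/oom_score holds a
single integer (higher means more likely to be killed).  The helper names and
the example scores in the comments are invented for illustration:

```python
import os


def read_oom_score(pid="self"):
    """Return the kernel's current OOM score for a process."""
    with open(f"/proc/{pid}/oom_score") as f:
        return int(f.read().strip())


def rank_kill_candidates(scores):
    """Order process names from most to least likely OOM-kill victim.

    scores maps a process name to its oom_score.
    """
    return sorted(scores, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__" and os.path.exists("/proc/self/oom_score"):
    print("own oom_score:", read_oom_score())
    # Hypothetical scores: a low-scoring X server outlives ordinary apps.
    print(rank_kill_candidates({"Xorg": 5, "browser": 300, "game": 450}))
```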

I think in most cases the 3D application itself would not be privileged and it
would also use a lot of normal memory, so it could be one of the first kill
victims in a system OOM situation triggered by its own GEM object leakage.

The worst case would be where the 3D driver itself leaks, i.e. many 3D apps
would leak.  Currently devs and users wouldn't really see that because of this
bug.  They most likely wouldn't notice there is a leak, and they would not know
how to investigate it, find a cause for it, or bisect it.  It would just be
random (OOM-kill or alloc abort) crashes for them with a new driver.
