GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

Daniel Vetter daniel at ffwll.ch
Wed Aug 13 06:01:08 PDT 2014


On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> > On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> >> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> >>> From: Jérôme Glisse <jglisse at redhat.com>
> >>>
> >>> When experiencing memory pressure we want to minimize the pool size so
> >>> that memory we have just shrunk is not immediately added back again as
> >>> the very next thing.
> >>>
> >>> This divides the maximum pool size for each device by 2 each time the
> >>> pool has to shrink. The limit is bumped back up if the next allocation
> >>> happens more than one second after the last shrink. The one second
> >>> delay is obviously an arbitrary choice.
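
The heuristic described in the quoted commit message amounts to roughly the
following; this is a minimal sketch with made-up struct and function names,
not the actual TTM patch:

/* Minimal sketch of the described heuristic -- illustrative names,
 * not the actual TTM patch. */
#include <linux/jiffies.h>
#include <linux/kernel.h>

struct example_pool {
	unsigned long	max_size;	/* current cap, in pages */
	unsigned long	hard_max;	/* configured upper bound */
	unsigned long	last_shrink;	/* jiffies at the last shrinker call */
};

/* Shrinker path: halve the cap so pages we just freed are not
 * immediately cached again. */
static void example_pool_note_shrink(struct example_pool *pool)
{
	pool->max_size = max(pool->max_size / 2, 1UL);
	pool->last_shrink = jiffies;
}

/* Allocation path: if more than a second has passed since the last
 * shrink, let the cap grow back to the configured maximum. */
static void example_pool_note_alloc(struct example_pool *pool)
{
	if (time_after(jiffies, pool->last_shrink + HZ))
		pool->max_size = pool->hard_max;
}
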
> >> Jérôme,
> >>
> >> I don't like this patch. It adds extra complexity and its usefulness is
> >> highly questionable.
> >> There are a number of caches in the system, and if all of them added
> >> some sort of voluntary shrink heuristics like this, we'd end up with
> >> impossible-to-debug unpredictable performance issues.
> >>
> >> We should let the memory subsystem decide when to reclaim pages from
> >> caches and what caches to reclaim them from.
> > Yeah, artificially limiting your cache from growing when your shrinker
> > gets called will just break the equal-memory pressure the core mm uses to
> > rebalance between all caches when workload changes. In i915 we let
> > everything grow without artificial bounds and only rely upon the shrinker
> > callbacks to ensure we don't consume more than our fair share of available
> > memory overall.
> > -Daniel
> 
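
The shrinker interface being relied on here looks roughly like the sketch
below; the my_cache_* helpers are hypothetical stand-ins for a driver's own
cache bookkeeping:

/* Minimal sketch of a shrinker registration: the cache only reports how
 * much it could free and frees what the core mm asks for, rather than
 * capping itself.  The my_cache_* helpers are hypothetical. */
#include <linux/shrinker.h>

static unsigned long my_cache_count(struct shrinker *shrink,
				    struct shrink_control *sc)
{
	return my_cache_cached_pages();			/* hypothetical helper */
}

static unsigned long my_cache_scan(struct shrinker *shrink,
				   struct shrink_control *sc)
{
	return my_cache_free_pages(sc->nr_to_scan);	/* hypothetical helper */
}

static struct shrinker my_cache_shrinker = {
	.count_objects	= my_cache_count,
	.scan_objects	= my_cache_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* register_shrinker(&my_cache_shrinker) at init,
 * unregister_shrinker(&my_cache_shrinker) on teardown. */
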
> Now that you bring up i915 memory usage, Daniel,
> I can't refrain from bringing up the old issue of user-space-triggered,
> unreclaimable kernel memory, for which gem open is a good example ;) Each
> time user-space opens a gem handle, some unreclaimable kernel memory is
> allocated, and there is no accounting for it, so theoretically I think a
> user can render a system unusable this way.
> 
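
The pattern being described can be sketched from user space roughly as
follows; this is purely illustrative, assumes libdrm headers are installed
(the include path may differ by distribution), and assumes a previously
flink'ed object with the hypothetical name 1:

/* Userspace sketch of the described pattern: every GEM_OPEN on a flink
 * name creates another handle, each of which pins a small amount of
 * unaccounted, unreclaimable kernel memory.  Purely illustrative. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <libdrm/drm.h>		/* include path may vary by distribution */

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	struct drm_gem_open op = { .name = 1 };	/* assumed flink name */

	if (fd < 0)
		return 1;

	/* Each successful ioctl allocates another handle in the kernel. */
	while (ioctl(fd, DRM_IOCTL_GEM_OPEN, &op) == 0)
		;

	return 0;
}
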
> Typically there are various limits on unreclaimable objects like this,
> such as the limit on open file descriptors, and IIRC the kernel even has
> an internal limit on the number of struct files you can allocate, based
> on the available system memory, so dma-buf / prime should already have
> some sort of protection.
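
The struct-file limit mentioned here is, from memory, computed along these
lines; this is a paraphrase of the idea rather than the exact
fs/file_table.c code:

/* Sketch of the memory-scaled cap being referred to: let unreclaimable
 * struct files use at most roughly 10% of RAM, assuming about 1K of
 * kernel memory per file, with a static floor.  Paraphrased from memory,
 * not the exact fs/file_table.c code. */
static unsigned long files_limit_from_memory(unsigned long mempages)
{
	unsigned long n = (mempages * (PAGE_SIZE / 1024)) / 10;

	return max(n, (unsigned long)NR_FILE);
}
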

Oh yeah, we have zero cgroup limits or similar for gem allocations, so
there's not really a sane way to isolate gpu memory usage for specific
processes. But there are also zero limits on actual gpu usage itself
(timeslices or whatever), so I guess no one has asked for this yet.

My comment really was about balancing mm users under the assumption that
they're all unlimited.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

