GEM memory DoS (was Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

Wed Aug 13 10:20:53 PDT 2014

On 08/13/2014 06:24 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>>>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>>>>>> From: Jérôme Glisse <jglisse at redhat.com>
>>>>>>>
>>>>>>> When experiencing memory pressure we want to minimize pool size so that
>>>>>>> memory we just shrank is not immediately added back again.
>>>>>>>
>>>>>>> This halves the maximum pool size for each device each time the
>>>>>>> pool has to shrink. The limit is bumped again if the next allocation
>>>>>>> happens more than one second after the last shrink. The one-second
>>>>>>> delay is obviously an arbitrary choice.
>>>>>> Jérôme,
>>>>>>
>>>>>> I don't like this patch. It adds extra complexity and its usefulness is
>>>>>> highly questionable.
>>>>>> There are a number of caches in the system, and if all of them added
>>>>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>>>>> impossible-to-debug unpredictable performance issues.
>>>>>>
>>>>>> We should let the memory subsystem decide when to reclaim pages from
>>>>>> caches and what caches to reclaim them from.
>>>>> Yeah, artificially limiting your cache from growing when your shrinker
>>>>> gets called will just break the equal-memory pressure the core mm uses to
>>>>> rebalance between all caches when workload changes. In i915 we let
>>>>> everything grow without artificial bounds and only rely upon the shrinker
>>>>> callbacks to ensure we don't consume more than our fair share of available
>>>>> memory overall.
>>>>> -Daniel
>>>> Now that you bring up i915 memory usage, Daniel,
>>>> I can't refrain from bringing up the old user-space unreclaimable kernel
>>>> memory issue, for which gem open is a good example ;) Each time
>>>> user-space opens a gem handle, some unreclaimable kernel memory is
>>>> allocated, for which there is no accounting, so in theory I think a
>>>> user can render a system unusable this way.
>>>>
>>>> Typically there are various limits on unreclaimable objects like this,
>>>> like open file descriptors, and IIRC the kernel even has an internal
>>>> limit on the number of struct files you initialize, based on the
>>>> available system memory, so dma-buf / prime should already have some
>>>> sort of protection.
>>> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
>>> so there's not really a way to isolate gpu memory usage in a sane way for
>>> specific processes. But there are also zero limits on actual gpu usage
>>> itself (timeslices or whatever) so I guess no one has asked for this yet.
>> In its simplest form (like in TTM, if correctly implemented by drivers)
>> this type of accounting stops non-privileged malicious GPU users from
>> exhausting all system physical memory and causing grief for other kernel
>> systems, but not from causing grief for other GPU users. I think that's
>> the minimum level that's intended, for example, for the struct file
>> accounting as well.
> I think in i915 we're fairly close to that minimal standard - interactions
> with shrinkers and oom logic work decently. It starts to fall apart though
> when we've actually run out of memory - if the real memory hog is a gpu
> process the oom killer won't notice all that memory since it's not
> accounted against processes correctly.
>
> I don't agree that gpu processes should be punished in general compared to
> other subsystems in the kernel. If the user wants to use 90% of all memory
> for gpu tasks then I want to make that possible, even if it means that
> everything else thrashes horribly, as long as the system recovers and
> rebalances after that gpu memory hog is gone, of course. Iirc ttm currently
> has a fairly arbitrary (tunable) setting to limit system memory
> consumption, but I might be wrong on that.

No, that's correct, or rather it's intended to limit pinned
unreclaimable system memory (though part of what's unreclaimable could
actually be made reclaimable if we implemented another shrinker level).

>>> My comment really was about balancing mm users under the assumption that
>>> they're all unlimited.
>> Yeah, sorry for stealing the thread. I usually bring this up now and
>> again but nowadays with an exponential backoff.
> Oh I'd love to see some cgroups or similar tracking so that server users
> could set sane per-process/user/task limits on how much memory/gpu time
> that group is allowed to consume. It's just that I haven't seen real
> demand for this and so couldn't make the time available to implement it.
> So far my goal has been to make everything work nicely for unlimited tasks
> right up to the point where the OOM killer needs to step in. Past that
> everything starts to fall apart, but so far that has been good enough for
> desktop usage.

Well, I'm not sure if things have changed, but the last time I looked at
this situation (a couple of years ago), with the kernel out of physical
memory but a fair amount of swap space left, the OOM killer was never
invoked, so a number of more or less critical kernel systems (disk I/O,
paging, networking) were getting -ENOMEM and hitting rarely tested error paths.
A state you don't want to have the kernel in. Now the OOM algorithm may
of course have changed since then.

My point is that with unaccounted constructs like gem-open-from-name it
should be easy for any unprivileged authenticated gem client to pin all
kernel physical memory, put the kernel in that state and keep it there,
and IMO a kernel-user space interface shouldn't allow that.
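To make the concern concrete, here is a minimal sketch of what per-client accounting for gem handles might look like, loosely modelled on the open-file-descriptor limit mentioned earlier in the thread. Everything in it (`struct gem_client_quota`, `gem_quota_charge()`, `gem_quota_uncharge()`, the choice of -EMFILE) is hypothetical and not an existing DRM interface; it only illustrates charging each handle against a cap before any kernel memory is allocated for it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-client accounting for GEM handles, loosely modelled on
 * the open-file-descriptor limit. Not an existing DRM API. */
struct gem_client_quota {
	unsigned int max_handles;	/* per-client cap, like RLIMIT_NOFILE */
	unsigned int nr_handles;	/* handles currently open */
};

/* Charge one handle before allocating its kernel bookkeeping.
 * Returns 0 on success, or -EMFILE once the client's cap is reached. */
static int gem_quota_charge(struct gem_client_quota *q)
{
	assert(q != NULL);
	if (q->nr_handles >= q->max_handles)
		return -EMFILE;
	q->nr_handles++;
	return 0;
}

/* Uncharge when the handle is closed (or the client goes away). */
static void gem_quota_uncharge(struct gem_client_quota *q)
{
	assert(q != NULL && q->nr_handles > 0);
	q->nr_handles--;
}
```

A per-client cap like this bounds what a single client can pin, but presumably not what many clients can pin together; that aggregate case is where cgroup-style accounting, as Daniel suggests, would come in.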

/Thomas

>
> Maybe WebGL will finally make this important enough so that we can fix it
> for real ...
> -Daniel