GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

Thomas Hellstrom thellstrom at vmware.com
Wed Aug 13 10:38:26 PDT 2014


On 08/13/2014 06:30 PM, Alex Deucher wrote:
> On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
>>> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
>>>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>>>>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>>>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>>>>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>>>>>>> From: Jérôme Glisse <jglisse at redhat.com>
>>>>>>>>
>>>>>>>> When experiencing memory pressure we want to minimize the pool size so
>>>>>>>> that memory we just shrank away is not immediately added back again.
>>>>>>>>
>>>>>>>> This halves the maximum pool size for each device each time the pool
>>>>>>>> has to shrink. The limit is bumped up again if the next allocation
>>>>>>>> happens more than one second after the last shrink. The one second
>>>>>>>> delay is obviously an arbitrary choice.
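A minimal sketch of the heuristic as described above - the structure, field
and function names are made up for illustration and are not the actual
ttm_page_alloc code:

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>

struct pool {
	spinlock_t lock;
	unsigned int max_size;		/* current cap on pooled pages */
	unsigned int hard_max_size;	/* full cap, restored after a quiet second */
	unsigned long last_shrink;	/* jiffies timestamp of the last shrink */
};

/* Shrinker path: halve the cap so pages we just gave back do not
 * immediately flow into the pool again. */
static void pool_note_shrink(struct pool *p)
{
	spin_lock(&p->lock);
	p->max_size = max(p->max_size / 2, 1u);
	p->last_shrink = jiffies;
	spin_unlock(&p->lock);
}

/* Allocation path: if more than a second has passed since the last
 * shrink, let the pool grow back to its full cap. */
static void pool_note_alloc(struct pool *p)
{
	spin_lock(&p->lock);
	if (time_after(jiffies, p->last_shrink + HZ))
		p->max_size = p->hard_max_size;
	spin_unlock(&p->lock);
}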
>>>>>>> Jérôme,
>>>>>>>
>>>>>>> I don't like this patch. It adds extra complexity and its usefulness is
>>>>>>> highly questionable.
>>>>>>> There are a number of caches in the system, and if all of them added
>>>>>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>>>>>> impossible-to-debug unpredictable performance issues.
>>>>>>>
>>>>>>> We should let the memory subsystem decide when to reclaim pages from
>>>>>>> caches and what caches to reclaim them from.
>>>>>> Yeah, artificially limiting your cache's growth when your shrinker
>>>>>> gets called will just break the equal-memory-pressure balancing the core
>>>>>> mm uses to rebalance between all caches when the workload changes. In
>>>>>> i915 we let everything grow without artificial bounds and rely only on
>>>>>> the shrinker callbacks to ensure we don't consume more than our fair
>>>>>> share of the available memory overall.
>>>>>> -Daniel
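The mechanism referred to here is the core mm shrinker interface; a rough
sketch of a cache that registers one instead of capping itself - the
my_cache_* names and counter are hypothetical, only struct shrinker /
register_shrinker() is the real kernel API:

#include <linux/atomic.h>
#include <linux/shrinker.h>

static atomic_long_t my_cache_pages;	/* pages currently held in the cache */

/* Hypothetical release routine provided elsewhere by the driver. */
unsigned long my_cache_release_pages(unsigned long nr);

/* Tell the core mm how much we could hand back right now. */
static unsigned long my_cache_count(struct shrinker *shrinker,
				    struct shrink_control *sc)
{
	return atomic_long_read(&my_cache_pages);
}

/* Free up to sc->nr_to_scan pages; the mm calls this harder as memory
 * pressure rises, which is what rebalances the caches. */
static unsigned long my_cache_scan(struct shrinker *shrinker,
				   struct shrink_control *sc)
{
	return my_cache_release_pages(sc->nr_to_scan);
}

static struct shrinker my_cache_shrinker = {
	.count_objects	= my_cache_count,
	.scan_objects	= my_cache_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* Call from the driver's load path: no internal size cap, just register
 * the shrinker and let the mm reclaim under pressure. */
static int my_cache_init(void)
{
	return register_shrinker(&my_cache_shrinker);
}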
>>>>> Now that you bring up i915 memory usage, Daniel,
>>>>> I can't refrain from bringing up the old issue of user-space-triggered
>>>>> unreclaimable kernel memory, for which gem open is a good example ;) Each
>>>>> time user-space opens a gem handle, some unreclaimable kernel memory is
>>>>> allocated for which there is no accounting, so theoretically I think a
>>>>> user can render a system unusable this way.
>>>>>
>>>>> Typically there are various limits on unreclaimable objects like this,
>>>>> such as open file descriptors, and IIRC the kernel even has an internal
>>>>> limit on the number of struct files you can initialize, based on the
>>>>> available system memory, so dma-buf / prime should already have some
>>>>> sort of protection.
>>>> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
>>>> so there's not really a way to isolate gpu memory usage in a sane way for
>>>> specific processes. But there's also zero limits on actual gpu usage
>>>> itself (timeslices or whatever) so I guess no one asked for this yet.
>>> In its simplest form (like in TTM, if correctly implemented by drivers)
>>> this type of accounting stops non-privileged malicious GPU users from
>>> exhausting all system physical memory and causing grief for other kernel
>>> subsystems, but not from causing grief for other GPU users. I think that's
>>> the minimum level that's intended, for example, for the struct file
>>> accounting as well.
>> I think in i915 we're fairly close on that minimal standard - interactions
>> with shrinkers and oom logic work decently. It starts to fall apart though
>> when we've actually run out of memory - if the real memory hog is a gpu
>> process the oom killer won't notice all that memory since it's not
>> accounted against processes correctly.
>>
>> I don't agree that gpu processes should be punished in general compared to
>> other subsystems in the kernel. If the user wants to use 90% of all memory
>> for gpu tasks then I want to make that possible, even if it means that
>> everything else thrashes horribly - as long as the system recovers and
>> rebalances once that gpu memory hog is gone, of course. IIRC ttm currently
>> has a fairly arbitrary (tunable) setting to limit system memory
>> consumption, but I might be wrong on that.
> Yes, it currently limits you to half of memory, but we would at least
> like to make it tunable, since there are a lot of use cases where the
> user wants to use 90% of memory for GPU tasks at the expense of
> everything else.
>
> Alex
>

It's in /sys/devices/virtual/drm/ttm/memory_accounting/*

Run-time tunable, but you should probably write an app to do the tuning if
you want to hand this out to users, since if you raise the limit you
probably want to modify a number of values together.

zone_memory: ro: Total memory in the zone.
used_memory: ro: Currently pinned memory.
available_memory: rw: Allocation limit.
emergency_memory: rw: Allocation limit for CAP_SYS_ADMIN
swap_limit: rw: Swapper thread starts at this limit.
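
As a rough idea of such a tuning app - the 90/95/85 percent policy is made up
purely for illustration, and the exact per-zone path and the units of the
exported values should be checked against the running kernel before writing
anything:

#include <stdio.h>

/* Base path from the list above; on a real system the files may sit in a
 * per-zone subdirectory below this, so adjust accordingly. */
#define ACC "/sys/devices/virtual/drm/ttm/memory_accounting/"

static long read_val(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), ACC "%s", name);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

static int write_val(const char *name, long val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), ACC "%s", name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%ld\n", val);
	return fclose(f);
}

int main(void)
{
	long total = read_val("zone_memory");

	if (total < 0)
		return 1;
	/* Illustrative policy only: 90% of the zone for GPU allocations,
	 * with the emergency and swap thresholds kept consistent. */
	write_val("available_memory", total / 100 * 90);
	write_val("emergency_memory", total / 100 * 95);
	write_val("swap_limit", total / 100 * 85);
	return 0;
}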

/Thomas