[Intel-xe] [RFC PATCH 3/4] drm/ttm: Handle -EAGAIN in ttm_resource_alloc as -ENOSPC.

Maarten Lankhorst maarten.lankhorst at linux.intel.com
Wed May 3 09:36:52 UTC 2023


On 2023-05-03 11:11, Thomas Hellström wrote:
> Hi, Maarten
>
> On 5/3/23 10:34, Maarten Lankhorst wrote:
>> This allows the drm cgroup controller to return no space is available..
>>
>> XXX: This is a hopeless simplification that changes behavior, and
>> returns -ENOSPC even if we could evict ourselves from the current
>> cgroup.
>>
>> Ideally, the eviction code becomes cgroup aware, and will force eviction
>> from the current cgroup or its parents.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>
> Thinking of the shrinker analogy, do non-cgroup aware shrinkers just 
> shrink blindly or do they reject shrinking like this patch when a 
> cgroup limit is reached?

When I made the cgroup controller return -ENOSPC directly, I hit an
infinite loop: the allocation path sees that enough memory is free and
tries to allocate again. Hence the -EAGAIN handling here: it ends up
returning -ENOSPC, but without the infinite loop.

I think there should be 2 code paths:

- OOM, generic case: Handle like we do now. No special cgroup handling
is needed right now.

Might change if we implement cgroup memory semantics. See the memory 
section of Documentation/admin-guide/cgroup-v2.rst

It could be useful regardless.

- OOM, cgroup limit reached: For each BO, check whether evicting it would help unblock the relevant limit.


              / cg1.0
root -- cg1 --- cg1.1
   \          \ cg1.2
    \
     cg2

If we hit the cg limit in cg1.0 for only cg1.0, it doesn't make sense to evict from any other cgroup.
If we hit the limit in cg1.0 for the entirety of cg1, it makes sense to evict from any of the cg1 nodes, but not from cg2.

This should be relatively straightforward to implement. We identify which cgroup hit a limit, and then let the shrinker
run only on that cgroup and its children.

For a root/NULL cg, this reduces to the generic OOM case.
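
A rough sketch of that subtree check in plain C, outside the kernel. The names
struct cg, struct bo, cg_in_subtree() and bo_evictable_for() are made up for
illustration and are not existing TTM or cgroup APIs; a real implementation
would presumably use the cgroup core's own ancestry helpers instead of walking
parent pointers by hand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup node, not the kernel's struct cgroup. */
struct cg {
	const char *name;
	const struct cg *parent;	/* NULL for the root */
};

/* Hypothetical stand-in for a buffer object charged to one cgroup. */
struct bo {
	const char *name;
	const struct cg *owner;
};

/*
 * Return true if @cg is @limit_cg itself or one of its descendants.
 * A NULL @limit_cg means no specific cgroup limit was hit, so every
 * cgroup qualifies (the generic OOM case).
 */
static bool cg_in_subtree(const struct cg *cg, const struct cg *limit_cg)
{
	if (!limit_cg)
		return true;

	/* Walk up from @cg towards the root, looking for @limit_cg. */
	for (; cg; cg = cg->parent)
		if (cg == limit_cg)
			return true;

	return false;
}

/* Only BOs charged inside the subtree that hit the limit are worth evicting. */
static bool bo_evictable_for(const struct bo *bo, const struct cg *limit_cg)
{
	return cg_in_subtree(bo->owner, limit_cg);
}
```

With the tree above, a limit hit in cg1.0 makes only BOs owned by cg1.0
eligible. A limit hit for all of cg1 makes BOs in cg1, cg1.0, cg1.1 and cg1.2
eligible, but not those in cg2. Passing the root or NULL makes every BO
eligible, which is the generic case again.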


~Maarten