[PATCH 05/13] drm/ttm: overhaul memory accounting

Thu Nov 10 23:49:39 PST 2011

On 11/11/2011 12:33 AM, Jerome Glisse wrote:
> On Thu, Nov 10, 2011 at 09:05:22PM +0100, Thomas Hellstrom wrote:
>    
>> On 11/10/2011 07:05 PM, Jerome Glisse wrote:
>>      
>>> On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote:
>>>        
>>>> On 11/09/2011 09:22 PM, j.glisse at gmail.com wrote:
>>>>          
>>>>> From: Jerome Glisse<jglisse at redhat.com>
>>>>>
>>>>> This is an overhaul of the ttm memory accounting. This tries to keep
>>>>> the same global behavior while removing the whole zone concept. It
>>>>> keeps a distinction for dma32 so that we make sure that ttm doesn't
>>>>> starve the dma32 zone.
>>>>>
>>>>> There are 4 thresholds for memory allocation:
>>>>> - max_mem is the maximum memory the whole ttm infrastructure is
>>>>>    going to allow allocation for (with the exception of system
>>>>>    processes, see below)
>>>>> - emer_mem is the maximum memory allowed for system processes; this
>>>>>    limit is > max_mem
>>>>> - swap_limit is the threshold at which ttm will start to
>>>>>    try to swap objects because ttm is getting close to the max_mem
>>>>>    limit
>>>>> - swap_dma32_limit is the threshold at which ttm will start to
>>>>>    swap objects to try to reduce the pressure on the dma32 zone. Note
>>>>>    that we don't specifically target objects to swap, so it might very
>>>>>    well free more memory from highmem rather than from dma32
>>>>>
>>>>> Accounting is done through used_mem & used_dma32_mem, whose sum gives
>>>>> the total amount of memory actually accounted by ttm.
>>>>>
>>>>> The idea is that allocation will fail if (used_mem + used_dma32_mem) >
>>>>> max_mem and if swapping fails to make enough room.
>>>>>
>>>>> The used_dma32_mem can be updated at a later stage, allowing the
>>>>> accounting test to be performed before allocating a whole batch of pages.
>>>>>
>>>>>            
>>>> Jerome, you're removing a fair amount of functionality here, without
>>>> justifying why it could be removed.
>>>>          
>>> All this code was overkill.
>>>        
>> [1] I don't agree, and since it's well tested, thought through and
>> working, I see no obvious reason to alter it
>> within the context of this patch series unless it's absolutely
>> required for the functionality.
>>      
> Well one thing I can tell is that it doesn't work on radeon, I pushed
> a test to libdrm and here it's the OOM that starts doing its beating.
> Anyway I won't alter it. Was just trying to make it work, i.e. be useful
> while also being simpler.
>    

Well if it doesn't work it should of course be fixed.

I'm not against fixing it nor making it simpler, but I think that 
requires a detailed understanding of what's going wrong and how it needs 
to be fixed. Not as part of a patch series that really tries to 
accomplish something else.

The current code was tested extensively with psb and unichrome.
One good test for drivers with bo-backed textures is to continuously 
create fairly large texture images. The end result should be the swap 
space starting to fill up and once there is no more swap space, the OOM 
killer should kill your app, and kmalloc failures should be avoided. It 
should be tricky to get a failure from the global alloc system, but a 
huge amount of small buffer objects or fence objects should probably do it.

Naturally, that requires that all persistent drm objects created from 
user-space are registered with their correct sizes, or at least a really 
good size approximation. That includes things like gem flinks, that 
could otherwise easily be exploited to bring a system down, simply by 
guessing a gem name and creating flinks to that name in an infinite loop.
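
To make the scheme in the quoted patch description concrete, here is a 
minimal C sketch of the accounting as I read it. The struct and function 
names (mem_acct, acct_charge, acct_uncharge, acct_should_swap) are 
hypothetical and are not the actual TTM code. The swap-and-retry step is 
left out, and using >= for the swap thresholds is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the thresholds named in the patch description. */
struct mem_acct {
    uint64_t max_mem;          /* ceiling for ordinary allocations */
    uint64_t emer_mem;         /* higher ceiling for system processes */
    uint64_t swap_limit;       /* total usage at which swapping starts */
    uint64_t swap_dma32_limit; /* dma32 usage at which swapping starts */
    uint64_t used_mem;         /* accounted non-dma32 bytes */
    uint64_t used_dma32_mem;   /* accounted dma32 bytes */
};

/* Total accounted memory is the sum of both counters. */
static uint64_t acct_total(const struct mem_acct *a)
{
    return a->used_mem + a->used_dma32_mem;
}

/*
 * Charge `size` bytes against the limit that applies to the caller.
 * Returns false when the charge would push the total past that limit.
 * A real implementation would try to swap and retry before failing.
 */
static bool acct_charge(struct mem_acct *a, uint64_t size,
                        bool dma32, bool privileged)
{
    uint64_t limit = privileged ? a->emer_mem : a->max_mem;

    /* Overflow-safe form of: total + size > limit */
    if (size > limit || acct_total(a) > limit - size)
        return false;

    if (dma32)
        a->used_dma32_mem += size;
    else
        a->used_mem += size;
    return true;
}

/* Release a previous charge from the matching counter. */
static void acct_uncharge(struct mem_acct *a, uint64_t size, bool dma32)
{
    uint64_t *counter = dma32 ? &a->used_dma32_mem : &a->used_mem;

    assert(size <= *counter); /* uncharging more than charged is a bug */
    *counter -= size;
}

/* True once either swap threshold has been reached (assumed >=). */
static bool acct_should_swap(const struct mem_acct *a)
{
    return acct_total(a) >= a->swap_limit ||
           a->used_dma32_mem >= a->swap_dma32_limit;
}
```

The point of routing everything through one charge function is that any 
object created from user-space has to pass through it with its size, which 
is what the registration requirement above amounts to.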

What are the symptoms of the failure you're seeing with Radeon? Any 
suggestions on why it happens?

Thanks,
Thomas
