[PATCH 05/13] drm/ttm: overhaul memory accounting

Thu Nov 10 12:05:22 PST 2011

On 11/10/2011 07:05 PM, Jerome Glisse wrote:
> On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote:
>    
>> On 11/09/2011 09:22 PM, j.glisse at gmail.com wrote:
>>      
>>> From: Jerome Glisse<jglisse at redhat.com>
>>>
>>> This is an overhaul of the ttm memory accounting. This tries to keep
>>> the same global behavior while removing the whole zone concept. It
>>> keeps a distrinction for dma32 so that we make sure that ttm don't
>>> starve the dma32 zone.
>>>
>>> There is 3 threshold for memory allocation :
>>> - max_mem is the maximum memory the whole ttm infrastructure is
>>>    going to allow allocation for (exception of system process see
>>>    below)
>>> - emer_mem is the maximum memory allowed for system process, this
>>>    limit is>   to max_mem
>>> - swap_limit is the threshold at which point ttm will start to
>>>    try to swap object because ttm is getting close the max_mem
>>>    limit
>>> - swap_dma32_limit is the threshold at which point ttm will start
>>>    swap object to try to reduce the pressure on the dma32 zone. Note
>>>    that we don't specificly target object to swap to it might very
>>>    well free more memory from highmem rather than from dma32
>>>
>>> Accounting is done through used_mem&   used_dma32_mem, which sum give
>>> the total amount of memory actually accounted by ttm.
>>>
>>> Idea is that allocation will fail if (used_mem + used_dma32_mem)>
>>> max_mem and if swapping fail to make enough room.
>>>
>>> The used_dma32_mem can be updated as a later stage, allowing to
>>> perform accounting test before allocating a whole batch of pages.
>>>
>>>        
>> Jerome, you're removing a fair amount of functionality here, without
>> justifying
>> why it could be removed.
>>      
> All this code was overkill.
>    

[1] I don't agree, and since it's well tested, thought throught and 
working, I see no obvious reason to alter it,
within the context of this patch series unless it's absolutely required 
for the functionality.

>
>    
>> Consider a low-end system with 1G of kernel memory and 10G of
>> highmem. How do we avoid putting stress on the kernel memory? I also
>> wouldn't be too surprised if DMA32 zones appear in HIGHMEM systems
>> in the future making the current zone concept good to keep.
>>      
> Right now kernel memory is accounted against all zone so it decrease
> not only the kernel zone but also the dma32&  highmem if present.
>    

Do you mean that the code is incorrect? In that case, did you consider 
the fact
that zones may overlap? (Although I admit the name "highmem" might be 
misleading. Should be "total").

> Note also that kernel zone in current code == dma32 zone.
>    

Last time I looked, the highmem split was typically at slightly less 
than 1GB, depending on the hardware and desired setup. I admit that was 
some time ago, but has that really changed? On all archs?
Furthermore, on !Highmem systems, All pages are in the kernel zone.

> When it comes to future it looks a lot simpler, it seems everyone
> is moving toward more capable and more advanced iommu that can remove
> all the restriction on memory from the device pov. I mean even arm
> are getting more and more advance iommu. I don't see any architecture
> worse supporting not going down that road.
>    

While the proposed change is probably possible, with different low <-> 
high splits depending on whether HIGHMEM is defined or not, I think [1] 
is a good reason for not pushing it through.

>    
>> Also, in effect you move the DOS from *all* zones into the DMA32
>> zone and create a race in that multiple simultaneous allocators can
>> first pre-allocate out of the global zone, and then update the DMA32
>> zone without synchronization. In this way you might theoretically
>> end up with more DMA32 pages allocated than present in the zone.
>>      
> Ok a respin attached with a simple change, things will be
> accounted against dma32 zone and only when we get page we will
> decrease the dma32 zone usage that way no DOS on dma32.
>
> It also deals with the case where there is still lot of highmem
> but no more dma32.
>    

So why not just do a ttm_mem_global_alloc() for the pages you want to 
allocate,
and add a proper adjustment function if memory turns out to be either 
HIGHMEM or !DMA32

>    
>> With the proposed code there's also a theoretical problem in that a
>> potentially huge number of pages are unaccounted before they are
>> actually freed.
>>      
> What you mean unaccounted ? The way it works is :
> r = global_memory_alloc(size)
> if (r) fail
> alloc pages
> update memory accounting according to what page where allocated
>
> So memory is always accounted before even being allocated (exception
> are the kernel object for vmwgfx&  ttm_bo but we can move accounting
> there too if you want, those are small allocation and i didn't think
> it was worse changing that).
>    
No, I mean the sequence

unaccount_page_array()
---Race--
free_page_array()

/Thomas

> Cheers,
> Jerome
>