[PATCH 05/13] drm/ttm: overhaul memory accounting

Jerome Glisse j.glisse at gmail.com
Thu Nov 10 10:05:24 PST 2011


On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote:
> On 11/09/2011 09:22 PM, j.glisse at gmail.com wrote:
> >From: Jerome Glisse<jglisse at redhat.com>
> >
> >This is an overhaul of the ttm memory accounting. This tries to keep
> >the same global behavior while removing the whole zone concept. It
> >keeps a distrinction for dma32 so that we make sure that ttm don't
> >starve the dma32 zone.
> >
> >There is 3 threshold for memory allocation :
> >- max_mem is the maximum memory the whole ttm infrastructure is
> >   going to allow allocation for (exception of system process see
> >   below)
> >- emer_mem is the maximum memory allowed for system process, this
> >   limit is>  to max_mem
> >- swap_limit is the threshold at which point ttm will start to
> >   try to swap object because ttm is getting close the max_mem
> >   limit
> >- swap_dma32_limit is the threshold at which point ttm will start
> >   swap object to try to reduce the pressure on the dma32 zone. Note
> >   that we don't specificly target object to swap to it might very
> >   well free more memory from highmem rather than from dma32
> >
> >Accounting is done through used_mem&  used_dma32_mem, which sum give
> >the total amount of memory actually accounted by ttm.
> >
> >Idea is that allocation will fail if (used_mem + used_dma32_mem)>
> >max_mem and if swapping fail to make enough room.
> >
> >The used_dma32_mem can be updated as a later stage, allowing to
> >perform accounting test before allocating a whole batch of pages.
> >
> 
> Jerome, you're removing a fair amount of functionality here, without
> justifying
> why it could be removed.

All this code was overkill.
 
> Consider a low-end system with 1G of kernel memory and 10G of
> highmem. How do we avoid putting stress on the kernel memory? I also
> wouldn't be too surprised if DMA32 zones appear in HIGHMEM systems
> in the future making the current zone concept good to keep.

Right now kernel memory is accounted against all zone so it decrease
not only the kernel zone but also the dma32 & highmem if present.
Note also that kernel zone in current code == dma32 zone.

When it comes to future it looks a lot simpler, it seems everyone
is moving toward more capable and more advanced iommu that can remove
all the restriction on memory from the device pov. I mean even arm
are getting more and more advance iommu. I don't see any architecture
worse supporting not going down that road.

> Also, in effect you move the DOS from *all* zones into the DMA32
> zone and create a race in that multiple simultaneous allocators can
> first pre-allocate out of the global zone, and then update the DMA32
> zone without synchronization. In this way you might theoretically
> end up with more DMA32 pages allocated than present in the zone.

Ok a respin attached with a simple change, things will be
accounted against dma32 zone and only when we get page we will
decrease the dma32 zone usage that way no DOS on dma32.

It also deals with the case where there is still lot of highmem
but no more dma32.

> With the proposed code there's also a theoretical problem in that a
> potentially huge number of pages are unaccounted before they are
> actually freed.

What you mean unaccounted ? The way it works is :
r = global_memory_alloc(size)
if (r) fail
alloc pages
update memory accounting according to what page where allocated

So memory is always accounted before even being allocated (exception
are the kernel object for vmwgfx & ttm_bo but we can move accounting
there too if you want, those are small allocation and i didn't think
it was worse changing that).


> A possible way around all this is to pre-allocate out of *all*
> zones, and after the big allocation release back memory to relevant
> zones. If such a big allocation fails, one needs to revert back to a
> page-by-page scheme.
> 
> /Thomas

Really i believe this make the whole accounting a lot simpler. The
whole zone business was overkill as especialy kernelzone == dma32zone
and highmem zone is a superset of this.

With my change what happen is that only the dma32 distinction is kept
around because there is whole batch of device that can only do dma32
and we need to make sure we don't starve those.

I believe my code is lot easier and straighforward to understand.

Cheers,
Jerome


More information about the dri-devel mailing list