TTM's role in score-based eviction

Mon Dec 9 09:28:23 PST 2013

On Thu, Dec 05, 2013 at 11:45:03AM -0500, Jerome Glisse wrote:
> On Thu, Dec 05, 2013 at 05:22:54PM +0100, Maarten Lankhorst wrote:
> > On 05-12-13 16:49, Jerome Glisse wrote:
> > > On Thu, Dec 05, 2013 at 11:26:46AM +0100, Thomas Hellstrom wrote:
> > >> Hi!
> > >>
> > >> On 12/05/2013 10:36 AM, Lauri Kasanen wrote:
> > >>> Hi list, Thomas,
> > >>>
> > >>> I will be investigating the use of a hotness score for each bo, to
> > >>> replace the ping-pong causing LRU eviction in radeon*.
> > >>>
> > >>> The goal is to put all bos that fit in VRAM there, in order of hotness;
> > >>> a new bo should only be placed there if its hotness score is greater
> > >>> than the lowest VRAM bo's. Then the lowest-hotness-bos in
> > >>> VRAM should be evicted until the new bo fits. This should result in a
> > >>> more stable set with less ping-pong.
> > >>>
> > >>> Jerome advised that the bo placement should be done entirely outside
> > >>> TTM. As I'm not (yet) too familiar with that side of the kernel, what is
> > >>> the opinion of TTM folks?
> > >> There are a couple of things to be considered:
> > >> 1) You need to decide where a bo to be validated should be placed.
> > >> The driver can give a list of possible placements to TTM and let
> > >> TTM decide, trying each placement in turn. A driver that thinks this
> > >> isn't sufficient can come up with its own strategy and give only a
> > >> single placement to TTM. If TTM can't satisfy that, it will give you
> > >> an error back, and the driver will need to validate with an
> > >> alternative placement. I think Radeon already does this? vmwgfx does
> > >> it to some extent.
> > >>
> > >> 2) As you say, TTM is evicting strictly on an lru basis, and is
> > >> maintaining one LRU list per memory type, and also a global swap lru
> > >> list for buffers that are backed by system pages (not VRAM). I guess
> > >> what you would want to do is to replace the VRAM lru list with a
> > >> priority queue where bos are continuously sorted based on hotness.
> > >> As long as you obey the locking rules:
> > >> *) Locking order is bo::reserve -> lru-lock
> > >> *) When walking the queue with the lru-lock held, you must therefore
> > >> tryreserve if you want to reserve an object on the queue
> > >> *) bo:s need to be removed from the queue as soon as they are reserved
> > >> *) Don't remove a bo from the queue unless it is reserved
> > >> Nothing stops you from doing this in the driver, but OTOH if this
> > >> ends up being useful for other drivers I'd prefer we put it into
> > >> TTM.
> > > It will be useful to others; the point I am making is that others might
> > > not use ttm either, and there is nothing about bo placement that needs
> > > to be ttm specific.
> > >
> > > Avoiding bo eviction from the lru list is just a matter of the driver
> > > never overcommitting bos to a pool of memory and doing eviction by
> > > itself, i.e. deciding on a new placement for a bo and moving that bo
> > > out before moving other bos in, which can be done outside ttm.
> > >
> > > The only thing that will need modification in ttm is the work done to
> > > control memory fragmentation, but this should not be enforced on
> > > all ttm users and should be a runtime decision. GPUs with a virtual
> > > address space can scatter a bo through vram by using vram pages, making
> > > memory fragmentation pretty much a non-issue (some GPUs still need
> > > contiguous memory for scanout buffers or other specific buffers).
> > >
> > You're correct it COULD be done like that, but that's a nasty workaround.
> > Simply assign a priority to each buffer, then modify ttm_bo_add_to_lru,
> > ttm_bo_swapout, ttm_mem_evict_first and be done with it.
> > 
> > Memory management is exactly the kind of thing that should be done in TTM,
> > so why have something 'generic' for something that's little more than a renamed priority queue?
> 
> The end score and the use of the score for placement decisions can be done
> in ttm, but the whole score computation and the heuristics related to it
> should not.
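
To make the policy from the top of the thread concrete, here is a rough
userspace sketch of it: a new bo only goes into VRAM if enough strictly
colder bos can be evicted, coldest first, to make room. Everything in it
(struct model_bo, model_place, the integer score) is hypothetical and is
not TTM or radeon code. It also assumes the pool is not overcommitted on
entry, and it ignores fragmentation and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer object; not a TTM structure. */
struct model_bo {
	size_t size;     /* bytes */
	unsigned score;  /* hotness, higher means hotter */
	bool in_vram;
};

/*
 * Try to place `incoming` into a VRAM pool of `vram_size` bytes.
 * Only residents strictly colder than `incoming` may be evicted, and
 * they are evicted coldest first until the new bo fits. Returns true
 * if the bo was placed; on failure nothing is evicted.
 */
bool model_place(struct model_bo *bos, size_t n, size_t vram_size,
		 struct model_bo *incoming)
{
	size_t used = 0, reclaimable = 0;

	assert(!incoming->in_vram);

	for (size_t i = 0; i < n; i++) {
		if (!bos[i].in_vram)
			continue;
		used += bos[i].size;
		if (bos[i].score < incoming->score)
			reclaimable += bos[i].size;
	}

	/* Dry run: give up before touching anything if it cannot fit. */
	if (incoming->size > vram_size ||
	    used - reclaimable > vram_size - incoming->size)
		return false;

	while (used > vram_size - incoming->size) {
		struct model_bo *coldest = NULL;

		for (size_t i = 0; i < n; i++) {
			if (bos[i].in_vram &&
			    bos[i].score < incoming->score &&
			    (!coldest || bos[i].score < coldest->score))
				coldest = &bos[i];
		}
		/* The dry run above guarantees a victim exists. */
		assert(coldest);
		coldest->in_vram = false;
		used -= coldest->size;
	}

	incoming->in_vram = true;
	return true;
}
```

The linear search for the coldest bo is only there to keep the sketch
short; a sorted list or a heap would be the obvious replacement.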

btw another thing to look at is the eviction roster in drm_mm. It's
completely standalone; the only thing it requires is that you have a
deterministic order to add objects to it and unroll them (but that can
always be solved by putting objects on a temporary list).

That way if you have some big objects and a highly fragmented vram you
don't end up evicting a big load of data, but just a perfectly-sized hole.
All the scanning is linear, but ime with the implementation in i915.ko
that's not really a real-world issue. The drm_mm roster supports all the
same features as the normal block allocator, so range-restricted
allocations (and everything else) also work. See i915_gem_evict_something
in i915_gem_evict.c for how it all works (yeah, no docs, but writing those
for drm_mm.c is on my todo somewhere).
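
For the idea only, a toy userspace model of such a scan could look like
the sketch below. It does not call any real drm_mm function; struct
model_node, model_find_hole and model_scan are made-up names, the nodes
are assumed to be sorted by address and non-overlapping, and alignment,
colors and range restrictions are left out. Nodes are added in LRU order
until free space plus scanned nodes form a big enough hole, then they are
unrolled in reverse order and only the ones overlapping the hole are
marked for eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an allocated range; not struct drm_mm_node. */
struct model_node {
	size_t start;
	size_t size;
	bool scanned;  /* currently part of the scan */
	bool evict;    /* result: must be evicted to open the hole */
};

/* Look for `want` contiguous bytes made of free space and scanned nodes. */
static bool model_find_hole(const struct model_node *nodes, size_t n,
			    size_t total, size_t want, size_t *hole_start)
{
	size_t run_start = 0;  /* end of the last node that must stay */

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].scanned)
			continue;  /* counts as free, the run goes on */
		if (nodes[i].start - run_start >= want) {
			*hole_start = run_start;
			return true;
		}
		run_start = nodes[i].start + nodes[i].size;
	}
	if (total - run_start >= want) {
		*hole_start = run_start;
		return true;
	}
	return false;
}

/*
 * `lru` lists node indices, least recently used first. Returns true if
 * a hole was found; the victims are then flagged with `evict`.
 */
bool model_scan(struct model_node *nodes, size_t n, size_t total,
		const size_t *lru, size_t n_lru, size_t want,
		size_t *hole_start)
{
	size_t added = 0;
	bool found;

	for (size_t i = 0; i < n; i++)
		nodes[i].scanned = nodes[i].evict = false;

	found = model_find_hole(nodes, n, total, want, hole_start);
	while (!found && added < n_lru) {
		assert(lru[added] < n);
		nodes[lru[added++]].scanned = true;
		found = model_find_hole(nodes, n, total, want, hole_start);
	}

	/* Unroll in reverse order of addition, keeping only real victims. */
	while (added > 0) {
		struct model_node *node = &nodes[lru[--added]];

		node->scanned = false;
		if (found && node->start < *hole_start + want &&
		    node->start + node->size > *hole_start)
			node->evict = true;
	}
	return found;
}
```

With nodes at [0,30), [30,40), [40,70) and [80,100) in a 100-byte range
and a 40-byte request, this marks only the node at [40,70) even when two
other nodes are older in the LRU, since that node plus the free gap
behind it is already enough.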
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch