TTM's role in score-based eviction

Thomas Hellstrom thellstrom at vmware.com
Mon Dec 9 11:32:54 PST 2013


On 12/09/2013 06:28 PM, Daniel Vetter wrote:
> On Thu, Dec 05, 2013 at 11:45:03AM -0500, Jerome Glisse wrote:
>> On Thu, Dec 05, 2013 at 05:22:54PM +0100, Maarten Lankhorst wrote:
>>> On 05-12-13 16:49, Jerome Glisse wrote:
>>>> On Thu, Dec 05, 2013 at 11:26:46AM +0100, Thomas Hellstrom wrote:
>>>>> Hi!
>>>>>
>>>>> On 12/05/2013 10:36 AM, Lauri Kasanen wrote:
>>>>>> Hi list, Thomas,
>>>>>>
>>>>>> I will be investigating the use of a hotness score for each bo, to
>>>>>> replace the ping-pong-causing LRU eviction in radeon*.
>>>>>>
>>>>>> The goal is to put all bos that fit in VRAM there, in order of hotness;
>>>>>> a new bo should only be placed there if its hotness score is greater
>>>>>> than the lowest VRAM bo's. Then the lowest-hotness bos in
>>>>>> VRAM should be evicted until the new bo fits. This should result in a
>>>>>> more stable set with less ping-pong.
>>>>>>
>>>>>> Jerome advised that the bo placement should be done entirely outside
>>>>>> TTM. As I'm not (yet) too familiar with that side of the kernel, what is
>>>>>> the opinion of TTM folks?
>>>>> There are a couple of things to be considered:
>>>>> 1) You need to decide where a bo to be validated should be placed.
>>>>> The driver can give a list of possible placements to TTM and let
>>>>> TTM decide, trying each placement in turn. A driver that thinks this
>>>>> isn't sufficient can come up with its own strategy and give only a
>>>>> single placement to TTM. If TTM can't satisfy that, it will give you
>>>>> an error back, and the driver will need to validate with an
>>>>> alternative placement. I think Radeon already does this? vmwgfx does
>>>>> it to some extent.
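
For reference, the single-placement-with-fallback pattern looks roughly
like this against the TTM interfaces of this time. This is an untested
sketch: the my_* name and the caching flags are picked arbitrarily, and
the bo is assumed to already be reserved by the caller.

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

static const uint32_t vram_only[] = {
	TTM_PL_FLAG_VRAM | TTM_PL_FLAG_WC
};
static const uint32_t gtt_only[] = {
	TTM_PL_FLAG_TT | TTM_PL_FLAG_CACHED
};

/* bo must already be reserved by the caller */
static int my_validate_vram_first(struct ttm_buffer_object *bo)
{
	struct ttm_placement placement = {
		.num_placement = 1,
		.placement = vram_only,
		.num_busy_placement = 1,
		.busy_placement = vram_only,
	};
	int ret;

	/* Give TTM a single placement; it either satisfies it or errors out. */
	ret = ttm_bo_validate(bo, &placement, true, false);
	if (ret == 0 || ret == -ERESTARTSYS)
		return ret;

	/* The driver, not TTM, decides what the fallback placement is. */
	placement.placement = gtt_only;
	placement.busy_placement = gtt_only;
	return ttm_bo_validate(bo, &placement, true, false);
}
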
>>>>>
>>>>> 2) As you say, TTM is evicting strictly on an lru basis, and is
>>>>> maintaining one LRU list per memory type, and also a global swap lru
>>>>> list for buffers that are backed by system pages (not VRAM). I guess
>>>>> what you would want to do is to replace the VRAM lru list with a
>>>>> priority queue where bos are continuously sorted based on hotness.
>>>>> As long as you obey the locking rules:
>>>>> *) Locking order is bo::reserve -> lru-lock
>>>>> *) When walking the queue with the lru-lock held, you must therefore
>>>>> tryreserve if you want to reserve an object on the queue
>>>>> *) bos need to be removed from the queue as soon as they are reserved
>>>>> *) Don't remove a bo from the queue unless it is reserved
>>>>> Nothing stops you from doing this in the driver, but OTOH if this
>>>>> ends up being useful for other drivers I'd prefer we put it into
>>>>> TTM.
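
A rough, untested sketch of walking such a hotness-sorted queue under
those rules; all my_* names are made up, and my_bo_tryreserve() merely
stands in for TTM's trylock-style reserve (reserve with no_wait set):

#include <linux/list.h>
#include <linux/spinlock.h>

/* made-up driver structures, just enough for the sketch */
struct my_dev {
	spinlock_t hot_lock;		/* plays the role of the lru_lock */
	struct list_head hot_queue;	/* bos sorted by hotness, coldest first */
};

struct my_bo {
	struct list_head hot_entry;
};

/* stand-in for TTM's trylock-style reserve; returns 0 on success */
int my_bo_tryreserve(struct my_bo *bo);

static struct my_bo *my_pick_eviction_victim(struct my_dev *dev)
{
	struct my_bo *bo;

	spin_lock(&dev->hot_lock);
	list_for_each_entry(bo, &dev->hot_queue, hot_entry) {
		/*
		 * Lock order is bo::reserve -> queue lock, so with the
		 * queue lock held we may only try-reserve here.
		 */
		if (my_bo_tryreserve(bo))
			continue;	/* busy, look at the next-coldest bo */

		/* Reserved bos must leave the queue immediately. */
		list_del_init(&bo->hot_entry);
		spin_unlock(&dev->hot_lock);
		return bo;
	}
	spin_unlock(&dev->hot_lock);

	return NULL;	/* everything is reserved by someone else right now */
}
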
>>>> It will be useful to others. The point I am making is that others might
>>>> not use TTM either, and there is nothing about bo placement that needs
>>>> to be TTM-specific.
>>>>
>>>> Avoiding bo eviction from the LRU list is just a matter of the driver
>>>> never over-committing bos on a pool of memory and doing eviction by
>>>> itself, i.e. deciding on a new placement for a bo and moving it out
>>>> before moving another bo in, which can be done outside TTM.
>>>>
>>>> The only thing that needs modification in TTM is the work done to
>>>> control memory fragmentation, but this should not be enforced on
>>>> all TTM users and should be a runtime decision. A GPU with a virtual
>>>> address space can scatter a bo through VRAM by using VRAM pages, making
>>>> memory fragmentation pretty much a non-issue (some GPUs still need
>>>> contiguous memory for scanout buffers or other specific buffers).
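
An untested sketch of that kind of driver-side eviction: the driver
accounts VRAM itself and never lets TTM run out, so TTM's own LRU
eviction never triggers. Everything except ttm_bo_validate() is a
made-up name, and the victim is assumed to come back already reserved:

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_placement.h>

/* made-up driver bookkeeping, just enough for the sketch */
struct my_dev {
	u64 vram_size;
	u64 vram_used;
	struct ttm_placement gtt_placement;	/* single TTM_PL_FLAG_TT placement */
};

struct my_bo {
	struct ttm_buffer_object tbo;
	u64 size;
};

/* hypothetical: returns a *reserved* VRAM bo with the lowest score, or NULL */
struct my_bo *my_pick_coldest_vram_bo(struct my_dev *dev);
void my_bo_unreserve(struct my_bo *bo);

static int my_make_room_in_vram(struct my_dev *dev, u64 size)
{
	struct my_bo *victim;
	int ret;

	while (dev->vram_used + size > dev->vram_size) {
		victim = my_pick_coldest_vram_bo(dev);
		if (!victim)
			return -ENOSPC;

		/* Move the victim out ourselves, before TTM ever has to. */
		ret = ttm_bo_validate(&victim->tbo, &dev->gtt_placement,
				      true, false);
		my_bo_unreserve(victim);
		if (ret)
			return ret;
		dev->vram_used -= victim->size;
	}
	return 0;
}
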
>>>>
>>> You're correct that it COULD be done like that, but that's a nasty workaround.
>>> Simply assign a priority to each buffer, then modify ttm_bo_add_to_lru,
>>> ttm_bo_swapout, ttm_mem_evict_first and be done with it.
>>>
>>> Memory management is exactly the kind of thing that should be done in TTM,
>>> so why have something 'generic' for something that's little more than a renamed priority queue?
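
A minimal, untested sketch of what a hotness-sorted ttm_bo_add_to_lru()
could do instead of the plain tail-add; my_bo_hotness() is an assumed
driver-side score, not an existing TTM field:

#include <linux/list.h>
#include <drm/ttm/ttm_bo_driver.h>

/* hypothetical: the driver's hotness score for a bo, higher is hotter */
unsigned long my_bo_hotness(struct ttm_buffer_object *bo);

/*
 * Keep man->lru sorted coldest-first, so that ttm_mem_evict_first()
 * keeps picking the least hot buffer.  Called with the lru_lock held,
 * exactly where ttm_bo_add_to_lru() does its list_add_tail() today.
 */
static void my_add_to_lru_sorted(struct ttm_mem_type_manager *man,
				 struct ttm_buffer_object *bo)
{
	struct ttm_buffer_object *cur;
	struct list_head *ins = &man->lru;	/* default: hottest of all, at the tail */

	list_for_each_entry(cur, &man->lru, lru) {
		if (my_bo_hotness(cur) > my_bo_hotness(bo)) {
			ins = &cur->lru;	/* insert before the first hotter bo */
			break;
		}
	}
	list_add_tail(&bo->lru, ins);
}
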
>> The end score and the use of the score for placement decisions can be done
>> in TTM, but the whole score computation and the heuristics related to it
>> should not.
> btw another thing to look at is the eviction roster in drm_mm. It's
> completely standalone; the only thing it requires is that you have a
> deterministic order to add objects to it and unroll them (but that can
> always be solved by putting objects on a temporary list).
>
> That way if you have some big objects and a highly fragmented vram you
> don't end up evicting a big load of data, but just a perfectly-sized hole.
> All the scanning is linear, but in my experience with the implementation
> in i915.ko that's not a real-world issue. The drm_mm roster supports all
> the same features as the normal block allocator, so range-restricted
> allocations (and everything else) also work. See i915_gem_evict_something()
> in i915_gem_evict.c for how it all works (yeah, no docs, but writing those
> for drm_mm.c is on my todo somewhere).
> -Daniel
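
For reference, the drm_mm scan interface Daniel is describing is used
roughly like this. This is an untested sketch modelled on
i915_gem_evict_something() as of this kernel; the my_* structures and
helpers are made up, only the drm_mm_* calls are real:

#include <linux/list.h>
#include <drm/drm_mm.h>

/* made-up driver wrapper, just enough for the sketch */
struct my_bo {
	struct drm_mm_node node;
	struct list_head lru;	/* driver's eviction order, coldest first */
	struct list_head scan;	/* temporary unwind list */
};

void my_bo_evict(struct my_bo *bo);	/* hypothetical: move bo out of VRAM */

static int my_evict_hole(struct drm_mm *vram_mm, struct list_head *lru,
			 unsigned long size, unsigned alignment)
{
	struct my_bo *bo, *tmp;
	LIST_HEAD(unwind);
	LIST_HEAD(evict);
	bool found = false;

	drm_mm_init_scan(vram_mm, size, alignment, 0);

	/* Feed candidates to the scanner in eviction order. */
	list_for_each_entry(bo, lru, lru) {
		list_add(&bo->scan, &unwind);	/* head-add: forward walk is reverse order */
		if (drm_mm_scan_add_block(&bo->node)) {
			found = true;	/* a big enough hole exists */
			break;
		}
	}

	/*
	 * Every scanned block must be unrolled, in reverse order of
	 * addition.  Those for which drm_mm_scan_remove_block() returns
	 * true overlap the chosen hole and are the ones to evict.
	 */
	list_for_each_entry_safe(bo, tmp, &unwind, scan) {
		if (drm_mm_scan_remove_block(&bo->node))
			list_move(&bo->scan, &evict);
		else
			list_del_init(&bo->scan);
	}

	if (!found)
		return -ENOSPC;

	/* Only now is it safe to actually move the chosen bos out. */
	list_for_each_entry_safe(bo, tmp, &evict, scan) {
		list_del_init(&bo->scan);
		my_bo_evict(bo);
	}
	return 0;
}
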

The problem with combining this with TTM is that eviction by default 
doesn't take place under a mutex, so multiple threads may be traversing 
the LRU list more or less at the same time, evicting stuff.

However, when it comes to eviction, that's not really a behaviour we 
need to preserve. It would, IMO, be OK to take a "big" per-memory-type 
mutex around eviction, but then one would have to sort out how / whether 
swapping and delayed destruction would need to wait on that mutex as 
well....
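
Purely as an illustration of what I mean (the my_* names are made up,
my_make_room_in_vram() refers to the driver-side sketch further up, and
the swapout / delayed-destroy interaction is exactly what is left open
here):

#include <linux/types.h>
#include <linux/mutex.h>

struct my_dev;	/* from the earlier sketch */
int my_make_room_in_vram(struct my_dev *dev, u64 size);

/* one "big" eviction mutex for the VRAM memory type */
static DEFINE_MUTEX(my_vram_evict_mutex);

static int my_evict_for_vram(struct my_dev *dev, u64 size)
{
	int ret;

	/*
	 * Concurrent evictors queue up here instead of racing down the
	 * LRU.  Whether swapout and delayed destruction should also take
	 * this mutex is the open question.
	 */
	mutex_lock(&my_vram_evict_mutex);
	ret = my_make_room_in_vram(dev, size);
	mutex_unlock(&my_vram_evict_mutex);

	return ret;
}
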

/Thomas

