[PATCH 4/6] drm/radeon: add buffers to the LRU list from smallest to largest

Tue Mar 4 05:27:31 PST 2014

Hi, Marek!

I'm OOTO this week, so I can't take a closer look until next week but
there are a couple of things that worries me about this.

1) It's optimizing a slowpath (memory pressure) at the cost of making a
fastpath slower by adding a sorting pass.
2) While it might make a difference on VRAM, it will not help much on
fragmentation-free memory types, like vmwgfx GMR and MOB memory types.
3) It seems like the mechanism by which it works is to reduce the amount
of memory evicted "in vain", (buffers that do not contribute to the
final free memory area that's used). If that's really the case, there
are better ways to do this (walk the LRU list pre-determining what
buffers need to be evicted like the intel driver does).

So considering this, I think we should make this sorting pass optional.

/Thomas

On 03/01/2014 10:57 PM, Marek Olšák wrote:
> Only the VMWare driver uses ttm_eu_fence_buffer_objects. Cc'ing
> Thomas. What do you think about this, Thomas?
>
> Marek
>
> On Thu, Feb 27, 2014 at 2:22 AM, Michel Dänzer <michel at daenzer.net> wrote:
>> On Mit, 2014-02-26 at 19:25 +0100, Marek Olšák wrote:
>>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>>> index f28a8d8..d49a3f7 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>>> [...]
>>> @@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>>>       unsigned i;
>>>
>>>       if (!error) {
>>> +             /* Sort the buffer list from the smallest to largest buffer,
>>> +              * which affects the order of buffers in the LRU list.
>>> +              * This assures that the smallest buffers are added first
>>> +              * to the LRU list, so they are likely to be later evicted
>>> +              * first, instead of large buffers whose eviction is more
>>> +              * expensive.
>>> +              *
>>> +              * This slightly lowers the number of bytes moved by TTM
>>> +              * per frame under memory pressure.
>>> +              */
>>> +             list_sort(NULL, &parser->validated, cmp_size_smaller_first);
>>> +
>>>               ttm_eu_fence_buffer_objects(&parser->ticket,
>>>                                           &parser->validated,
>>>                                           parser->ib.fence);
>> This seems like a good idea in general, so maybe it should be done in
>> ttm_eu_fence_buffer_objects() instead, and possibly also in the callers
>> of ttm_eu_backoff_reservation_locked().
>>
>>
>> --
>> Earthling Michel Dänzer            |                  https://urldefense.proofpoint.com/v1/url?u=http://www.amd.com/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=aQIiktj8ewwEsYip5R%2FcVgKb30vgbPORPhMChqLpKg0%3D%0A&s=ccc52fd17cadd5c2a056880e0fc457d6e7258e119ac5fc4545a0aea4b4a0aa34
>> Libre software enthusiast          |                Mesa and X developer
>>