[PATCH] drm/ttm: add minimum residency constraint for bo eviction

Marek Olšák maraeo at gmail.com
Thu Nov 29 13:58:29 PST 2012


On Thu, Nov 29, 2012 at 9:33 PM, Thomas Hellstrom <thomas at shipmail.org> wrote:
> On 11/29/2012 01:52 PM, Marek Olšák wrote:
>>
>> On Thu, Nov 29, 2012 at 9:04 AM, Thomas Hellstrom <thomas at shipmail.org>
>> wrote:
>>>
>>> On 11/29/2012 03:15 AM, Marek Olšák wrote:
>>>>
>>>> On Thu, Nov 29, 2012 at 12:44 AM, Alan Swanson <swanson at ukfsn.org>
>>>> wrote:
>>>>>
>>>>> On Wed, 2012-11-28 at 18:24 -0500, Jerome Glisse wrote:
>>>>>>
>>>>>> On Wed, Nov 28, 2012 at 6:18 PM, Thomas Hellstrom
>>>>>> <thomas at shipmail.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 11/28/2012 04:58 PM, j.glisse at gmail.com wrote:
>>>>>>>>
>>>>>>>> From: Jerome Glisse <jglisse at redhat.com>
>>>>>>>>
>>>>>>>> This patch adds a minimum residency time, configurable for each
>>>>>>>> memory pool (VRAM, GTT, ...). The intention is to avoid so much
>>>>>>>> memory eviction from VRAM that the GPU ends up spending most of its
>>>>>>>> time moving things in and out.
>>>>>>>
>>>>>>>
>>>>>>> This patch seems odd to me.
>>>>>>>
>>>>>>> It seems the net effect is to refuse evictions from VRAM and make
>>>>>>> buffers go somewhere else, and that makes things faster?
>>>>>>>
>>>>>>> Why don't they go there in the first place instead of trying to
>>>>>>> force them into VRAM when VRAM is full?
>>>>>>>
>>>>>>> /Thomas
>>>>>>
>>>>>> It's mostly a side effect of validating with each cs. If boA is in
>>>>>> cs1 but not in cs2, and boB is in cs2 but not in cs1, then boA can be
>>>>>> evicted by cs2 and boB moved in. If the next cs, cs3, is like cs1,
>>>>>> then boA moves back in and boB is evicted; then you get cs4, which
>>>>>> references boB but not boA, so boA gets evicted and boB moves in
>>>>>> again... So TTM just spends its time doing evictions, but it does so
>>>>>> because the driver asks it to. Note that what is costly there is not
>>>>>> the bo move itself but the page allocation.
>>>>>>
>>>>>> I propose this patch to put a bound on bo eviction frequency; I
>>>>>> thought it might help other drivers too. If you set the residency
>>>>>> time to 0 you get the current behavior; if you don't, you enforce a
>>>>>> minimum residency time, which helps drivers like radeon. Of course a
>>>>>> proper fix to bo eviction for radeon has to be in radeon code and is
>>>>>> mostly an overhaul of how we validate bos.
>>>>>>
>>>>>> But I still believe this patch has value in itself by allowing a
>>>>>> driver to put a bound on buffer movement frequency.
>>>>>>
>>>>>> Cheers,
>>>>>> Jerome
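
To make the mechanism concrete, here is a minimal, self-contained sketch of
such a per-pool residency gate. The structure and field names (pool_policy,
min_residency_sec, placed_at) are made up for illustration and are not the
actual TTM code from the patch:

#include <stdbool.h>
#include <time.h>

/* Illustrative only -- not the structures from the patch. */
struct pool_policy {
        time_t min_residency_sec;   /* 0 = current behaviour: evict freely */
};

struct bo_entry {
        time_t placed_at;           /* when the bo last entered this pool */
};

/* A bo may be evicted only once it has been resident for at least
 * min_residency_sec; a setting of 0 keeps today's behaviour. */
static bool eviction_allowed(const struct pool_policy *pool,
                             const struct bo_entry *bo, time_t now)
{
        if (pool->min_residency_sec == 0)
                return true;
        return (now - bo->placed_at) >= pool->min_residency_sec;
}

The point of the gate is only to bound how often a given bo can bounce
between pools; the eviction policy itself stays whatever the driver already
uses.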
>>>>>
>>>>> So, a variation on John Carmack's recommendation from 2000 to use MRU,
>>>>> not LRU, to avoid texture thrashing.
>>>>>
>>>>>     Mar 07, 2000 - Virtualized video card local memory is The Right
>>>>> Thing.
>>>>>     http://floodyberry.com/carmack/johnc_plan_2000.html
>>>>>
>>>>> In fact, this was last discussed in 2005 with a patch for evicting
>>>>> textures stale for 1 second, and I (still) wonder why such a method
>>>>> was never implemented, since it was a clear problem.
>>>>
>>>> BTW we can send end-of-frame markers to the kernel, which could be
>>>> used to implement Carmack's algorithm.
>>>>
>>>> Marek
>>>
>>>
>>> It seems to me like Carmack's algorithm is quite specific to the case
>>> where only a single GL client is running?
>>
>> In theory, we could send context IDs to the kernel as well and modify
>> the conditional to "If the LRU texture was not needed in the previous
>> frame of any context".
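
As a rough illustration of that conditional (the names and the fixed context
array are hypothetical; a real implementation would hang the per-context
frame counters off the end-of-frame markers mentioned above):

#include <stdbool.h>
#include <stdint.h>

#define MAX_CONTEXTS 8          /* illustrative bound */

/* Frame counters bumped by an end-of-frame marker from userspace. */
struct frame_state {
        uint64_t current_frame[MAX_CONTEXTS];
};

/* Last frame in which each context referenced the bo (0 = never). */
struct bo_usage {
        uint64_t last_used_frame[MAX_CONTEXTS];
};

/* Evict only if no context used the bo in its current or previous frame. */
static bool may_evict(const struct frame_state *fs, const struct bo_usage *bo)
{
        for (int i = 0; i < MAX_CONTEXTS; i++) {
                if (bo->last_used_frame[i] == 0)
                        continue;       /* never referenced by this context */
                if (bo->last_used_frame[i] + 1 >= fs->current_frame[i])
                        return false;   /* still hot for this context */
        }
        return true;
}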
>>
>>
>>> It also seems like it's designed around the assumption that when
>>> eviction takes place, all buffer objects will be idle. With a
>>> reasonably filled graphics fifo / ring, blindly using MRU will cause
>>> the GPU to run synchronized.
>>
>> I don't see why you would need to synchronize. If the GPU takes care
>> of moving buffers in and out of VRAM and there's only one ring buffer,
>> no synchronization is required.
>
> The LRU bo has a much higher probability of being idle than the MRU bo,
> and waiting for it to become idle will in principle synchronize the GPU
> and unnecessarily drain the ring.

What I tried to point out was that the synchronization shouldn't be
needed, because the CPU shouldn't do anything with the contents of
evicted buffers. The GPU moves the buffers, not the CPU. What does the
CPU do besides updating some kernel structures?

Also, buffer deletion is something where you don't need to wait for
the buffer to become idle if you know the memory area won't ever be
mapped by the CPU. The memory can be reclaimed right away. It would
be the GPU that moves new data in, and once that happens, the old
buffer will be trivially idle, because single-ring GPUs execute
commands in order.
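
A trivial sketch of that deletion rule, with made-up stand-ins for the real
TTM/radeon structures and fence handling:

#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the real driver structures. */
struct vram_range {
        size_t offset, size;
};

struct bo {
        struct vram_range range;
        bool cpu_mapped;        /* has the CPU ever mapped this range? */
        bool gpu_idle;          /* tracked with fences in a real driver */
};

static void wait_for_idle(struct bo *bo) { bo->gpu_idle = true; }   /* stub */
static void free_range(struct vram_range r) { (void)r; }            /* stub */

/* The CPU only has to wait if it might touch the contents itself.
 * If the range is only ever rewritten by the GPU, the single in-order
 * ring guarantees the new copy lands after every command that still
 * reads the old contents, so the range can be reclaimed immediately. */
static void delete_bo(struct bo *bo)
{
        if (bo->cpu_mapped && !bo->gpu_idle)
                wait_for_idle(bo);
        free_range(bo->range);
}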

Marek

