[PATCH] drm/ttm: add minimum residency constraint for bo eviction

Fri Nov 30 00:38:41 PST 2012

On 11/29/2012 10:58 PM, Marek Olšák wrote:
> On Thu, Nov 29, 2012 at 9:33 PM, Thomas Hellstrom <thomas at shipmail.org> wrote:
>> On 11/29/2012 01:52 PM, Marek Olšák wrote:
>>> On Thu, Nov 29, 2012 at 9:04 AM, Thomas Hellstrom <thomas at shipmail.org> wrote:
>>>> On 11/29/2012 03:15 AM, Marek Olšák wrote:
>>>>> On Thu, Nov 29, 2012 at 12:44 AM, Alan Swanson <swanson at ukfsn.org> wrote:
>>>>>> On Wed, 2012-11-28 at 18:24 -0500, Jerome Glisse wrote:
>>>>>>> On Wed, Nov 28, 2012 at 6:18 PM, Thomas Hellstrom <thomas at shipmail.org> wrote:
>>>>>>>> On 11/28/2012 04:58 PM, j.glisse at gmail.com wrote:
>>>>>>>>> From: Jerome Glisse <jglisse at redhat.com>
>>>>>>>>>
>>>>>>>>> This patch adds a configurable minimum residency time for each memory
>>>>>>>>> pool (VRAM, GTT, ...). The intention is to avoid so much eviction
>>>>>>>>> from VRAM that the GPU spends pretty much all its time moving
>>>>>>>>> things in and out.
>>>>>>>>
>>>>>>>> This patch seems odd to me.
>>>>>>>>
>>>>>>>> It seems the net effect is to refuse evictions from VRAM and make
>>>>>>>> buffers go somewhere else, and that makes things faster?
>>>>>>>>
>>>>>>>> Why don't they go there in the first place instead of trying to force
>>>>>>>> them into VRAM, when VRAM is full?
>>>>>>>>
>>>>>>>> /Thomas
>>>>>>> It's mostly a side effect of cs and of validating with each cs. If boA
>>>>>>> is in cs1 but not in cs2, and boB is in cs2 but not in cs1, then boA
>>>>>>> could be evicted by cs2 and boB moved in. If the next cs, i.e. cs3, is
>>>>>>> like cs1, then boA moves back in and boB is evicted. Then you get cs4,
>>>>>>> which references boB but not boA, so boA gets evicted and boB moves
>>>>>>> in ... So ttm just spends its time doing evictions, but it does so
>>>>>>> because the driver asks it to. Note that what is costly there is not
>>>>>>> the bo move itself but the page allocation.
>>>>>>>
>>>>>>> I propose this patch to put a bound on bo eviction frequency. I
>>>>>>> thought it might help other drivers: if you set the residency time to 0
>>>>>>> you get the current behavior; if you don't, you enforce a minimum
>>>>>>> residency time, which helps drivers like radeon. Of course, a proper
>>>>>>> fix for bo eviction in radeon has to be in radeon code and is mostly
>>>>>>> an overhaul of how we validate bos.
>>>>>>>
>>>>>>> But I still believe that this patch has value in itself by allowing
>>>>>>> drivers to put a bound on buffer movement frequency.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jerome
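
The ping-pong Jerome describes, and the effect of a minimum residency time on it, can be illustrated with a toy model. This is only a sketch under stated assumptions, not the TTM code or the patch itself: the function name `simulate`, the one-slot VRAM, the integer timestamps, and the "place the buffer elsewhere" fallback are all hypothetical, and the victim is simply the buffer that has been resident longest.

```python
def simulate(submissions, min_residency, capacity=1):
    """Toy model (hypothetical, not TTM): count VRAM evictions.

    submissions   -- list of (time, bo_name), one buffer per command submission
    min_residency -- a resident buffer may only be evicted once it has been
                     in VRAM for at least this long; 0 means no constraint
    Returns (evictions, fallbacks); a fallback means the eviction was
    refused and the buffer had to be placed somewhere else (e.g. GTT).
    """
    vram = {}  # bo name -> time it was placed in VRAM
    evictions = 0
    fallbacks = 0
    for now, bo in submissions:
        if bo in vram:
            continue  # already resident, nothing to move
        if len(vram) < capacity:
            vram[bo] = now  # free space, no eviction needed
            continue
        # VRAM is full: consider the buffer that has been resident longest.
        victim = min(vram, key=vram.get)
        if now - vram[victim] >= min_residency:
            del vram[victim]
            vram[bo] = now
            evictions += 1
        else:
            fallbacks += 1  # eviction refused, buffer goes elsewhere
    return evictions, fallbacks


# cs1 uses boA, cs2 uses boB, cs3 uses boA again, and so on.
submissions = [(t, "boA" if t % 2 == 0 else "boB") for t in range(8)]
no_constraint = simulate(submissions, min_residency=0)
with_constraint = simulate(submissions, min_residency=4)
```

With these inputs the unconstrained run evicts on every submission after the first (7 evictions), while a residency time of 4 leaves 1 eviction and 3 placements outside VRAM. Whether that trade is a win depends on how slow the fallback placement is, which is the point Thomas raises.
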
>>>>>> So, a variation on John Carmack's recommendation from 2000 to use MRU,
>>>>>> not LRU, to avoid texture thrashing.
>>>>>>
>>>>>>      Mar 07, 2000 - Virtualized video card local memory is The Right Thing.
>>>>>>      http://floodyberry.com/carmack/johnc_plan_2000.html
>>>>>>
>>>>>> In fact, this was last discussed in 2005, with a patch for a 1 second
>>>>>> stale texture eviction, and I (still) wonder why such a method was
>>>>>> never implemented, since it was a clear problem.
>>>>> BTW we can send end-of-frame markers to the kernel, which could be
>>>>> used to implement Carmack's algorithm.
>>>>>
>>>>> Marek
>>>>
>>>> It seems to me like Carmack's algorithm is quite specific to the case
>>>> where only a single GL client is running?
>>> In theory, we could send context IDs to the kernel as well and modify
>>> the conditional to "If the LRU texture was not needed in the previous
>>> frame of any context".
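
A rough sketch of the victim selection discussed here, extended with the per-context conditional Marek suggests. The names `pick_victim`, `last_used_frame`, and `current_frame` are hypothetical, and falling back to the MRU buffer when the LRU one is still in use is an assumption based on the "MRU, not LRU" summary earlier in the thread, not on any kernel interface.

```python
def pick_victim(lru_order, last_used_frame, current_frame):
    """Pick a buffer to evict (illustrative only).

    lru_order       -- buffer names, least recently used first
    last_used_frame -- buffer name -> {context id: frame of last use}
    current_frame   -- context id -> frame that context is rendering now
    """
    lru = lru_order[0]
    # "Needed in the previous frame of any context": some context touched
    # the LRU buffer in its previous (or current) frame.
    needed_recently = any(
        last_used_frame[lru].get(ctx, -1) >= frame - 1
        for ctx, frame in current_frame.items()
    )
    # If even the LRU buffer is still in use every frame, we are thrashing,
    # so evict the most recently used buffer instead.
    return lru_order[-1] if needed_recently else lru


order = ["tex0", "tex1", "tex2"]
frames = {1: 10, 2: 5}  # context 1 is on frame 10, context 2 on frame 5
stale = pick_victim(order, {"tex0": {1: 3}}, frames)
thrashing = pick_victim(order, {"tex0": {2: 4}}, frames)
```

In the first call `tex0` was last used long ago and is evicted as in plain LRU; in the second, context 2 used it in its previous frame, so the MRU buffer `tex2` is chosen instead.
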
>>>
>>>
>>>> It also seems like it's designed around the fact that when eviction takes
>>>> place, all buffer objects will be idle. With a reasonably filled graphics
>>>> fifo / ring, blindly using MRU will cause the GPU to run synchronized.
>>> I don't see why you would need to synchronize. If the GPU takes care
>>> of moving buffers in and out of VRAM and there's only one ring buffer
>>> ==> no synchronization is required.
>> The LRU bo has a much higher probability of being idle than the MRU bo, and
>> waiting for it to become idle will in principle synchronize the GPU and
>> unnecessarily drain the ring.
> What I tried to point out was that the synchronization shouldn't be
> needed, because the CPU shouldn't do anything with the contents of
> evicted buffers. The GPU moves the buffers, not the CPU. What does the
> CPU do besides updating some kernel structures?
>
> Also, buffer deletion is a case where you don't need to wait for
> the buffer to become idle if you know the memory area won't be
> mapped by the CPU, ever. The memory can be reclaimed right away. The
> GPU would be the one to move new data in, and once that happens, the
> old buffer will be trivially idle, because single-ring GPUs execute
> commands in order.

Yes, you're right. Sorry about that.

/Thomas
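
Marek's in-order argument can be checked with a small model of a single ring. This is a hypothetical sketch: `run_ring`, the command tuples, and the region numbering are made up for illustration and do not correspond to any driver interface. It assumes one ring that executes strictly in submission order and memory that the CPU never maps.

```python
from collections import deque


def run_ring(commands):
    """Execute (op, bo, region) commands strictly in order.

    Returns, for every draw, the pair (bo the draw expected to read,
    bo whose data actually occupied the region at that moment).
    """
    memory = {}  # region -> bo whose data currently lives there
    seen = []
    ring = deque(commands)
    while ring:
        op, bo, region = ring.popleft()
        if op == "upload":
            memory[region] = bo  # the GPU moves new data into the region
        elif op == "draw":
            seen.append((bo, memory.get(region)))
    return seen


commands = [
    ("upload", "old_bo", 0),
    ("draw", "old_bo", 0),
    # old_bo is deleted here and region 0 is handed to new_bo at once,
    # without waiting for the draw above to finish.
    ("upload", "new_bo", 0),
    ("draw", "new_bo", 0),
]
seen = run_ring(commands)
```

Every draw sees the data it expected, because the upload for `new_bo` cannot run before the earlier draw that still reads `old_bo`. The model says nothing about multiple rings or CPU access, where that guarantee would not hold.
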