Asynchronous eviction [WAS Re: [PATCH] drm/ttm: add minimum residency constraint for bo eviction]
Thomas Hellstrom
thomas at shipmail.org
Fri Nov 30 01:39:57 PST 2012
On 11/29/2012 10:58 PM, Marek Olšák wrote:
>
> What I tried to point out was that the synchronization shouldn't be
> needed, because the CPU shouldn't do anything with the contents of
> evicted buffers. The GPU moves the buffers, not the CPU. What does the
> CPU do besides updating some kernel structures?
>
> Also, buffer deletion is something where you don't need to wait for
> the buffer to become idle if you know the memory area won't be
> mapped by the CPU, ever. The memory can be reclaimed right away. It
> would be the GPU that moves new data in, and once that happens, the
> old buffer will trivially be idle, because single-ring GPUs execute
> commands in order.
>
> Marek
Actually, asynchronous eviction / deletion is something I have been
prototyping for a while but never gotten around to implementing in TTM.
There are a few minor caveats:
With buffer deletion, what you say is true for fixed memory, but not for
TT memory, where pages are reclaimed by the system after buffer
destruction. That means we don't have to wait for idle to free GPU
space, but we do need to wait before the pages are handed back to the
system.
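
As a rough sketch of that deletion path (all names below are made up
for illustration; driver_sync_obj_signaled() and
release_pages_to_system() stand in for whatever fence and page-pool
interfaces the driver provides):

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    /* Stand-ins for driver-provided interfaces (hypothetical). */
    extern bool driver_sync_obj_signaled(void *sync_obj);
    extern void release_pages_to_system(struct page **pages,
                                        unsigned long num_pages);

    /* Pages of a destroyed TT buffer are queued here instead of
     * being returned to the system immediately. */
    struct deferred_page_release {
        struct list_head link;      /* on a per-device reap list */
        void *sync_obj;             /* fence for the last GPU access */
        struct page **pages;
        unsigned long num_pages;
    };

    /* Run from a workqueue or on allocation pressure: hand pages
     * back to the system only once their fence has signaled. */
    static void reap_deferred_pages(struct list_head *reap_list)
    {
        struct deferred_page_release *rel, *tmp;

        list_for_each_entry_safe(rel, tmp, reap_list, link) {
            if (!driver_sync_obj_signaled(rel->sync_obj))
                continue;           /* GPU may still access these pages */
            list_del(&rel->link);
            release_pages_to_system(rel->pages, rel->num_pages);
            kfree(rel);
        }
    }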
Swapout needs to access the contents of evicted buffers, but
synchronizing doesn't need to happen until just before swapout.
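
In code terms, the swapout path would then be the one place that
actually waits (hypothetical names again; copy_pages_to_shmem() stands
in for the actual shmem copy):

    extern int driver_sync_obj_wait(void *sync_obj);
    extern int copy_pages_to_shmem(struct page **pages,
                                   unsigned long num_pages);

    /* The eviction copy may still be in flight on the GPU, so wait
     * here, just before the CPU reads the pages, rather than at
     * eviction time. */
    static int swap_out_evicted_pages(void *sync_obj,
                                      struct page **pages,
                                      unsigned long num_pages)
    {
        int ret = driver_sync_obj_wait(sync_obj);

        if (ret)
            return ret;
        return copy_pages_to_shmem(pages, num_pages);
    }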
Multi-ring / CPU support: if another ring / engine or the CPU is about
to move buffer contents into VRAM or a GPU aperture that was previously
evicted by another ring, it needs to sync with that eviction, but it
doesn't know which buffer, or even how many buffers, occupied the space
previously. Trivially, one can attach a sync object to the memory type
manager that represents the last eviction from that memory type, and
*any* engine (CPU or GPU) that moves buffer contents in needs to order
that movement with respect to that fence. As you say, with a single ring
and no CPU fallbacks that ordering is a no-op, but any common
(non-driver-based) implementation needs to support this.
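
Something like the following could track that per-manager fence (again
a sketch with made-up names; driver_sync_obj_ref()/unref() stand in for
the driver's fence refcounting and are assumed to tolerate NULL):

    #include <linux/spinlock.h>

    extern void *driver_sync_obj_ref(void *sync_obj);
    extern void driver_sync_obj_unref(void *sync_obj);

    /* One "last eviction" sync object per memory type. */
    struct mem_type_eviction_sync {
        spinlock_t lock;
        void *last_evict_sync_obj;  /* NULL if nothing ever evicted */
    };

    /* Before moving data in: take a reference to the current fence
     * and make the move depend on it (GPU barrier or semaphore), or
     * wait on it in the CPU fallback path. */
    static void *get_evict_sync_obj(struct mem_type_eviction_sync *s)
    {
        void *sync_obj;

        spin_lock(&s->lock);
        sync_obj = driver_sync_obj_ref(s->last_evict_sync_obj);
        spin_unlock(&s->lock);
        return sync_obj;
    }

    /* After queueing an eviction: publish its fence as the new
     * barrier for this memory type. */
    static void set_evict_sync_obj(struct mem_type_eviction_sync *s,
                                   void *new_sync_obj)
    {
        void *old;

        spin_lock(&s->lock);
        old = s->last_evict_sync_obj;
        s->last_evict_sync_obj = driver_sync_obj_ref(new_sync_obj);
        spin_unlock(&s->lock);
        driver_sync_obj_unref(old);
    }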
A single fence attached to the memory type manager is the simplest
solution, but a solution with a fence for each free region in the free
list is also possible. Then TTM needs a driver callback to be able to
order fences w.r.t. each other.
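
The per-region variant could look roughly like this; the ordering
callback is what the driver would have to supply (all names
illustrative, and driver_sync_obj_unref() again assumed to tolerate
NULL):

    #include <linux/list.h>
    #include <linux/types.h>

    extern int driver_sync_obj_wait(void *sync_obj);
    extern void driver_sync_obj_unref(void *sync_obj);

    /* Each free region remembers the fence of the eviction that
     * freed it. */
    struct free_region {
        struct list_head link;
        unsigned long start, size;
        void *evict_sync_obj;       /* NULL: region long since idle */
    };

    /* Driver callback: returns true if waiting on 'a' also
     * guarantees that 'b' has signaled (e.g. same ring, later
     * sequence number). */
    typedef bool (*sync_obj_ordered_fn)(void *a, void *b);

    /* When two adjacent free regions merge, keep the fence that
     * orders after the other; for unordered fences the simplest
     * fallback is a synchronous wait. Takes ownership of src's
     * fence reference. */
    static void merge_region_fences(struct free_region *dst,
                                    struct free_region *src,
                                    sync_obj_ordered_fn ordered)
    {
        if (!src->evict_sync_obj)
            return;                 /* nothing to order against */

        if (!dst->evict_sync_obj ||
            ordered(src->evict_sync_obj, dst->evict_sync_obj)) {
            /* src's fence implies dst's (or dst has none): adopt it. */
            driver_sync_obj_unref(dst->evict_sync_obj);
            dst->evict_sync_obj = src->evict_sync_obj;
        } else if (ordered(dst->evict_sync_obj, src->evict_sync_obj)) {
            /* dst's fence already implies src's: drop src's. */
            driver_sync_obj_unref(src->evict_sync_obj);
        } else {
            /* Unordered (different rings): wait for src's fence now
             * and keep dst's as the region fence. */
            driver_sync_obj_wait(src->evict_sync_obj);
            driver_sync_obj_unref(src->evict_sync_obj);
        }
    }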
/Thomas