[RFC] drm/ttm: add minimum residency constraint for bo eviction

Wed Nov 28 07:58:47 PST 2012

So I spent the day looking at ttm and eviction. The first patch I sent
earlier is, I believe, something that should be merged. This patch, however,
is more about finding out whether other people are interested in a similar
mechanism being shared among drivers through ttm. Otherwise I could just move
its logic into the radeon driver.

The idea of this patch is that we don't want to constantly move objects
in and out of certain memory pools, mostly VRAM. So it adds a minimum
residency time: no object that has been in a given pool for less than
this residency time can be moved out. It comes close to fixing the regression
we have had with radeon since the gallium driver change, and it probably
improves some other workloads.
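
To make the constraint concrete, here is a minimal userspace sketch of the
check, not the code from the patch. The struct and function names are
hypothetical stand-ins (this is not the real struct ttm_buffer_object), and
timestamps are plain monotonic milliseconds; in the kernel one would more
likely compare jiffies with time_after().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a buffer object. */
struct sketch_bo {
    uint64_t placed_at_ms;      /* when the bo entered its current pool */
};

/* Hypothetical per-pool setting; 0 disables the constraint. */
struct sketch_pool {
    uint64_t min_residency_ms;
};

/*
 * Return true if the bo has stayed in the pool long enough to be evicted.
 * Assumes now_ms comes from a monotonic clock, so now_ms >= placed_at_ms.
 */
static bool sketch_bo_evictable(const struct sketch_bo *bo,
                                const struct sketch_pool *pool,
                                uint64_t now_ms)
{
    if (pool->min_residency_ms == 0)
        return true;
    return now_ms - bo->placed_at_ms >= pool->min_residency_ms;
}
```

With a 500ms residency, a bo placed at t=1000ms is skipped by eviction
until t=1500ms.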

Statistics I gathered on xonotic/realquake showed that we can move as much
as 1GB per second in each direction (VRAM to system and system to VRAM).
So we are obviously not saturating the PCIE bandwidth. Profiling shows that
80-90% of the cost of this eviction is in memory allocation/deallocation for
the system memory (a lot of irqlock, and mostly the kernel spending time
allocating pages; think 256 000 or more pages per second to allocate/deallocate).
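
The page count follows directly from the transfer rate. A small sketch of the
arithmetic, assuming 4KiB pages (the page size is my assumption, it is not
stated above) and a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Pages that have to be allocated (or freed) per second to back a given
 * transfer rate. Passing 4096 as page_size is an assumption.
 */
static uint64_t pages_per_second(uint64_t bytes_per_second, uint64_t page_size)
{
    return bytes_per_second / page_size;
}
```

For 1GiB per second and 4KiB pages this gives 262144 pages per second in each
direction, which is where the "256 000 or more" figure comes from.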

I used this WIP patch to gather statistics and play with various combinations:
http://people.freedesktop.org/~glisse/0001-TTM-EVICT-WIP.patch

Some numbers with xonotic:
17.369fps stock 3.7 kernel
27.883fps 3.7 kernel + do not preserve caching patch ~ +60%
49.292fps 3.7 kernel + WIP with 500ms residency for all pools and no bo wait
          for eviction
49.258fps 3.7 kernel + WIP with 500ms residency for all pools and bo wait
48.213fps 3.7 kernel always allowing GTT placement (basically reverts the
          gallium patch effect)

Another design I am thinking of is changing the way radeon handles its memory:
stop trying to revalidate objects to a different memory pool at each cs.
Instead I think we should keep a VRAM lru list, probably per process, and move
bos out of VRAM according to this lru and some heuristic. Radeon would then
only move a bo into VRAM when there is room.
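
A rough userspace sketch of that design, to show the two rules (a bo only
enters VRAM when there is room, and eviction takes the least recently used
bo). All names are hypothetical, the list would be one per process, and plain
LRU stands in for whatever heuristic we would end up with:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct lru_bo {
    struct lru_bo *prev, *next;
    uint64_t size;
    bool in_vram;
};

struct vram_lru {
    struct lru_bo *head;        /* most recently used */
    struct lru_bo *tail;        /* least recently used */
    uint64_t free_bytes;
};

static void lru_unlink(struct vram_lru *l, struct lru_bo *bo)
{
    if (bo->prev) bo->prev->next = bo->next; else l->head = bo->next;
    if (bo->next) bo->next->prev = bo->prev; else l->tail = bo->prev;
    bo->prev = bo->next = NULL;
}

static void lru_push_head(struct vram_lru *l, struct lru_bo *bo)
{
    bo->prev = NULL;
    bo->next = l->head;
    if (l->head) l->head->prev = bo; else l->tail = bo;
    l->head = bo;
}

/*
 * Called when a cs uses the bo. A bo already in VRAM is moved to the head
 * of the lru. A bo outside VRAM is moved in only if it fits; nothing is
 * evicted here. Returns true if the bo is in VRAM afterwards.
 */
static bool vram_use(struct vram_lru *l, struct lru_bo *bo)
{
    if (bo->in_vram) {
        lru_unlink(l, bo);
        lru_push_head(l, bo);
        return true;
    }
    if (bo->size > l->free_bytes)
        return false;           /* no room: leave the bo where it is */
    l->free_bytes -= bo->size;
    bo->in_vram = true;
    lru_push_head(l, bo);
    return true;
}

/* Evict the least recently used bo; returns it, or NULL if VRAM is empty. */
static struct lru_bo *vram_evict_one(struct vram_lru *l)
{
    struct lru_bo *victim = l->tail;

    if (!victim)
        return NULL;
    lru_unlink(l, victim);
    victim->in_vram = false;
    l->free_bytes += victim->size;
    return victim;
}
```

The point of splitting vram_use() from vram_evict_one() is that eviction
becomes a separate decision, driven by the lru, instead of a side effect of
every cs validation.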

Another improvement I am thinking of is to reuse the GTT memory of objects that
are moved in for objects that are evicted, as the statistics I gathered showed
that the amounts moving in and out are often close. But this would require true
dma, as it would mean scheduling in/out moves at page granularity or in groups
of pages (write 4 pages from VRAM to 4 scratch pages in system memory, write 4
pages of the system memory bo to those 4 VRAM pages, write the next 4 pages of
VRAM to the 4 pages of system memory just freed ...).
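
The interleaving can be simulated with plain memcpy to check that only one
scratch chunk of extra system memory is needed. This is a toy model, not
driver code: each "page" is one byte, both bos are assumed to have the same
size, and the function name and the evicted_map output are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4     /* "pages" moved per step; one byte per page here */

/*
 * vram:    n_chunks * CHUNK bytes, holds the bo being evicted on entry
 * sys:     n_chunks * CHUNK bytes, holds the bo being moved in on entry
 * scratch: CHUNK bytes of spare system memory
 * evicted_map[i] receives the system memory location of chunk i of the
 * evicted bo. On return vram holds the bo that was moved in.
 */
static void swap_chunked(unsigned char *vram, unsigned char *sys,
                         unsigned char *scratch, size_t n_chunks,
                         unsigned char **evicted_map)
{
    size_t i;

    if (n_chunks == 0)
        return;

    /* First evicted chunk has nowhere to go yet, so it lands in scratch. */
    memcpy(scratch, vram, CHUNK);
    evicted_map[0] = scratch;

    for (i = 0; i < n_chunks; i++) {
        unsigned char *v = vram + i * CHUNK;
        unsigned char *s = sys + i * CHUNK;

        /* Move chunk i in; its system pages are free afterwards. */
        memcpy(v, s, CHUNK);

        /* Evict the next VRAM chunk into the pages just freed. */
        if (i + 1 < n_chunks) {
            memcpy(s, v + CHUNK, CHUNK);
            evicted_map[i + 1] = s;
        }
    }
}
```

After the loop the last chunk of sys is free and can serve as scratch for the
next swap, so the evicted bo ends up backed by a page list rather than one
contiguous range.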

Cheers,
Jerome