Need advice: Mobile ATI card with 'shared' memory - EXA performance?

Sat Dec 10 04:58:27 PST 2005

On Fri, 2005-12-09 at 01:17 +0000, Alan Swanson wrote:
> On Thu, 2005-12-08 at 02:42 +0100, Francesco Biscani wrote:
> > On Thursday 08 December 2005 02:05, Dan wrote:
> 
> [SNIP]
> 
> > The big slowdowns come when you run out of video memory, and you can realize 
> > it by monitoring X memory usage with xrestop. When the total size of pixmap 
> > buffers approximates your video RAM size you notice great slowdowns, 
> > especially when switching desktops. With my actual setup (64M+1024x768 at 16bit)
> > this seldom happens (I dare to say it never happens). If you go up to 32bit 
> > color depth you can trigger the "slow" behaviour much more easily (just open 
> > up many windows). In this case I guess going up to 128MB of RAM could help, 
> > even if I have not tried yet (and I need to purchase another 512M stick of 
> > system ram to go there ;) )
> > 
> > My (ignorant and based upon only experience) conclusion is that in this 
> > specific case the bottleneck is not RAM speed.
> 
> The problem here is probably texture thrashing.
> 
> I've not looked into how EXA caches textures but if it uses DRI/Mesa when
> available, and probably also if it doesn't, then it's using Least Recently
> Used (LRU) caching only. LRU is terrible when there are more textures than
> memory as it will be continually replacing every single texture every
> single frame which is a killer.
> [16MB cache, 18MB textures; 18MB bandwidth per frame]
> 
> A fallback to Most Recently Used (MRU) caching as in John Carmack's .plan
> and on many other sites should be used so that only the last bit of
> memory is used as a scratch pad reducing memory bandwidth used.
> [16MB cache, 18MB textures; 4MB bandwidth per frame]
> 
> I'd actually been thinking about writing this for Mesa over the Christmas
> holidays as it seems a straightforward (for single head) contribution
> not being a hardware guru myself.

EXA actually uses neither LRU nor MRU.  It also doesn't use Mesa in any
way.

The LRU issue you've noted is particularly an issue for games and other
similar apps, where you're walking the whole set of textures per frame,
and throwing out the oldest probably means you've thrown out one of the
most likely things to be used next.  Sure.  But MRU would be pretty much
pessimal for what we're doing.  The recently used pixmaps are
still the most likely to be used in the future in the context of EXA --
think of the backbuffer pixmap your compmgr is using (it would really
hurt to kick that out!), or a window you just dragged your drop-shadows
over (meanwhile the window containing my panel remains un-redrawn, but
still wasting framebuffer, since it's not involved in any of my
window-wiggling).

(Hey, I guess just look at what I'm doing now.  If my framebuffer was
full, and I opened this window and started typing this message, the most
recently used things are the window containing this message, the CPU
meter, and the clock.  We'd be thrashing this message's window in and
out versus the clock and cpu meter, right?  I would expect about a .1
second lag in my typing every .5 seconds (as we dump my email out of fb)
as those other guys update.  That seems pretty dumb.)

And here's why I haven't looked at LRU yet.  In the DRI world, you're
trying to reduce the thrashing of your uploads, and downloads are free.
But while you don't want to kick out too many people who'll just have to
re-upload themselves later, at the margin each allocation just cares
about making space, and kicking everyone out is only slightly more
expensive than kicking out just what you need to make your space.  In
contrast, in EXA we usually have to save the contents of things we kick,
and downloading from framebuffer is at least an order of magnitude
slower than uploading to framebuffer.  So doing extra (wasted) downloads
in order to get your current thing up onto the screen is going to be a
slowdown you can feel.
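
As a back-of-the-envelope version of that argument, here is a small
cost model (a sketch under stated assumptions, not a measurement).
Costs are in arbitrary "upload units" per MB, the 10x download penalty
is just a stand-in for "at least an order of magnitude", the 4MB sizes
are invented, and the cost of re-uploading evicted things later is
deliberately ignored.

```python
def eviction_cost(evicted_sizes, incoming_size, must_save, download_penalty=10.0):
    """Immediate cost of making room for, and uploading, one object.

    evicted_sizes    -- sizes (MB) of the objects kicked out of video memory
    incoming_size    -- size (MB) of the object being uploaded
    must_save        -- True if evicted contents have to be downloaded first
    download_penalty -- assumed cost ratio of download to upload per MB
    """
    save_cost = download_penalty * sum(evicted_sizes) if must_save else 0.0
    return save_cost + incoming_size

resident = [4, 4, 4, 4]  # hypothetical sizes (MB) currently in video memory
incoming = 4             # hypothetical size (MB) of the new object

for label, must_save in (("DRI-like", False), ("EXA-like", True)):
    minimal = eviction_cost(resident[:1], incoming, must_save)
    everything = eviction_cost(resident, incoming, must_save)
    print(label, "evict one:", minimal, "evict all:", everything)
```

With these made-up numbers the DRI-like case costs the same up front
whether you evict one object or all four, while in the EXA-like case
evicting everything is several times more expensive than evicting only
what's needed.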

EXA instead currently uses a set of scoring mechanisms to try to
calculate the most important things to keep in memory based off of past
rendering history.  It's not good.  It's really bad in many ways.  But
I'm pretty sure MRU would be far worse, and I need to write a dumb LRU
implementation to make sure that that's not significantly better.  I've
been kicking around ideas for better scoring, but then I've been kicking
those ideas around for quite some time and not actually doing anything.

-- 
Eric Anholt                                     eta at lclark.edu
http://people.freebsd.org/~anholt/              anholt at FreeBSD.org