[Glamor] [PATCH 05/15] glamor_fbo: Introduce glamor fbo to manage all the fb/tex.

Sat Jan 21 02:54:22 PST 2012

On Sat, 21 Jan 2012 15:21:21 +0800, zhigang gong <zhigang.gong at gmail.com> wrote:
> On Sat, Jan 21, 2012 at 1:54 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > On Fri, 20 Jan 2012 15:05:18 +0000, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >> On Fri, 20 Jan 2012 22:51:54 +0800, zhigang gong <zhigang.gong at gmail.com> wrote:
> >> > On Fri, Jan 20, 2012 at 10:08 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >> > > On Fri, 20 Jan 2012 16:52:03 +0800, zhigang.gong at linux.intel.com wrote:
> >> > >> From: Zhigang Gong <zhigang.gong at linux.intel.com>
> >> > >>
> >> > >> This is the first patch to implement a fbo/tex pool mechanism which
> >> > >> is like the sna's BO cache list. We first need to decouple the
> >> > >> fbo/tex from each pixmap. The new glamor_pixmap_fbo data
> >> > >> structure is for that purpose. It is largely independent of each
> >> > >> pixmap and can be reused later by other pixmaps once it's detached
> >> > >> from the current pixmap.
> >> > >
> >> > > I'd had to greatly curtail the maximum cache time in order to prevent
> >> > I found a bug in the previous patchset which made the lookup always
> >> > search for an exact size. I have already fixed it and pushed the fix
> >> > to my private branch:
> >> > git://people.freedesktop.org/~gongzg/glamor
> >> >
> >> > Would you please give it a try on your machine?
> >
> > old: glamor-no-fbo
> > new: glamor-fbo
> > Speedups
> > ========
> >  xlib        grads-heat-map  127331.39 -> 1449.00: 87.88x speedup
> >  xlib         midori-zoomed  4566.85 -> 1534.30:  2.98x speedup
> >  xlib             evolution  3216.08 -> 2684.38:  1.20x speedup
> >  xlib                  gvim  9429.61 -> 7945.96:  1.19x speedup
> >  xlib    swfdec-giant-steps  4499.96 -> 3951.73:  1.14x speedup
> >  xlib    gnome-terminal-vim  25417.66 -> 22459.87:  1.13x speedup
> >  xlib     xfce4-terminal-a1  2659.70 -> 2350.43:  1.13x speedup
> >  xlib  firefox-planet-gnome  11177.70 -> 9903.54:  1.13x speedup
> >  xlib     firefox-talos-gfx  17092.11 -> 15309.88:  1.12x speedup
> >  xlib               poppler  10233.00 -> 9259.71:  1.11x speedup
> >  xlib      firefox-fishbowl  39253.61 -> 35797.03:  1.10x speedup
> >  xlib        poppler-reseau  2544.44 -> 2401.40:  1.06x speedup
> >
> > No slowdowns.  It looks like mesa could also benefit from some tuning of
> > its allocator. The grads-heat-map result is most surprising, before the
> > fbo-pool it is running at 75% cpu utilisation, so where it found the
> > performance from I don't know. Ah, it is too good to be true, it's not
> > rendering a1 surfaces at all.
> > -Chris
> 
> I switched to using fls rather than ffs, and the outcome surprised me. Here
> are the results on my machine:
> 
> ffs-256 means order = ffs(size/256)
> 
> firefox-planet-gnome
> fls-256: 10.3s
> ffs-256: 9.3s
> fls-32:    9.4s
> ffs-32:    9.2s
> 
> The memory usage seems similar in all four cases, and I always set the max
> expire count to 100. It's not very easy to understand. For now, I just keep
> fls-32 as my master branch. The cache bucket allocation may need more
> investigation.
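
For readers following along, the two bucket-order computations being
compared can be sketched roughly as below. This is only an illustration,
not the glamor code: order_ffs(), order_fls() and sketch_fls() are
hypothetical names, the granule argument stands for the 256 or 32 in the
labels above, and fls() is written out by hand because it is a BSD
function that is not available in every libc.

```c
#include <assert.h>
#include <strings.h>   /* ffs() is specified by POSIX */

/* Hand-written fls(): 1-based index of the most significant set bit,
 * or 0 when x is 0. (Hypothetical helper, not taken from glamor.) */
static int sketch_fls(unsigned int x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* "ffs-N": bucket order from the least significant set bit of size/N. */
static int order_ffs(int size, int granule)
{
    assert(size >= 0 && granule > 0);
    return ffs(size / granule);
}

/* "fls-N": bucket order from the most significant set bit of size/N. */
static int order_fls(int size, int granule)
{
    assert(size >= 0 && granule > 0);
    return sketch_fls((unsigned int)(size / granule));
}
```

With fls the order grows with the magnitude of the size, whereas with
ffs it depends only on the lowest set bit, so sizes of very different
magnitude can land in the same bucket.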

I'd track a couple of things: the cache hit rate and the median number of
items in non-empty buckets (along with the count of non-empty buckets).
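
A minimal sketch of such bookkeeping, assuming a fixed number of buckets
and counters updated by the cache on every lookup, insert and removal.
The struct, the function names and SKETCH_NUM_BUCKETS are all made up
for illustration and do not correspond to anything in glamor:

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NUM_BUCKETS 16   /* hypothetical bucket count */

struct cache_stats {
    unsigned long hits;             /* lookups served from the cache */
    unsigned long misses;           /* lookups needing a new texture */
    int items[SKETCH_NUM_BUCKETS];  /* current item count per bucket */
};

/* Fraction of lookups served from the cache, 0.0 if there were none. */
static double cache_hit_rate(const struct cache_stats *s)
{
    unsigned long total = s->hits + s->misses;

    return total ? (double)s->hits / (double)total : 0.0;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Returns the number of non-empty buckets and stores the median item
 * count over those buckets in *median (0.0 if all are empty). */
static int cache_bucket_summary(const struct cache_stats *s, double *median)
{
    int nonempty[SKETCH_NUM_BUCKETS];
    int n = 0, i;

    assert(median != NULL);
    for (i = 0; i < SKETCH_NUM_BUCKETS; i++)
        if (s->items[i] > 0)
            nonempty[n++] = s->items[i];

    if (n == 0) {
        *median = 0.0;
        return 0;
    }
    qsort(nonempty, (size_t)n, sizeof(nonempty[0]), cmp_int);
    *median = (n % 2) ? (double)nonempty[n / 2]
                      : (nonempty[n / 2 - 1] + nonempty[n / 2]) / 2.0;
    return n;
}
```

Dumping these three numbers at the end of each trace should make it
easier to compare the bucket schemes directly.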

With the flexibility to use any larger texture, the cache hit rate is
unlikely to suffer too much with the ffs allotment scheme -- every
bucket is likely to always contain a large texture. So I'd guess what
you are observing are in fact second-order effects that should also be
considered when reusing textures, such as what the allocation is for: if
it is for an upload, then the cache should return an idle rather than a
busy texture. At your level of abstraction, that simply means a choice from
either the front or the back of the texture queue, so long as you can
defer the allocation of the texture until you know the likely usage.
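
As a rough sketch of that front-or-back choice: assume released textures
are appended at the tail of a per-bucket queue, so the head holds the
entry that has been unused longest (most likely idle) and the tail the
most recently used one (possibly still busy). Which end is which is my
assumption here, and tex_queue, tex_usage and the fixed capacity are
hypothetical names, not the glamor data structures:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_CAP 8      /* hypothetical per-bucket capacity */

enum tex_usage {
    TEX_USAGE_UPLOAD,   /* CPU writes next: prefer an idle texture */
    TEX_USAGE_RENDER    /* GPU writes next: a busy texture is fine */
};

struct tex_queue {
    unsigned int tex[SKETCH_QUEUE_CAP]; /* ring buffer, oldest at head */
    size_t head;                        /* index of the oldest entry */
    size_t count;                       /* number of queued textures */
};

/* Put a released texture at the tail. Returns 0, or -1 if full. */
static int tex_queue_release(struct tex_queue *q, unsigned int tex)
{
    assert(q != NULL);
    if (q->count == SKETCH_QUEUE_CAP)
        return -1;
    q->tex[(q->head + q->count) % SKETCH_QUEUE_CAP] = tex;
    q->count++;
    return 0;
}

/* Take a texture for the given usage; 0 is used here to mean "none". */
static unsigned int tex_queue_acquire(struct tex_queue *q,
                                      enum tex_usage usage)
{
    unsigned int tex;

    assert(q != NULL);
    if (q->count == 0)
        return 0;

    if (usage == TEX_USAGE_UPLOAD) {
        /* Oldest entry: least likely to still be in use by the GPU. */
        tex = q->tex[q->head];
        q->head = (q->head + 1) % SKETCH_QUEUE_CAP;
    } else {
        /* Newest entry: keeps the older, idle ones for uploads. */
        tex = q->tex[(q->head + q->count - 1) % SKETCH_QUEUE_CAP];
    }
    q->count--;
    return tex;
}
```

The caller only needs to know the intended usage at the time it asks
for the texture, which is why deferring the allocation matters.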

So I think what you actually need to do is expand the scope of your
testing until you find a marked difference in performance between
fls/ffs so that the investigation is made easier...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre