[Intel-gfx] i915_gem_evict_something in sysprof trace using VBOs

Fri Nov 5 12:44:14 CET 2010

On Fri, 2010-11-05 at 10:35 +0000, Chris Wilson wrote:
> On Fri, 05 Nov 2010 10:21:07 +0000, Peter Clifton <pcjc2 at cam.ac.uk> wrote:
> > I was playing with my VBO code, and noticed this sysprof trace
> > (non-interesting stuff pruned):
> > 
> > drm_ioctl                                         0.13%  56.08%
> >   i915_gem_execbuffer2                            0.00%  32.50%
> >     i915_gem_do_execbuffer                        0.08%  32.50%
> >       i915_gem_object_pin                         0.00%  17.47%
> >         i915_gem_object_bind_to_gtt               0.03%  17.44%
> >           i915_gem_evict_something                0.00%  15.54%
> >             i915_gem_object_unbind                0.00%  15.31%
> >               i915_gem_object_set_to_cpu_domain   0.00%  13.33%
> >                 i915_gem_clflush_object           0.00%  13.33%
> >       i915_gem_clflush_object                     0.00%  14.29%
> >   i915_gem_mmap_gtt_ioctl                         0.00%  10.74%
> >   i915_gem_set_domain_ioctl                       0.00%   4.98%
> > 
> > 
> > The i915_gem_evict_something has me curious. Presumably I have too many
> > pages of data actively being used by the GPU (or mapped).
> 
> Yes, you are suffering from aperture thrashing. There are a few ways to
> workaround this (1) decrease the size of your working set (reduce texture
> sizes, reuse as many buffers within the aperture as possible)

It's all VBOs (one 1.7 MB VBO, actually), but it appears the driver / card
hangs on to it for a while, so every time I call glBufferData(...,
NULL, ...) and glMapBuffer, memory usage grows. I was expecting
the card to use, say, a handful of copies, but no more.

I guess the CPU got ahead of the GPU in terms of rendering, and it used
up all the aperture space in doing so. In truth, my buffers are RARELY
full, but due to some other (bad) code, needed to be large enough to fit
a particularly complex object in some rare cases.

Having thought about it all now (and read some of the implementation
details in mesa / kernel), I think glBufferSubData should work MUCH
better for my needs.

I'd bet it's "something I've done wrong", as usually seems to be the
way, but for now: if I just use glBufferSubData to upload only the changed
data, I get rendering corruption. It works fine with
LIBGL_ALWAYS_SOFTWARE=1 though, so there is perhaps a small possibility
of a driver bug?

Conversely, if I call glBufferData(..., NULL, ...) before the
glBufferSubData, I get back to bad performance (expected), but the rendering
corruption is gone.

Thinking of stupid things I might have done wrong... yes, I did call
glBufferData(..., NULL, ...) once, in the case where I re-upload
subsequent times with glBufferSubData(). It turns out that when you
accidentally miss this out, "bad" things happen ;).
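Uploading "changed data only" means working out which bytes changed. Below is a minimal sketch, assuming the application keeps a CPU-side copy of what it last uploaded; `dirty_range` is a hypothetical helper, not part of Mesa or OpenGL. It also captures one constraint worth keeping in mind: after glBufferData(..., NULL, ...) the buffer's contents are undefined, so a partial glBufferSubData upload is only valid while the GPU-side buffer still holds the previous data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: find the smallest contiguous byte range in which
 * `next` differs from `prev` (both `size` bytes long). Returns false if the
 * two are identical. The caller could then upload just that range, e.g.
 *   glBufferSubData(GL_ARRAY_BUFFER, offset, length,
 *                   (const char *)next + offset);
 * This assumes the buffer still contains `prev`, i.e. it has NOT been
 * re-specified with glBufferData(..., NULL, ...) since that upload. */
bool dirty_range(const void *prev, const void *next, size_t size,
                 size_t *offset, size_t *length)
{
    const unsigned char *a = prev;
    const unsigned char *b = next;
    size_t first = 0;
    size_t last = size;

    assert(offset != NULL && length != NULL);

    /* Scan forward to the first differing byte. */
    while (first < size && a[first] == b[first])
        first++;
    if (first == size)
        return false; /* nothing changed, nothing to upload */

    /* Scan backward to one past the last differing byte. */
    while (last > first && a[last - 1] == b[last - 1])
        last--;

    *offset = first;
    *length = last - first;
    return true;
}
```

A single bounding range is the simplest choice; if changes are scattered across a large buffer, several smaller uploads (or re-uploading everything) may be cheaper.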

All this said, I've discovered the docs for glMapBufferRange. With a bit of
extra work in my code (to ensure I use as much of a buffer as possible
before scrapping the whole thing), I think this could be my friend for
getting decent performance out of VBOs without having to call
glBufferSubData for each new set of data.
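Using as much of a buffer as possible before scrapping it is a common streaming pattern: append each batch at a growing offset, and orphan the buffer only when the next batch no longer fits. The sketch keeps just that bookkeeping and makes no OpenGL calls itself; `vbo_stream` and `vbo_stream_reserve` are hypothetical names, and the access flags suggested in the comments (from GL 3.0 / ARB_map_buffer_range) are an assumption about how a caller might map each slot with glMapBufferRange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for streaming vertex data into one VBO. */
typedef struct {
    size_t capacity; /* size given to glBufferData(), in bytes */
    size_t used;     /* bytes handed out since the last orphan */
} vbo_stream;

typedef struct {
    size_t offset; /* byte offset to pass to glMapBufferRange() */
    size_t length; /* byte length to pass to glMapBufferRange() */
    bool orphan;   /* true: buffer was full; map with GL_MAP_WRITE_BIT |
                      GL_MAP_INVALIDATE_BUFFER_BIT to get fresh storage.
                      false: map with GL_MAP_WRITE_BIT |
                      GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT,
                      assuming no queued draw reads this not-yet-used range */
    bool ok;       /* false if the request can never fit in the buffer */
} vbo_slot;

vbo_slot vbo_stream_reserve(vbo_stream *s, size_t length)
{
    vbo_slot slot = {0, length, false, false};

    assert(s != NULL && s->used <= s->capacity);
    if (length == 0 || length > s->capacity)
        return slot; /* ok stays false */

    /* Written as a subtraction so the check cannot overflow. */
    if (length > s->capacity - s->used) {
        /* Not enough room left: scrap the whole buffer, restart at 0. */
        s->used = 0;
        slot.orphan = true;
    }

    slot.offset = s->used;
    s->used += length;
    slot.ok = true;
    return slot;
}
```

For example, with a capacity of 1000 bytes, requests of 600 and 300 land in the same buffer at offsets 0 and 600; a following request of 200 does not fit (900 + 200 > 1000), so it orphans the buffer and is placed at offset 0.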

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)