[Intel-gfx] [ANNOUNCE] xf86-video-intel 2.8.0

Eric Anholt eric at anholt.net
Mon Aug 3 18:13:22 CEST 2009


On Fri, 2009-07-31 at 16:03 +0100, Barry Scott wrote:
> Carl Worth wrote:
> > On Fri, 2009-07-31 at 10:10 +0300, Timo Jyrinki wrote:
> >   
> >> Hi. Thanks for the release. It now seems to be quite stable after all
> >> the KMS/GEM/DRI2/UXA hassle, I'm happy to say (and so are / will be
> >> the other users).
> >>     
> >
> > You're quite welcome. I'm happy to hear that the driver is performing
> > well for you.
> >
> >   
> >> However, Intel is getting its share of bad publicity because of all
> >> the stability problems (now addressed) and performance problems (only
> >> getting worse). Could there be at least some blog post analysis about
> >> what's going to be done about performance?
> >>     
> >
> > I did recently make a blog post about performance measurement at least:
> >
> > 	http://cworth.org/intel/performance_measurement/
> >
> > The point I make there is that microbenchmarks like gtkperf don't really
> > tell us what happens with real applications.
> >
> >   
> >> Newest Phoronix numbers just in:
> >> http://www.phoronix.com/scan.php?page=article&item=intel_q309_flakes&num=2
> >> - Ubuntu 9.10 (2.6.31, 2.8.0, 7.5) is much slower, even double in some
> >> tests, than Ubuntu 9.04 (2.6.28, 2.6.x (EXA), 7.4). And Ubuntu 9.04,
> >> where GEM was introduced, was already 2x slower than Ubuntu 8.10/8.04
> >> (intel 2.2-2.4 / EXA / no GEM) in many important areas.
> >>     
> >
> > What I see there is lots of gtkperf microbenchmarks, which, as I argue
> > in the blog post, don't capture realistic application behavior.
> > So there may or may not be any real performance problem based on those
> > numbers. It's really hard to know.
> >
> > I'm not really familiar with Qgears or JXRenderMark, but they also sound
> > like microbenchmarks (with names like "Transformed Blit Bilinear").
> >
> > I do appreciate that people are performing benchmark measurements on our
> > various releases, but I'd much rather see a result like:
> >
> > 	"Firefox is 20% slower with 2.8.0 compared to 2.6.3" or so.
> >
> > That's something that would indicate an actually significant performance
> > problem, and it would be a much more compelling reason to work on a fix.
> > And the cairo-perf-trace tool I describe in the blog post above provides
> > results exactly like that, (for firefox, gnome-terminal, evince, etc.
> > and anyone can generate traces for any other cairo-based applications).
> >
> > And of course, I'd even much rather see reports from cairo-perf-trace
> > showing performance improvements rather than slowdowns. :-)
> >
> > Either way, I will look forward to measurements based on real-world
> > loads. And I can say that with the huge architectural reworks behind us,
> > and some serious stabilization done, performance issues are very near
> > the top of the list for several of us working on Intel graphics drivers.
> >
> >   
> What we see is that glXSwapBuffers cannot run at 60 frames a second in the
> real world. It drops between 30% and 50% of frames, making smooth animation
> impossible in a compositing environment.
> 
> From what we can see, this is down to the GPU being throttled and not to a
> lack of CPU or GPU resources.
> 
> We think the cause is this: the GPU is programmed to wait until the scanout
> is outside of the viewable area before glXSwapBuffers can complete.
> Because the GPU is doing this, any drawing to other windows is blocked until
> the swap has completed. Now, if what has been blocked is the creation of the
> next frame, it leaves a very small amount of time before the next frame
> needs to be drawn.
> 
> You see the same problem with the Xv "tear free movie" feature. The GPU
> spends most of the time waiting for the scanline to get outside of the
> viewable area.
> 
> We have not yet finished reading the GPU docs, so we are not sure what the
> hardware can do to avoid this single-threaded use of the GPU. However, it
> seems that the hardware can be programmed to run batch buffers in parallel,
> and something like that would seem to be required to fix the class of
> performance problem that we see.

It sounds like your problem is a little more complicated than what
you're describing.  If you've got just one app, it can spend just under
a frame's worth of time preparing a new frame, and then spend the
remainder of that frame waiting for the scanout to reach vblank to
do the swap.  It's working great for full-screen games.
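
To make the single-app case concrete, here is a toy timing model in
Python. It is a sketch under simplifying assumptions, not a description
of how the driver actually schedules work: the swap is assumed to block
until the next vblank, and rendering of the following frame is assumed
to start immediately afterwards. The function name vsynced_fps is
hypothetical.

```python
import math


def vsynced_fps(render_ms: float, refresh_hz: float = 60.0) -> float:
    """Effective frame rate of one client whose swap waits for vblank.

    Toy model (an assumption): the client renders for render_ms, its swap
    then waits for the next vblank, and the next frame starts right after.
    """
    period_ms = 1000.0 / refresh_hz
    # Whole refresh periods each frame occupies; never fewer than one.
    periods = max(1, math.ceil(render_ms / period_ms))
    return refresh_hz / periods


for render_ms in (5.0, 15.0, 17.0, 40.0):
    print(f"{render_ms:5.1f} ms of rendering -> {vsynced_fps(render_ms):.1f} fps")
```

Under these assumptions, 15 ms of rendering still gives 60 fps, while
17 ms (just over one 16.7 ms period) halves it to 30 fps.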

But your environment has two applications: a compositing manager that
just blits and waits for vblank (almost no time preparing its frame),
and a movie player that actually needs to get some work done to prepare
its frame.  So, depending on who wakes up when, your movie player
doesn't get to draw its frame in time because the compositor woke up,
drew something cheap, and sent the GPU to sleep waiting to blit it.  I'd
expect stuttering in this environment currently.
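
The "who wakes up when" race can be sketched as a toy model of one
refresh period with a single in-order GPU queue. Everything here is an
illustrative assumption: times are offsets from the previous vblank,
the compositor's blit stalls the queue until the next vblank, and the
player's frame counts as on time only if its rendering runs and
finishes before that stall. The function and constant names are
hypothetical.

```python
def player_frame_on_time(player_submit_ms: float, player_work_ms: float,
                         compositor_submit_ms: float, period_ms: float) -> bool:
    """Toy model: does the player's frame make this refresh period?"""
    if compositor_submit_ms <= player_submit_ms:
        # The compositor got in first, so the queue is stalled waiting for
        # vblank and the player's work cannot start until the next period.
        return False
    # The player got in first; its work must still finish before vblank.
    return player_submit_ms + player_work_ms <= period_ms


PERIOD_MS = 1000.0 / 60.0
# Sweep the compositor's wake-up time across the period while the player
# always submits 8 ms of work at 4 ms after vblank (plenty of headroom).
offsets = [i * PERIOD_MS / 100 for i in range(100)]
missed = sum(not player_frame_on_time(4.0, 8.0, c, PERIOD_MS) for c in offsets)
print(f"player frames late: {missed / len(offsets):.0%}")
```

Even though the player only needs about half of each period, in this
model it misses the frame whenever the compositor happens to wake up
first, which is the kind of intermittent stutter described above.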

We would certainly love to keep the GPU running flat out, but right now
we have no tearing and we're going to stick with that until we get
something better in.  jbarnes and krh at least have been working on
adding DRI2 pageflipping support, so if your compositor uses SwapBuffers
as opposed to CopySubBuffer (the previous recommendation for
compositors, sorry for changing our minds), the GPU will be able to keep
running because the page flip happens independently of the ringbuffer
operation.  I'm guessing this will be in the 2.6.32 timeframe.
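
A rough way to see why page flipping helps is to count how long the
command ring is unavailable to other clients in each refresh period.
This is a toy model only: it ignores the cost of the blit itself and
says nothing about how the kernel actually implements the flip; it
just encodes the claim that the flip happens independently of the
ringbuffer. The SwapMethod enum and ring_stall_ms function are
hypothetical names.

```python
from enum import Enum


class SwapMethod(Enum):
    # Hypothetical labels for the two presentation paths discussed above.
    BLIT_WAIT = "blit that waits for vblank in the command ring"
    PAGE_FLIP = "flip handled independently of the command ring"


def ring_stall_ms(method: SwapMethod, compositor_submit_ms: float,
                  period_ms: float) -> float:
    """Time per refresh period during which the ring can run nothing else.

    Toy model: with BLIT_WAIT the ring stalls from the compositor's
    submission (an offset from the last vblank) until the next vblank;
    with PAGE_FLIP the ring keeps executing other clients' batches.
    """
    if not 0.0 <= compositor_submit_ms <= period_ms:
        raise ValueError("compositor_submit_ms must lie within one period")
    if method is SwapMethod.BLIT_WAIT:
        return period_ms - compositor_submit_ms
    return 0.0


PERIOD_MS = 1000.0 / 60.0
for method in SwapMethod:
    stall = ring_stall_ms(method, 2.0, PERIOD_MS)
    print(f"{method.name}: ring stalled {stall:.2f} ms of {PERIOD_MS:.2f} ms")
```

With a compositor that wakes up 2 ms after vblank, the blit path in this
model leaves the ring idle for most of every period, while the flip path
leaves all of it available to the movie player.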

-- 
Eric Anholt
eric at anholt.net                         eric.anholt at intel.com

