Performance change from X in Fedora Core 4 to Fedora Core 5

Felix Bellaby felix at bellaby.plus.com
Fri Jul 14 04:01:40 PDT 2006


On Fri, 2006-07-14 at 08:28 +0900, Carsten Haitzler wrote:
> ok - if your content is TEXT it will be hard to read - but lets say its a slow
> website - and it is loading. you can watch it load in a corner of your screen
> while you go do something more useful. invariably an app will give some hint as
> to its state without needing to see all the details. even text filled xterms -
> if you have a compile going and its scrolling along. once it STOPS scrolling
> you know the compile is done (if the compile is on a remote box over ssh you
> wont be using your local cpu load meter to tell either). anyway - my point is -
> people ask for and have uses for such technology and xcomposite makes it all
> (finally) possible.

I do not think that compositing is ever going to meet user
expectations if they are set so high. Supporting all existing
applications does mean giving them access to the framebuffer
environments that they were written to use. This is expensive. Giving
them access to those framebuffers even when the applications are
minimised is even more expensive. Minifying those framebuffers as
textures is very expensive, especially without mipmaps. Put the whole
lot together and you are asking much more of the GPU than it can
possibly deliver.
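
Just to make "giving them access to those framebuffers" concrete, this
is roughly the redirection step a compositor has to perform, sketched
against libXcomposite (error handling omitted, and the helper names
are mine):

  /* Ask the server to keep a full off-screen framebuffer for every
   * top-level window - the expensive part discussed above. */
  #include <X11/Xlib.h>
  #include <X11/extensions/Xcomposite.h>

  void redirect_all_toplevels(Display *dpy)
  {
      Window root = DefaultRootWindow(dpy);

      /* Every child of the root now renders into its own off-screen
       * pixmap instead of directly onto the screen. */
      XCompositeRedirectSubwindows(dpy, root, CompositeRedirectManual);
  }

  /* Later, for each window the compositor wants to draw (or minify),
   * it names the backing pixmap and treats it as an image/texture. */
  Pixmap backing_for(Display *dpy, Window w)
  {
      return XCompositeNameWindowPixmap(dpy, w);
  }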

If users want to see minified versions of their apps at all times then
those apps are going to have to be written from the bottom up to make
full use of the capabilities of the GPU. There is no other choice.

> you will want to accelerate drawing as the apps wont stop
> rendering - they will draw like they do when fully visible. if you use software
> fallbacks or pixmap thrash with them - then your xserver will just consume most
> of your cpu with these other windows.... not a good thing. my point is that you
> still want to reduce unneeded pixmap usage where and when you can.

> > Compositing using pixel perfect framebuffers for each application just
> > to shrink them to nothing would indeed be extremely expensive, but
> > speeding up the drawing into those framebuffers would be a rather feeble
> > gesture towards efficiency. Pixel operations can not be performed
> > remotely efficiently at entirely the wrong scale. You would be much
> > better off switching to a scalable drawing API like cairo for _this_
> > kind of work. 
> 
> not going to happen - you plan on making every app rescale its own output. also
> just because cairo can draw vectors - doesn't mean it trivially makes such a
> thing work out of the box everywhere just by its use. problem is you  can't
> just switch all the apps - they come as they come with their various toolkits
> or even DIY drawing. a scalable drawing kit doesnt suddenly make blit
> operations work properly when scaled down or for that matter most operations.
> it isn't so simple. the only sane solution is to let the app draw to its full
> sized window as a pixmap ans post-scale to the icon version.

I had no intention of implying that a software based drawing kit would
enable users to see minified versions of _existing_ applications. That
is impossible by any means. Not going to happen.

However, it _might_ be possible to minify a new generation of
applications that were designed with minification in mind. These new
apps would need to perform their drawing operations through an API
that accommodated scaling and used the full capabilities of the GPU to
achieve it. OpenGL does scale the app programmer's drawing in this
way. It uses radically different drawing operations from 2D graphics
to achieve it, specifying floating point vertices and mipmapped
textures rather than integer coordinates and 2D images. I mentioned
cairo as its glitz backend is the closest bridge from 2D to 3D that is
currently available.
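
To illustrate what I mean by an API that accommodates scaling, here is
a rough cairo sketch; the surfaces, sizes and scale factors are purely
illustrative:

  #include <cairo.h>

  static void draw_scene(cairo_t *cr)
  {
      /* All coordinates are in abstract user units, not pixels. */
      cairo_set_source_rgb(cr, 0.2, 0.4, 0.8);
      cairo_rectangle(cr, 10, 10, 80, 80);
      cairo_fill(cr);
  }

  static void render_at(cairo_surface_t *target, double scale)
  {
      cairo_t *cr = cairo_create(target);
      cairo_scale(cr, scale, scale);   /* one transform, same drawing code */
      draw_scene(cr);
      cairo_destroy(cr);
  }

  /* render_at(window_surface, 1.0) for the full window,
   * render_at(icon_surface, 0.125) for a minified preview. */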

Are we going to see a mass migration of 2D apps over to 3D? You clearly
believe that it is not going to happen and you might well be right.
However, it is the only way that user expectations can possibly be met
if they want the kind of minification that you are envisioning.

>> ...
> > Your approach reduces the total memory that needs to be allocated at any
> > specific time by allowing the toolkit to choose when to buffer and which
> > areas to buffer. However, nvidia seem to be confessing that this memory
> > will currently be alloced and dealloced in RAM and the drawing
> > operations will make little use of the GPU. 
> 
> that is a matter of a bad driver implementation - but i seriously do not
> believe this. i can allocate a pixmap xcopyarea from it to other pixmaps at
> blinding speeds - it's in video ram and it's using the gfx chipset to do the
> blits. it's not done by cpu.

XCopyArea certainly outperforms memcpy by a huge margin, and the
effect on XRender speeds of turning off the acceleration is really
striking. I suspect that nvidia usually manage to get the pixmaps into
VRAM right from the start and use non-pipelined 2D hardware to do the
blits.
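
For reference, the sort of blit I was timing is nothing more than this
(sizes and GC handling are illustrative):

  #include <X11/Xlib.h>

  /* Both pixmaps live server-side (normally in VRAM), so the copy is
   * done by the graphics chip, not by the client's CPU. */
  void blit_pixmap(Display *dpy, Pixmap src, Pixmap dst,
                   unsigned int width, unsigned int height)
  {
      GC gc = XCreateGC(dpy, dst, 0, NULL);
      XCopyArea(dpy, src, dst, gc, 0, 0, width, height, 0, 0);
      XFreeGC(dpy, gc);
  }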

>  the problem is the opengl <-> 2d world mix. in the
> past opengl and 3d have been done so entirely separately and differently that
> it si when you cross the boundaries of these 2 worlds and want to mix and match
> them - you run into the slowest operations and biggest inefficiencies. that is
> a matter of fixing drivers to no longer see both as distinctly different worlds.

The problem is indeed the mix of 3D and 2D. They are different worlds
and getting from one to the other is very difficult. However, this
difference _can not_ be fixed in the drivers and could not even be fixed
in the hardware. OpenGL provides a means of getting from 3D to 2D, but
it has to start from a 3D based drawing API and use 3D specific hardware
to achieve the move to 2D framebuffers at acceptable speeds. Starting
from the 2D drawing API with which we are familiar provides a very
efficient direct route to a 2D framebuffer. However, no one has devised
an efficient route from this 2D drawing API to 3D.
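
The only general route in that direction is to push the client's 2D
pixels back into a texture whenever the window changes, along these
lines (the function name and the pixel buffer are just an
illustration), and that upload is exactly the expensive step:

  #include <GL/gl.h>

  void update_window_texture(GLuint tex, int width, int height,
                             const void *pixels)
  {
      glBindTexture(GL_TEXTURE_2D, tex);
      /* Replaces the whole level-0 image; any mipmap levels are now
       * stale and would have to be rebuilt as well. */
      glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                      GL_BGRA, GL_UNSIGNED_BYTE, pixels);
  }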

I entirely agree that the current 2D drawing API is only compatible with
direct use of a 2D framebuffer, but transforming a continuously changing
2D framebuffer into a scalable 3D texture is a horrendous process. You
can either recalculate the mipmaps continuously or abandon them
completely. Either way scaling will take ages. Even without any scaling
it takes 5 times longer on my hardware to map a static unmipmapped
texture onto the screen than it takes to perform the functionally
identical bit blit. Computer games take ages to set up their textures
before they begin drawing and the GPUs still end up spending most of
their time handling the textures even after they have been mipmapped.
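
Concretely, the two options amount to something like this, assuming a
texture that mirrors a continuously changing window (GLU is used here
purely for brevity):

  #include <GL/gl.h>
  #include <GL/glu.h>

  void set_mipmapped(int width, int height, const void *pixels)
  {
      /* Rebuilds every mipmap level from the new level-0 image -
       * correct minification, but slow to repeat on every update. */
      gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGBA, width, height,
                        GL_BGRA, GL_UNSIGNED_BYTE, pixels);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                      GL_LINEAR_MIPMAP_LINEAR);
  }

  void set_unmipmapped(void)
  {
      /* Abandon mipmaps: only the full-size image is ever sampled,
       * so minification is cheap to set up but looks poor and reads
       * scattered texels. */
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  }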

OpenGL divides up the drawing process into vertex transformations,
rasterisation and per-fragment operations. The vertex transforms allow
scaling and lighting to be done using efficient vector graphics logic.
The rasterisation works out where every visible pixel on each polygon
will come from and computes its value by sampling the relevant static,
mipmapped texture, ignoring undrawn pixels completely. Finally, the
per-fragment operations do the blending, depth testing, etc. on the
final pixels. This is obviously a much more efficient means of scaling
than starting by drawing a big 2D image, and you have to start from an
API like OpenGL to get there.
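
In code, the minified-window case looks roughly like this
(fixed-function GL, and the texture is assumed to have been mipmapped
already): the scale lives entirely in the floating point vertex
coordinates, and the rasteriser samples the texture only for the
pixels it actually produces.

  #include <GL/gl.h>

  void draw_minified(GLuint tex, float x, float y, float w, float h)
  {
      glEnable(GL_TEXTURE_2D);
      glBindTexture(GL_TEXTURE_2D, tex);

      glBegin(GL_QUADS);                 /* four float vertices, any size */
      glTexCoord2f(0.0f, 0.0f); glVertex2f(x,     y);
      glTexCoord2f(1.0f, 0.0f); glVertex2f(x + w, y);
      glTexCoord2f(1.0f, 1.0f); glVertex2f(x + w, y + h);
      glTexCoord2f(0.0f, 1.0f); glVertex2f(x,     y + h);
      glEnd();
  }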

Felix
