[cairo] cairo+pixman profiling

Jeff Muizelaar jeff at infidigm.net
Tue Nov 6 05:54:51 PST 2007


On Tue, Nov 06, 2007 at 05:44:00PM +0900, Nguyen Vu Hung wrote:
> 2007/11/6, Jeff Muizelaar <jeff at infidigm.net>:
> > On Tue, Nov 06, 2007 at 01:03:43PM +0900, Nguyen Vu Hung wrote:
> > > 2007/11/6, Jeff Muizelaar <jeff at infidigm.net>:
> > > > These are called when using the software rasterizer. The ideal solution
> > > > to this problem is replacing the rasterizer. However, that is likely
> > > > quite a bit of work. There has been discussion about possible
> > > > implementations but no real conclusions were ever made.
> > > >
> > > You are talking about the problems of the "software rasterizer". Do you
> > > know of any solutions for a "hardware rasterizer"? Does a GPU make
> > > rasterization faster?
> >
> > cairo's software rasterizer is a bad software rasterizer. It is possible
> > to do much better in software. This thread has a bunch of discussion
> > about it:
> > http://lists.cairographics.org/archives/cairo/2007-July/011092.html
> Worth's post is worth reading:
> http://lists.cairographics.org/archives/cairo/2007-July/011116.html
> Summary: The best-known algorithm has a complexity of O(N log N), and the
> algorithm that cairo is using has a complexity of O(N log(N+k)). He
> agreed that there are "many things that could be improved here".
> 
> If what Worth said is right, i.e. O(N log N) is the best-known algorithm
> and cairo's complexity is O(N log(N+k)), then your "make perf" run has
> possibly shown flawed test data. I mean, it holds only in your
> case, not in every case.

This complexity refers to tessellation, which isn't really that meaningful
when talking about most software rasterizers because they do not
have a tessellation phase.

> 
> Do you have a patch or code that prove what you were saying?

Nope.
 
> 
> > > > > 2. Where is unpremultiply_data called? What does this function do?
> > > >
> > > > Probably in the png writing code. It does c*255/alpha on each component
> > > > of a pixel.
> > > Thank you. I don't think this bottleneck can be overcome because I need
> > > to write PNG files (with an alpha channel).
> >
> > It is possible to have a faster software implementation of
> > unpremultiply. You can probably at least double the speed. The basic
> > trick involves multiplying by the inverse instead of dividing. You can
> > either compute the inverse at runtime or use a lookup table. I have
> > some code for this, but I haven't touched it for a while so I'd need to
> > look it over if I was to post it to the list.
> 
> Do you have a patch or reference code?

Here's something. I don't know if it works or even if it is faster.

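#include <stdint.h>

/* Helpers that unp_5 relies on. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* floor(log2(x)) for x >= 1 */
static int ilog2(uint32_t x)
{
        int l = 0;
        while (x >>= 1)
                l++;
        return l;
}

void unp_5(uint32_t *input, int n)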
{
        for (; n >= 1; n -= 1) {
                uint32_t in = *input;
                uint32_t a = in >> 24;
                if (a == 0) {
                        *input = 0;
                } else {
                        uint32_t r = in >> 16 & 0xff;
                        uint32_t g = in >> 8  & 0xff;
                        uint32_t b = in >> 0  & 0xff;
                        if (r > a)
                                r = a;
                        if (g > a)
                                g = a;
                        if (b > a)
                                b = a;
                        //int l = ceil(log2(a));
                        int l = ilog2(a) + 1;
                        unsigned int mprime = (((1<<16)*((1<<l)-a))/a) + 1;
                        /* these are basically special cases for l=0 (d=1)
                         * we can avoid the tests if we just test for d=1
                         * and avoid the division */
                        int sh1 = MIN(l, 1);
                        int sh2 = MAX(l - 1, 0);

                        unsigned int t1;
                        t1 = (r*255*mprime) >> 16;
                        r = (t1 + ((r*255 - t1)>>sh1)) >> sh2;

                        t1 = (g*255*mprime) >> 16;
                        g = (t1 + ((g*255 - t1)>>sh1)) >> sh2;

                        t1 = (b*255*mprime) >> 16;
                        b = (t1 + ((b*255 - t1)>>sh1)) >> sh2;
                        *input = (a<<24) | (r<<16) | (g<<8) | b;
                }
                input++;
        }
}
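
The function above still divides once per pixel to compute mprime. A
lookup-table variant of the same idea could look roughly like the sketch
below. It is only an illustration, not code from cairo: the names
recip_table, init_recip_table, unpremultiply_channel and unpremultiply_lut
are made up for this example, and it assumes 32-bit ARGB pixels with alpha
in the top byte. Each table entry holds ceil(2^24 / alpha), so the
per-channel work is a multiply and a shift, and for 8-bit inputs the result
equals (c * 255 + alpha / 2) / alpha. Whether it is actually faster would
need to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not cairo code. */
static uint32_t recip_table[256];

/* Fill recip_table[a] with ceil(2^24 / a) for a = 1..255. */
static void init_recip_table(void)
{
        recip_table[0] = 0; /* alpha 0 is handled separately below */
        for (uint32_t a = 1; a < 256; a++)
                recip_table[a] = ((1u << 24) + a - 1) / a;
}

/* Same value as (c * 255 + a / 2) / a for c <= 255 and 1 <= a <= 255. */
static uint32_t unpremultiply_channel(uint32_t c, uint32_t a)
{
        uint32_t n = c * 255 + a / 2;
        /* 64-bit product: n < 2^16 and the table entry can be up to 2^24 */
        return (uint32_t)(((uint64_t)n * recip_table[a]) >> 24);
}

/* Unpremultiply n ARGB pixels in place; the channel order is unchanged. */
void unpremultiply_lut(uint32_t *pixels, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                uint32_t in = pixels[i];
                uint32_t a = in >> 24;
                if (a == 0) {
                        pixels[i] = 0;
                        continue;
                }
                uint32_t r = (in >> 16) & 0xff;
                uint32_t g = (in >> 8) & 0xff;
                uint32_t b = in & 0xff;
                /* clamp invalid premultiplied input so results stay <= 255 */
                if (r > a) r = a;
                if (g > a) g = a;
                if (b > a) b = a;
                r = unpremultiply_channel(r, a);
                g = unpremultiply_channel(g, a);
                b = unpremultiply_channel(b, a);
                pixels[i] = (a << 24) | (r << 16) | (g << 8) | b;
        }
}
```

init_recip_table() has to be called once before the first call to
unpremultiply_lut().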


> 
> The "unpremultiply_data" function converts ARGB -> RGBA. It just
> moves the alpha channel to the right (after pixman's
> format).

Swapping the position of the alpha channel is trivial. It is the
division that is time consuming.
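
To illustrate how little the reordering costs on its own, here is a minimal
sketch that stores one ARGB pixel as R, G, B, A bytes without any
unpremultiplication. The helper name argb_to_rgba_bytes is hypothetical and
the pixel is assumed to be a 32-bit value with alpha in the top byte.

```c
#include <stdint.h>

/* Hypothetical helper: write one ARGB pixel as R, G, B, A bytes.
 * Only shifts and byte stores are involved; there is no division. */
static inline void argb_to_rgba_bytes(uint32_t argb, uint8_t out[4])
{
        out[0] = (uint8_t)(argb >> 16); /* R */
        out[1] = (uint8_t)(argb >> 8);  /* G */
        out[2] = (uint8_t)argb;         /* B */
        out[3] = (uint8_t)(argb >> 24); /* A */
}
```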

> This takes up 4.23% of the running time of my application. I wonder
> whether there is any room for improving this? (see the code below)
> 
> --
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  4.23    832.34    56.22                             unpremultiply_data
> 
> /* Unpremultiplies data and converts native endian ARGB => RGBA bytes */
> static void
> unpremultiply_data (png_structp png, png_row_infop row_info, png_bytep data)
> {
>     int i;
> 
>     for (i = 0; i < row_info->rowbytes; i += 4) {
>         uint8_t *b = &data[i];
>         uint32_t pixel;
>         uint8_t  alpha;
> 
> 	memcpy (&pixel, b, sizeof (uint32_t));
> 	alpha = (pixel & 0xff000000) >> 24;
>         if (alpha == 0) {
> 	    b[0] = b[1] = b[2] = b[3] = 0;
> 	} else {
>             b[0] = (((pixel & 0xff0000) >> 16) * 255 + alpha / 2) / alpha;
>             b[1] = (((pixel & 0x00ff00) >>  8) * 255 + alpha / 2) / alpha;
>             b[2] = (((pixel & 0x0000ff) >>  0) * 255 + alpha / 2) / alpha;
> 	    b[3] = alpha;
> 	}
>     }
> }
> 
> -- 
> Best Regards,
> Nguyen Hung Vu
> vuhung16plus{remove}@gmail.dot.com
> An inquisitive look at Harajuku
> http://www.flickr.com/photos/vuhung/sets/72157600109218238/

