[cairo] cairo+pixman profiling
Jeff Muizelaar
jeff at infidigm.net
Tue Nov 6 05:54:51 PST 2007
On Tue, Nov 06, 2007 at 05:44:00PM +0900, Nguyen Vu Hung wrote:
> 2007/11/6, Jeff Muizelaar <jeff at infidigm.net>:
> > On Tue, Nov 06, 2007 at 01:03:43PM +0900, Nguyen Vu Hung wrote:
> > > 2007/11/6, Jeff Muizelaar <jeff at infidigm.net>:
> > > > These are called when using the software rasterizer. The ideal solution
> > > > to this problem is replacing the rasterizer. However, that is likely
> > > > quite a bit of work. There has been discussion about possible
> > > > implementations but no real conclusions were ever made.
> > > >
> > > You are talking about the problems of the "software rasterizer". Do you
> > > know of any solutions using a "hardware rasterizer"? Does a GPU make
> > > rasterization faster?
> >
> > cairo's software rasterizer is a bad software rasterizer. It is possible
> > to do much better in software. This thread has a bunch of discussion
> > about it:
> > http://lists.cairographics.org/archives/cairo/2007-July/011092.html
> Worth's post is worth reading:
> http://lists.cairographics.org/archives/cairo/2007-July/011116.html
> Summary: The best-known algorithm has a complexity of O(N log N) and the
> algorithm that cairo is using has a complexity of O(N log(N+k)). He
> agreed that there are "many things that could be improved here".
>
> If what Worth said is right, i.e. O(N log N) is the best-known algorithm
> and cairo's complexity is O(N log(N+k)), then your "make perf" has
> shown possibly flawed test data. I mean, it only holds in your
> case, but not in every case.
This complexity refers to tessellation, which isn't really that meaningful
when talking about most software rasterizers because they do not
have a tessellation phase.
>
> Do you have a patch or code that proves what you were saying?
Nope.
>
> > > > > 2. Where is unpremultiply_data called? What does this function do?
> > > >
> > > > Probably in the png writing code. It does c*255/alpha on each component
> > > > of a pixel.
> > > Thank you. I don't think this bottleneck can be overcome because I need
> > > to write PNG files (with an alpha channel).
> >
> > It is possible to have a faster software implementation of
> > unpremultiply. You can probably at least double the speed. The basic
> > trick involves multiplying by the inverse instead of dividing. You can
> > either compute the inverse at runtime or use a lookup table. I have
> > some code for this, but I haven't touched it for a while so I'd need to
> > look it over if I were to post it to the list.
>
> Do you have a patch or reference code?
Here's something. I don't know if it works or even if it is faster.
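#include <stdint.h>

/* MIN, MAX and ilog2 are not defined in this snippet as posted; the
 * definitions below are assumptions. ilog2(x) is floor(log2(x)), x >= 1. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int ilog2(uint32_t x)
{
    int l = -1;
    while (x) {
        x >>= 1;
        l++;
    }
    return l;
}

/* Unpremultiply n ARGB32 pixels in place. Each component is meant to
 * become c*255/a, using a multiply and shifts instead of a division
 * per component. */
void unp_5(uint32_t *input, int n)
{
    for (; n >= 1; n -= 1) {
        uint32_t in = *input;
        uint32_t a = in >> 24;
        if (a == 0) {
            *input = 0;
        } else {
            uint32_t r = in >> 16 & 0xff;
            uint32_t g = in >> 8 & 0xff;
            uint32_t b = in >> 0 & 0xff;
            /* clamp components that exceed alpha (invalid premultiplied data) */
            if (r > a)
                r = a;
            if (g > a)
                g = a;
            if (b > a)
                b = a;
            //int l = ceil(log2(a));
            int l = ilog2(a) + 1;
            /* 16-bit fixed-point multiplier derived from 1/a */
            unsigned int mprime = (((1<<16)*((1<<l)-a))/a) + 1;
            /* these are basically special cases for l=0 (d=1);
             * we can avoid the tests if we just test for d=1
             * and avoid the division */
            int sh1 = MIN(l, 1);
            int sh2 = MAX(l - 1, 0);
            unsigned int t1;
            t1 = (r*255*mprime) >> 16;
            r = (t1 + ((r*255 - t1)>>sh1)) >> sh2;
            t1 = (g*255*mprime) >> 16;
            g = (t1 + ((g*255 - t1)>>sh1)) >> sh2;
            t1 = (b*255*mprime) >> 16;
            b = (t1 + ((b*255 - t1)>>sh1)) >> sh2;
            *input = (a<<24) | (r<<16) | (g<<8) | b;
        }
        input++;
    }
}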
>
> The "unpremultiply_data" function converts ARGB -> RGBA. It just
> changes the position of the alpha channel to the right (after pixman's
> format).
Swapping the position of the alpha channel is trivial. It is the
division that is time consuming.
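
For the lookup-table variant of the multiply-by-the-inverse trick, here is
a minimal sketch. It is not cairo code and has not been benchmarked; the
names inv_table, init_inv_table, unpremultiply_component and
unpremultiply_argb32 are made up for illustration. It assumes the rounding
formula (c*255 + alpha/2)/alpha used by unpremultiply_data and stores
floor(2^24/alpha) + 1 for each alpha, so the per-component division becomes
a 64-bit multiply and a shift. It keeps the pixel in ARGB order and leaves
the byte swap to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: inv_table[a] = floor(2^24 / a) + 1 for a in 1..255.
 * Entry 0 is unused because alpha == 0 is handled separately. */
static uint32_t inv_table[256];

static void init_inv_table(void)
{
    inv_table[0] = 0;
    for (uint32_t a = 1; a < 256; a++)
        inv_table[a] = (1u << 24) / a + 1;
}

/* Computes (c*255 + a/2) / a for 0 <= c <= 255 and 1 <= a <= 255 without
 * a division. The numerator is below 2^16, which is what makes the
 * 24-bit reciprocal sufficient. The result is truncated to 8 bits, like
 * the byte stores in the original function. */
static inline uint8_t unpremultiply_component(uint32_t c, uint32_t a)
{
    uint32_t n = c * 255 + a / 2;
    return (uint8_t)(((uint64_t)n * inv_table[a]) >> 24);
}

/* Unpremultiplies n ARGB32 pixels in place; init_inv_table() must have
 * been called first. */
static void unpremultiply_argb32(uint32_t *pixels, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t p = pixels[i];
        uint32_t a = p >> 24;
        if (a == 0) {
            pixels[i] = 0;
            continue;
        }
        uint32_t r = unpremultiply_component((p >> 16) & 0xff, a);
        uint32_t g = unpremultiply_component((p >> 8) & 0xff, a);
        uint32_t b = unpremultiply_component(p & 0xff, a);
        pixels[i] = (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

The table costs 1 KB and is filled once. Whether this actually beats the
plain division depends on the CPU and compiler, so it would need to be
measured before drawing any conclusions.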
> This takes up 4.23% of the running time of my application. I wonder
> whether there is any room for improving this? (See the code below.)
>
> --
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>   4.23    832.34    56.22                             unpremultiply_data
>
> /* Unpremultiplies data and converts native endian ARGB => RGBA bytes */
> static void
> unpremultiply_data (png_structp png, png_row_infop row_info, png_bytep data)
> {
>     int i;
>
>     for (i = 0; i < row_info->rowbytes; i += 4) {
>         uint8_t *b = &data[i];
>         uint32_t pixel;
>         uint8_t alpha;
>
>         memcpy (&pixel, b, sizeof (uint32_t));
>         alpha = (pixel & 0xff000000) >> 24;
>         if (alpha == 0) {
>             b[0] = b[1] = b[2] = b[3] = 0;
>         } else {
>             b[0] = (((pixel & 0xff0000) >> 16) * 255 + alpha / 2) / alpha;
>             b[1] = (((pixel & 0x00ff00) >> 8) * 255 + alpha / 2) / alpha;
>             b[2] = (((pixel & 0x0000ff) >> 0) * 255 + alpha / 2) / alpha;
>             b[3] = alpha;
>         }
>     }
> }
>
> --
> Best Regards,
> Nguyen Hung Vu
> vuhung16plus{remove}@gmail.dot.com
> An inquisitive look at Harajuku
> http://www.flickr.com/photos/vuhung/sets/72157600109218238/