[cairo] Gradient mesh rasterizer

Tue Jul 14 15:49:02 PDT 2009

On Tue, Jul 14, 2009 at 9:56 PM, Jeff Muizelaar<jeff at infidigm.net> wrote:
> On Tue, Jul 14, 2009 at 02:12:34PM +0200, Andrea Canciani wrote:
>> I'm working on extending cairo patterns to handle tensor product
>> shading patterns (type 7 in pdf reference).
>> Right now I can rasterize them, but my code isn't fast and clean as it
>> should, anyway Adrian Johnson suggested me to post it here as there
>> are performance expert reading this list.
>> In the comments I tried to point out known performance problems and
>> possible solutions.
>> The code should be linked to cairo and libm. Please pay attantion when
>> running it because it overwrites the file "mesh.png" in the current
>> directory with a rendering of a gradient mesh (hardcoded in the main
>> function).
>
> I'm not sure if you planned to do this anyways, but the following patch
> brings the runtime on my machine from 0.430s to .358s. The interesting
> part is moving the multiplications used to compute ca out of the
> innermost loop. You could do something similar for cr,cg,cb.
Nice! You're right, I should probably completely avoid multiplications
in the inner loop (expecially in integer arithmetic an addition is
usually much faster than a multiplication).
>
> I expect that the double to integer conversions are expensive and, once
> the multiplications are eliminated from the inner loop, will dominate
> the time spent in that loop. Therefore, it might be worth while to
> investigate using an integer fixed point representation for the inner
> loop computations.
As stated in the comments, I still have to test integer arithmetic
performance, but I expect that it should be much better than floating
point.
I'm working towards an implementation using integers for all the
u-related data, i.e. everything in the inner loop (colors,
coordinates, curve coefficients), but I need to fix tesselation before
(I need to make sure the size of the tile doesn't exceed the precision
I'm using, expecially if the rasterizer doesn't use double-precision
floats).
I think I found how to correctly tesselate along the u coordinate, but
the implementation is not as simple as I had foreseen. Hopefully I'll
get it right in the next few days.
Unless any problem is found (well, except performance), the file I
posted should probably be the reference implementation as it is a
straightforward implementation of a simple algorithm.
I tried to look at the output by Acrobat Reader and Apple Preview and
both are inferior in quality, at least looking at the rendering of
plate 15 of "PDF Reference":
- Acrobat Reader tesselates the gradients to triangles that can be
spotted in the unit square if you look carefully. Using image-editing
tools they can easily be shown on both images
- Preview shows a weird discontinuity in the gradient in all the
corners of the shading patterns

By the way, you're correct when you say that the cost of color
computation is quite high right now: applying this patch my time goes
down to 0.224s from 0.321s and it simply skips some pixel overwriting.

--- pattern.c~	2009-07-14 22:23:14.000000000 +0200
+++ pattern.c	2009-07-15 00:15:27.000000000 +0200
@@ -102,6 +102,8 @@
 {
   bezier_t xv[4][4], yv[4][4], xu[4], yu[4], tmp;
   int vsteps, usteps, vshift, ushift, v, u, i, k, x, y;
+  int oldx, oldy;
+  oldx = oldy = -1;

   tmp = 0;
   for (i = 0; i < 4; ++i) {
@@ -144,7 +146,7 @@
     for (u = 0; u < usteps; ++u) {
       x = (int) xu[0];
       y = (int) yu[0];
-      if (0 <= x && 0 <= y && x < width && y < height) {
+      if ((oldx != x || oldy != y) && 0 <= x && 0 <= y && x < width
&& y < height) {
 	double ca = ( (vsteps - v) * ( (usteps - u) * a[0] + u * a[1]) + v *
( (usteps - u) * a[2] + u * a[3])) * tmp * 255.0 + .5;
 	int cr = ( (vsteps - v) * ( (usteps - u) * r[0] + u * r[1]) + v * (
(usteps - u) * r[2] + u * r[3])) * tmp * ca;
 	int cg = ( (vsteps - v) * ( (usteps - u) * g[0] + u * g[1]) + v * (
(usteps - u) * g[2] + u * g[3])) * tmp * ca;
@@ -152,6 +154,8 @@
 	int ica = ca;
 	* ( (uint32_t*) (data + y*stride + 4*x)) = (ica << 24) | (cr << 16)
| (cg << 8) | cb;
       }
+      oldx = x;
+      oldy = y;
       fwd (xu);
       fwd (yu);
     }