[cairo] recursive quad-tree, or bounding box of changed area during drawing?

Thu Jun 1 09:55:37 PDT 2006

On Thu, 2006-06-01 at 02:00 +0200, Leon Woestenberg wrote:
> > However, *nobody ever did any profiling of any of this*.  We don't know
> Any profile would be very system specific.

Yes and no.  The GNOME canvas runs all in software; it's basically
canvas -> libart -> RGB buffer -> XImage -> X server.

What is your intended application, a mostly static display or many/big
moving/changing elements?  What refresh rate do you want to achieve?
Are your shapes complex?  Gradients?  Text?

> But to answer your question, I would think of an approach as follows: 
> Find extreme coords of paths, use power of two sized area in the tree, 
> bitmask the extreme pixel coordinate values. I.e. for 256 size: dirty = 
> (x & 0x7f) left_right=(x & 0x80) for each horizontal halve. Identical 
> for vertical halves, and trascend the tree as such.

Here's what libart and the canvas tried to solve (the complex case), and
why it fails spectacularly for the simple case:

The simple case is that you have a diagonal line, represented here with
".", with a few items around it, represented by "X":

   ...               XXXXX
     ...     XXX       XXXXX
       ...   XXX      XXXX
         ...
     XX    ...     XXXXXX
     XXX     ...    XXXXXX
    XXXX       ...   XXXX
     XXXX        ...  XXXX
       XXX         ...
                     ...
            XXXX       ...
      XXXXXXXXXXXX       ...

If you are dragging the line just a little bit, repainting based on the
line's bounding box means that you also have to repaint the X items.
Libart wanted to make it possible to *just* repaint the area occupied by
the old position of the line, and the new position, and repaint as
little as possible of the surrounding items.

Now here's a more complex case.

Say you have a big unfilled Bézier curve, represented with "." here.
You also have some shapes scattered around, represented with "X":

             .........               XXXX
       ......         ............    XXXX
      .                           ...   XXX
       .     XXX            XX       O  XX
      .       XXXXXXX      XXXX      O
     .          XXXXXX    XXX        O
      .   XXXX     XXX     XXX      O
   XX  ... XXX                    ..
  XXXXX   ......               ...
   XXXXXX       ...............

The "O"s there represent a single segment of the Bézier curve (i.e. if
handles were displayed for the control points, dragging them would only
modify the "O" region and not the rest of the curve).

You drag one of the handles and only the "O" region moves a bit.  The
rest doesn't change.  A *really* well-written canvas item for editing
Bézier curves would be able to see that it only needs to repaint the
area with the "O", both in its old position and its new position.

Libart would then make you do this:

  Bézier path -> Sorted Vector Path (SVP) -> microtile region

In libart's terms, a Bézier path is a postscript-like sequence of
commands (moveto, lineto, curveto, etc.).  Think of a Sorted Vector Path
(SVP) as a polygonized version of a Bézier curve, which in theory is
fast to rasterize.  Then libart uses a fill-polygon algorithm to fill
the microtile array with the area corresponding to the SVP.

So, you take the original Bézier subpath for the old position of the "O"
region, and figure out the microtile region.  You do the same for the
new position.  Then you take these two microtile arrays and compute
their union; the result is the dirty region for the canvas.

When the canvas repaints, it decomposes this microtile array into bigger
rectangles (so that adjacent microtile squares get merged into a
rectangle, for example), and does this:

  list_of_rects = microtile_array_to_rectangles (dirty_region)

  foreach r in list_of_rects:
      recursively scan the tree of items:
          if item.bounding_box intersects r:
               paint (item, r)

This has some problems:

The canvas tried too hard to compute the minimal repaint area.  If you
have a solid rectangle that moves from position A to position B:

  A---------+
  |         |
  |  B---------+
  |  |         |
  +--|         |
     |         |
     +---------+

Then in theory you only need to repaint the two L-shaped areas that are
the symmetric difference of both rectangles, and you don't need to
repaint the smaller rectangle which is the intersection:

  XXXXXXXXXXX
  XXXXXXXXXXX
  XXX        XXX
  XXX        XXX
  XXX        XXX
     XXXXXXXXXXX
     XXXXXXXXXXX

But the canvas and libart try to compute the symmetric difference for
*all* paths, even when they are complex Bézier paths.  This can get
expensive, and is a total bitch to compute analytically:  libart
actually has a polygon intersector, which has taken years to debug and
is still slow.

Once you have that symmetric difference, you have to run a fill-polygon
algorithm to fill the microtile array.  Think of this operation as
painting the polygon on very fat pixels.  This could be expensive.

Then you have to do the unions of all the microtile arrays to get to the
final dirty region.

Then you take the big microtile array and decompose it into bigger
rectangles for repainting.

Then, for each rectangle, you see if it intersects the bounding box of
each item in the canvas (recursive bounding boxes would help you if
applications actually took advantage of them, but they don't ---
everyone uses a shallow hierarchy of items, instead of carefully-nested
deep hierarchies.  This is a big problem for people who want to throw
10000 markers into a graph).

Then you tell each intersected item to paint itself.  This operation can
occur several times, because you do it once per rectangle.  Painting an
item may require some setup work.  In theory libart lets you minimize
this setup work by holding the SVPs around (i.e. by doing the Bézier ->
SVP conversion just once), and hoping that the SVP rasterizer will be
fast.

Since each rectangle is painted separately, the display has a lot of
"tearing":  you don't see it repaint all at once, but rather in small
regions with an unpleasant effect.  This is easy to fix (you use a
bigger temporary pixmap, and use a clipping region), but is something to
keep in mind.

In summary:

- Being smart takes a *lot* of code which is a pain to debug.  Writing a
"really correct" canvas item is a *lot* of work; that's why nobody does
it.

- A lot of code kills your processor's cache.

- You also hit a lot of memory to generate all the temporary SVPs,
microtiles, merged microtiles, lists of rectangles, temporary RGB
buffers; this also kills your processor's cache.

- No profiles were ever done of the whole pipeline.

> Well, this is what I asked for: where should I look for possibilities to 
> dirty the regions efficiently?

I believe that's the wrong question at this point; see below :)

> Think slow hardware, slow framebuffer access, and no acceleration 
> whatsoever. Yes, there is DMA available.

I'd start with a simple approach:

- Write a canvas that just does everything based on bounding boxes.  Use
a plain Region-as-list-of-rectangles structure for the dirty region.

- When an item knows that it needs to be repainted, just add its old/new
bounding boxes to the dirty region.

Then, profile the hell out of it for your particular usage case.  How
many items?  What kind - plain rectangles, complex Bézier paths, etc?
Do you have a shallow hierarchy (so recursive bounding boxes are not
useful), or a deeply nested one (like a circuit editor with groups and
subgroups of items)?  Do scattered items change simultaneously, or are
the changes pretty localized?  How big are your items with respect to
the screen size?  In general, how big is an update?

When you profile that, you'll know which parts are the slow ones.

Computing the bounding boxes of Bézier paths is pretty cheap.  Computing
finer-grained regions may be expensive.  Computing the bounding boxes of
text could be cheap or expensive depending on your text layout engine -
don't reinvent the wheel!  Pango is pretty well optimized in this
respect, and it's getting better all the time.

If the slow thing is to rasterize your shapes, then see what would give
you better results: a faster rasterizer overall (Bézier to RGB
directly); or one that lets you keep an fast intermediate representation
(keep the trapezoids around instead of having to tesselate all the
time).  Keeping the intermediate representation will also cost you in
terms of memory, processor cache, etc.  Or is the problem that you don't
have hardware floating point, and the tesselator is getting killed by
that?

If the slow thing is to blit to the screen, then you want to minimize
the repaint area.  If your changes generally occupy a large area of the
screen, you are screwed and you need a faster blitter.  If your changes
are rather localized (or big in terms of bounding box, but in very
disjoint regions), you want a more efficient computation of dirty
regions --- investigate microtiles or quadtrees in that case, but don't
make your code so complex that it becomes expensive to compute those
regions.

I think everyone on this list would be very interested in your specific
usage case, and in your actual results from profiling.

  Federico