[cairo] Cairo status update (new PostScript stuff and more)

Thu Apr 6 15:36:59 PDT 2006

The cairo list has been too quiet lately. I'm changing that now with
this (longish) message, and likely several follow ups to sub-pieces of
it.

Some of that has been due to people just being heads-down and
coding[1], and some of it has been due to people being off doing
interesting things. For instance, I recently attended LGM[2], which was
very productive. I had some very encouraging discussions with folks
from Xara[3], Inkscape, and Scribus[4]. All three of those projects
are quite interested in cairo's PDF export capability, (or interested
in seeing that capability appear soon).

Similarly, we've already seen users quite disappointed with the PS/PDF
output from currently released versions. One example is the question
I'm replying to with this mail:

> On 3/31/06, rush ta <anjutagtk at gmail.com> wrote:
> > hi martin...
> >
> > I wanna use cairo to generate PS presentaions..
> > When i use linear gradient to fill rectangles..
> > the file size dramatically increases to about 60 MB for A4 size document..
> > I have used version 1.0.2 ... has anything changed in 1.0.4
> > also when is the probable date for making PS support complete..

The short answer is that the PostScript backend was quite broken in
1.0.2, (and didn't change much in 1.0.4). It sometimes gives the wrong
answer, and (I hadn't realized this until recently) the code it had
for compressing images included as fallbacks was not properly enabled.

The answer for when PostScript/PDF support will be more complete is
when the 1.2 release of cairo happens. I'm hesitant to announce dates
because I've already missed every date I've suggested before. :(

But, there has been some tremendous progress on this front recently,
and I really hope to have the 1.2 release out the door by the end of
April.

Here is some of the progress that has been happening:

 * Alexander Larsson provided code to compress images properly once
   again. His fixes also added ASCII encoding of the images rather
   than binary, which provides much better portability. It also uses
   LZW rather than Flate compression, which means that cairo now
   requires only a LanguageLevel 2 rather than a LanguageLevel 3
   implementation of PostScript. This also improves portability of the
   resulting files.

 * Keith Packard wrote a new "analysis surface", (which was on the
   ps-surface branch in git). I've now improved this quite a bit and
   merged it into mainline cairo.
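On Alexander's change: the message says only that images are now ASCII-encoded and doesn't name the encoding. A common choice for PostScript is ASCII85, which the LanguageLevel 2 `ASCII85Decode` filter reads. Here is a minimal standalone C sketch of that encoding, assuming ASCII85 is what's meant. The function name `ascii85_encode` is hypothetical and is not cairo's internal API, and the LZW stage is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not cairo's API): encode `len` bytes of `in` as
 * ASCII85, the text encoding read by PostScript's ASCII85Decode filter.
 * `out` must hold at least 5 * ((len + 3) / 4) + 3 bytes; the result is
 * NUL-terminated and ends with the "~>" end-of-data marker.
 * Returns the number of characters written (excluding the NUL). */
static size_t
ascii85_encode (const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    assert (out != NULL);
    for (size_t i = 0; i < len; i += 4) {
        size_t n = len - i < 4 ? len - i : 4;   /* bytes in this group */
        uint32_t word = 0;
        char digits[5];

        /* Pack up to four bytes big-endian, zero-padding a short final group. */
        for (size_t k = 0; k < 4; k++)
            word = (word << 8) | (k < n ? in[i + k] : 0);

        /* A full group of four zero bytes is abbreviated as 'z'. */
        if (n == 4 && word == 0) {
            out[o++] = 'z';
            continue;
        }

        /* Five base-85 digits, most significant first, offset by '!' (33). */
        for (int d = 4; d >= 0; d--) {
            digits[d] = (char) ('!' + word % 85);
            word /= 85;
        }

        /* A partial group of n bytes contributes only n + 1 characters. */
        memcpy (out + o, digits, n + 1);
        o += n + 1;
    }

    memcpy (out + o, "~>", 3);   /* also copies the terminating NUL */
    return o + 2;
}
```

Every 4 input bytes become at most 5 printable characters. That is the usual trade ASCII85 makes: about 25% overhead in exchange for output that survives text-only transports.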

I'd like to summarize the current status now that the analysis surface
is in place. The short story is that compared to the 1.0 series, the
current code should have the following improvements:

 + Better output fidelity

   Now, the output should always be correct, (in 1.0 there were
   various things one could attempt to draw for which cairo would just
   give a garbage result).

 + Smaller files when fallbacks are involved

   When cairo does resort to image fallbacks, the resulting files
   should now be much smaller, (since the image compression was
   missing/broken in cairo 1.0).

There are still a couple of potential regressions compared to the
quality of the PostScript output in cairo 1.0:

 - We might be using fallbacks more than strictly necessary

 - Font support isn't there just now, (text is being output as paths)

And these are the things I want to improve between now and the actual
1.2 release. And I'd like some help, too!

One way to help is for people to test things. The kinds of things I'd
like as feedback from testers are, in order of severity:

 1. Is the output ever "wrong", (graphics don't render properly, file
    fails to be read by an interesting interpreter, etc.)

 2. Is the output using image fallbacks in a case where you think it
    shouldn't, (that is, your drawing doesn't have any fancy
    translucence or something else that PostScript can't support
    natively). In particular, I want to know if cairo is now using
    fallbacks for something that was handled natively by the cairo 1.0
    series.

There are two things I will be doing as soon as possible to make it
easier for people to test things. First, I'll be setting up a git2cvs
bridge. This will allow people to use things like jhbuild (which talks
CVS but doesn't talk git) to be able to build/test cairo. Second, I'll
be posting regular snapshots throughout the remainder of the 1.2
development cycle.

Finally, here's a little more detail on where the code stands, along
with some hints for people that want to help out with the remaining
code that needs to be done.

Sometime after the 1.0 release, we changed the PostScript and PDF
backends to always use full-page image fallbacks for everything. This
allowed us to incorporate these backends into the test suite and
verify that the meta-surface was at least recording all operations
successfully, (which actually took quite a bit of extra effort to
complete).

Now, we've added the analysis surface as well. So there is now a small
family of internal surfaces involved, (paginated, analysis, and
meta). Here's how these all work. When the user asks for a ps or pdf
surface, what they actually get back is a paginated surface that wraps
the "real" ps or pdf surface as its target. Then, the paginated
surface directs all drawing operations to a meta-surface where each
operation is recorded. Finally, when each page is complete,
(_cairo_paginated_surface_show_page or _copy_page), the paginated
surface replays the meta-surface through a couple of passes.
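To make the wrapping concrete, here is a toy C model of the record-then-replay flow. Every type and function in it (`op_t`, `meta_surface_t`, `paginated_show_page`, and so on) is hypothetical and far simpler than cairo's internal code. The analysis pass is omitted, so the sketch shows only one thing: drawing reaches the real backend only after the page is finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; these are not cairo's types. */
typedef enum { OP_PAINT, OP_FILL, OP_STROKE } op_kind_t;

typedef struct {
    op_kind_t kind;
    double    source_alpha;
} op_t;

/* The "real" backend (ps or pdf), reduced to one callback per operation. */
typedef struct target {
    void (*draw) (struct target *self, const op_t *op);
    int   ops_drawn;
} target_t;

#define MAX_OPS 64

/* The meta-surface records operations instead of drawing them. */
typedef struct {
    op_t   ops[MAX_OPS];
    size_t n_ops;
} meta_surface_t;

static void
meta_record (meta_surface_t *meta, op_t op)
{
    assert (meta->n_ops < MAX_OPS);
    meta->ops[meta->n_ops++] = op;
}

static void
meta_replay (const meta_surface_t *meta, target_t *target)
{
    for (size_t i = 0; i < meta->n_ops; i++)
        target->draw (target, &meta->ops[i]);
}

/* What the user actually gets back when asking for a ps/pdf surface. */
typedef struct {
    meta_surface_t meta;
    target_t      *real;
} paginated_surface_t;

static void
paginated_draw (paginated_surface_t *surface, op_t op)
{
    meta_record (&surface->meta, op);   /* nothing reaches the backend yet */
}

static void
paginated_show_page (paginated_surface_t *surface)
{
    /* Analysis pass omitted in this sketch: replay straight to the target. */
    meta_replay (&surface->meta, surface->real);
    surface->meta.n_ops = 0;            /* start recording the next page */
}

/* A trivial target that just counts the operations it receives. */
static void
counting_draw (target_t *self, const op_t *op)
{
    (void) op;
    self->ops_drawn++;
}
```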

The first pass is a replay of the meta-surface against a new analysis
surface which again has the "real" surface as its target. This replay
is special in that the real surface is first told that it is being put
into "analysis mode" (via set_paginated_mode and CAIRO_PAGINATED_MODE_ANALYZE). 
This pass is the first time the real surface will see any drawing
operations as recorded by the meta-surface. But in analysis mode the
real surface does not "draw" anything but simply reports whether each
operation would be supported or not. It is the job of the analysis
surface to track which operations are supported and then construct a
strategy for dealing with all supported/unsupported operations.
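The analysis pass can be modeled the same way. The mode names below echo `CAIRO_PAGINATED_MODE_ANALYZE` from the text, but all the identifiers are hypothetical. The support rule (only fully opaque `OVER` is native) is an assumption chosen for illustration, not the PostScript backend's actual rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, loosely echoing CAIRO_PAGINATED_MODE_ANALYZE. */
typedef enum { MODE_ANALYZE, MODE_RENDER } paginated_mode_t;
typedef enum { STATUS_SUCCESS, STATUS_UNSUPPORTED } status_t;
typedef enum { OPERATOR_OVER, OPERATOR_SOURCE, OPERATOR_CLEAR } operator_t;

typedef struct {
    operator_t op;
    double     source_alpha;
} op_t;

/* The "real" PostScript surface. */
typedef struct {
    paginated_mode_t mode;
    int              ops_emitted;   /* PostScript operations actually written */
} ps_surface_t;

/* Assumed rule for this sketch: only fully opaque OVER is drawn natively. */
static bool
ps_operation_is_native (const op_t *op)
{
    return op->op == OPERATOR_OVER && op->source_alpha >= 1.0;
}

static status_t
ps_surface_fill (ps_surface_t *ps, const op_t *op)
{
    if (ps->mode == MODE_ANALYZE)   /* report only; draw nothing */
        return ps_operation_is_native (op) ? STATUS_SUCCESS : STATUS_UNSUPPORTED;

    ps->ops_emitted++;              /* render mode: would emit PostScript here */
    return STATUS_SUCCESS;
}

/* The analysis surface forwards each operation and records the verdict. */
typedef struct {
    ps_surface_t *target;
    bool          has_unsupported;
} analysis_surface_t;

static void
analysis_surface_fill (analysis_surface_t *analysis, const op_t *op)
{
    if (ps_surface_fill (analysis->target, op) == STATUS_UNSUPPORTED)
        analysis->has_unsupported = true;
}
```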

The current strategy in the analysis surface is quite simplistic. It
simply provides a Boolean value to the paginated surface indicating
whether any unsupported operation occurred on the page. If not, the
paginated surface does its second pass by replaying the meta-surface
to the "real" backend, (resulting in all-native output for that
page). But if there is any unsupported operation, the paginated
surface does its second pass by replaying the meta-surface to an image
surface and painting the result into the real backend, (resulting in
all-fallback for that page).
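The per-page decision reduces to a single branch. In this hypothetical sketch the caller supplies two function pointers. One stands in for "replay the meta-surface to the real backend". The other stands in for "replay to an image surface, then paint that image into the backend". None of these names come from cairo.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PAGE_ALL_NATIVE, PAGE_ALL_FALLBACK } page_result_t;

/* Hypothetical hooks for the two kinds of second pass. */
typedef struct {
    void (*replay_native) (void *closure);          /* meta -> real backend */
    void (*replay_image_fallback) (void *closure);  /* meta -> image -> backend */
    void  *closure;
} second_pass_t;

/* The whole strategy: one Boolean from the analysis pass picks the path. */
static page_result_t
paginated_finish_page (bool has_unsupported, const second_pass_t *pass)
{
    if (!has_unsupported) {
        pass->replay_native (pass->closure);
        return PAGE_ALL_NATIVE;
    }
    pass->replay_image_fallback (pass->closure);
    return PAGE_ALL_FALLBACK;
}

/* Helpers that record which path ran, for trying the sketch out. */
static void mark_native (void *closure)   { *(char *) closure = 'n'; }
static void mark_fallback (void *closure) { *(char *) closure = 'f'; }
```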

Even though this strategy is very simplistic, it has some good results
already. For example, in the current test suite we have 61 test
cases. Before this analysis stuff landed, all 61 tests resulted in
full-page image fallbacks. Now, 32 of the test cases are handled as
all-native by the PostScript backend, (at least with the default
content type of color and alpha).

The hope is that even with such a simplistic strategy there might be a
large number of interesting documents, (say, web pages or similar with
text and opaque images), that would still result in all-native output.

But now that we've got all this infrastructure in place, it should
also be very easy to drop in a more sophisticated strategy. For
example, instead of just storing a Boolean value, the analysis surface
can collect up two separate regions for the parts of the page that are
either supported or unsupported by the target backend. That would
allow fallback images to be restricted to more minimal areas of the
output rather than whole pages at a time, so the output files would be
much smaller.
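Here is a rough sketch of that finer-grained idea. Instead of one Boolean, the analysis pass accumulates the extents of the unsupported operations. For simplicity this hypothetical version keeps a single bounding box where a real implementation would keep a proper region, and none of the names are cairo's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device-space rectangle; a real implementation would use a
 * proper region type rather than a single bounding box. */
typedef struct { int x, y, width, height; } rect_t;

typedef struct {
    bool   have_unsupported;
    rect_t unsupported_extents;   /* bounding box of all unsupported ops */
} region_analysis_t;

static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a > b ? a : b; }

static void
rect_union (rect_t *dst, const rect_t *src)
{
    int x1 = min_int (dst->x, src->x);
    int y1 = min_int (dst->y, src->y);
    int x2 = max_int (dst->x + dst->width,  src->x + src->width);
    int y2 = max_int (dst->y + dst->height, src->y + src->height);

    dst->x = x1;
    dst->y = y1;
    dst->width  = x2 - x1;
    dst->height = y2 - y1;
}

/* Called once per operation during the analysis pass. */
static void
region_analysis_add (region_analysis_t *ra, rect_t extents, bool supported)
{
    if (supported)
        return;                    /* drawn natively; no fallback needed */
    if (!ra->have_unsupported) {
        ra->unsupported_extents = extents;
        ra->have_unsupported = true;
    } else {
        rect_union (&ra->unsupported_extents, &extents);
    }
}
```

With two small unsupported operations at (10,10) 20x20 and (50,40) 10x10 on a 100x100 page, only a 50x40 area would need rasterizing. One detail the sketch leaves out: the image for that area would presumably need to be rendered from every operation that touches it, supported or not, so that stacking order is preserved.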

So, improving that strategy is one thing I'm interested in doing,
(either before or after 1.2 depending on feedback from testing of the
per-page fallback strategy).

Other things that are even easier to do, (and that I hope people will
jump in to help out with), are turning on native support for individual
operations and marking them as supported. Some of the things that can
be done here include:

 * PDF output

   You might notice I've been talking about PostScript throughout this
   mail. That's simply because it was what I decided to start with,
   (being "harder" than PDF). But all the paginated/analysis/meta
   infrastructure is entirely common, (along with any new work on more
   interesting analysis strategies). So all that's missing is a little
   bit of cut-and-paste from cairo-ps-surface.c to cairo-pdf-surface.c
   to do similar kinds of analysis.

 * RGB24 test cases

   Every test in the suite is currently run against both an "ARGB32"
   and an "RGB24" target surface for each backend. These (misnamed)
   variations indicate whether the target surface is treated as
   supporting destination alpha or not. They might better be named
   CONTENT_COLOR_ALPHA and CONTENT_COLOR cases.

   If you look, you might notice that all RGB24 PostScript tests are
   currently resulting in all-fallbacks. This is ironic since the
   non-destination-alpha semantics are actually much easier to
   support. They are a more natural fit with PostScript's imaging
   model. For instance, many operations, (such as painting with SOURCE
   or CLEAR operators), that can't always be handled for PostScript
   under COLOR_ALPHA semantics can always be handled for PostScript
   under COLOR semantics.

   So what's needed here is first to turn on
   _cairo_paginated_surface_create_similar, (it exists already but is
   not currently being compiled), then fix what that
   breaks. Afterwards some small tweaks to mark the new operations as
   supported should be all that's necessary.

 * Text support

   Again, this is a case where there is a bunch of existing code that
   works, but is currently turned off. Before, the PostScript backend
   would do font-subsetting in some cases, and fall over
   otherwise. Now, we have text-as-paths in all cases.

   So all that should be needed is to properly characterize when the
   existing (and currently disabled) font code is guaranteed to work,
   and then just call it. 
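This hypothetical predicate shows why the RGB24 (COLOR) variants should be the easier ones: it takes the content type into account when deciding support. Its rules are a simplification invented for this sketch. They follow the claim in the RGB24 item that SOURCE and CLEAR can always be handled without destination alpha, but they are not the backend's real logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names mirroring CONTENT_COLOR_ALPHA / CONTENT_COLOR. */
typedef enum { CONTENT_COLOR, CONTENT_COLOR_ALPHA } content_t;
typedef enum { OPERATOR_OVER, OPERATOR_SOURCE, OPERATOR_CLEAR, OPERATOR_ADD } operator_t;

/* Assumed, simplified rule set: with no destination alpha, SOURCE and
 * CLEAR just overwrite the page, which fits PostScript's opaque painting
 * model. Under COLOR_ALPHA this sketch conservatively falls back. */
static bool
ps_operation_supported (content_t content, operator_t op, bool source_is_opaque)
{
    switch (op) {
    case OPERATOR_OVER:
        return source_is_opaque;           /* no native translucency */
    case OPERATOR_SOURCE:
    case OPERATOR_CLEAR:
        return content == CONTENT_COLOR;   /* conservative under COLOR_ALPHA */
    default:
        return false;                      /* e.g. ADD: always fall back */
    }
}
```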

The common theme in a lot of that stuff is just turning on existing
code and then doing small fixes to make sure it works. And the test
suite should provide a lot of guidance through all of this.

And of course, I'm also available to provide guidance to anyone who
would like to help with any of this. Come join the fun and let's make
cairo generate high-quality output for printing together!

Most of all, let's all have fun with cairo.

-Carl

[1] For example, Emmanuel Pacaud recently let me know that he hopes to
have the SVG backend ready for inclusion with the upcoming 1.2
release. Similarly, Kristian Høgsberg just started a new "user font"
capability which he hopes to have ready for 1.2 as well. I hope he'll
let us hear more about it soon, but in the meantime, you can peek at
it here:

http://gitweb.freedesktop.org/?p=users-krh-cairo;a=shortlog;h=user-font

[2] http://libregraphicsmeeting.org/

[3] If you haven't seen the Xara GPL source code release yet, see:

http://www.xaraxtreme.org/

I had some rather in-depth conversations with the Xara people about
potential collaboration between the Xara and cairo code bases, which I
would like to talk about more later.

[4] Scribus in particular has some rather demanding needs in the way
of PDF export. These needs won't be easy to meet right away---even if
we get everything I currently want into cairo 1.2, it still won't be
close. But Scribus should be able to provide us a very interesting
target for future PDF development.