[cairo] Meta surface proposal

Sun Feb 6 23:01:27 PST 2005

Hi,

OK, I guess I agree that if we're writing this meta surface to disk it 
could make sense to use PDF [1].  I wouldn't want to inflict another 
meta file format on the world, and as you mention, we risk that this 
format becomes an official cairo interface.  However, I'm not convinced 
that we need to write this to disk at all.  As I see it, the case to 
consider here would be how to optimize the memory usage when rendering 
from a file compared to memory usage when storing the meta-surface in 
memory.  If we cannot significantly reduce the memory usage for the 
render-from-file case, there's no point in writing the page to disk. 
Keith, I don't know if you have some tricks in mind here, but the only 
optimization I can see is that when rendering from file, you can avoid 
keeping all image surfaces in memory at the same time.  Of course, to do 
this, you need to know when an image surface is no longer used on the 
page, and for that you need to analyze the entire content stream before 
you start rendering. (I'm assuming that the vector graphics don't take up 
significant storage, and that we only care about memory usage for the 
pixmaps).
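
To make the two-pass idea concrete, here is a minimal sketch in C.  It 
assumes a hypothetical recorded-operation array in which each operation 
names at most one source image by index; meta_op_t, compute_last_use and 
replay_peak_resident are illustrative names, not cairo API.  The first 
pass scans the whole stream to find the last use of each image, the 
second replays it and drops each image right after that last use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical recorded operation; only the source image matters here. */
typedef struct {
    int source_image;   /* index of the source image, or -1 for none */
} meta_op_t;

/* Pass 1: record, for each image, the index of the last op that uses
 * it.  last_use[i] stays -1 if image i is never referenced. */
static void
compute_last_use (const meta_op_t *ops, size_t n_ops,
                  int *last_use, size_t n_images)
{
    size_t i;

    for (i = 0; i < n_images; i++)
        last_use[i] = -1;
    for (i = 0; i < n_ops; i++) {
        int img = ops[i].source_image;
        if (img >= 0) {
            assert ((size_t) img < n_images);
            last_use[img] = (int) i;
        }
    }
}

/* Pass 2: replay the stream, "loading" an image at its first use and
 * dropping it after its last use.  Returns the peak number of images
 * resident at the same time.  resident is caller-provided scratch
 * space of n_images bytes. */
static size_t
replay_peak_resident (const meta_op_t *ops, size_t n_ops,
                      const int *last_use, unsigned char *resident,
                      size_t n_images)
{
    size_t i, live = 0, peak = 0;

    memset (resident, 0, n_images);
    for (i = 0; i < n_ops; i++) {
        int img = ops[i].source_image;
        if (img < 0)
            continue;
        if (!resident[img]) {           /* load from file on first use */
            resident[img] = 1;
            live++;
        }
        if (live > peak)
            peak = live;
        /* ... a real replay would composite the image here ... */
        if (last_use[img] == (int) i) { /* no later op needs it */
            resident[img] = 0;
            live--;
        }
    }
    return peak;
}
```

The peak returned by the second pass is the number to compare against 
"all images resident at once", which is what the in-memory approach 
would need.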

The surface modification counter is something I want to implement in any 
case to optimize the PDF output, so the meta surface backend would share 
that optimization.
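
A rough sketch of what I mean by a modification counter, in C.  The 
types and function names (tracked_surface_t, snapshot_cache_t and so on) 
are made up for illustration and are not existing cairo API.  The surface 
bumps a serial number on every modification, and whoever records it only 
takes a new copy when the serial differs from the one seen at the last 
copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical surface with a modification counter. */
typedef struct {
    unsigned int serial;    /* bumped on every operation that draws to it */
} tracked_surface_t;

/* Hypothetical one-entry cache of the last snapshot taken. */
typedef struct {
    const tracked_surface_t *surface;
    unsigned int serial_at_copy;
    int valid;
    unsigned int copies_made;   /* kept only to make the effect visible */
} snapshot_cache_t;

/* To be called by every drawing operation that targets the surface. */
static void
tracked_surface_mark_dirty (tracked_surface_t *s)
{
    s->serial++;
}

/* Returns 1 if a new copy had to be taken, 0 if the cached snapshot is
 * still current and can be shared (the zero-copy case). */
static int
snapshot_cache_acquire (snapshot_cache_t *cache, const tracked_surface_t *s)
{
    if (cache->valid && cache->surface == s &&
        cache->serial_at_copy == s->serial)
        return 0;

    /* A real implementation would copy the pixel data here. */
    cache->surface = s;
    cache->serial_at_copy = s->serial;
    cache->valid = 1;
    cache->copies_made++;
    return 1;
}
```

Using the same image as a source twice without drawing to it in between 
then costs one copy, not two, and the PDF backend can emit the image 
object once and reference it again.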

So, the case we're optimizing for is when you have a lot of different 
pixmaps on the page.  The in-memory approach would need to keep them all 
in memory, where the on-disk approach conceivably could load, composite, 
and destroy one at a time.  If memory usage really is a concern, it 
would be less work to modify cairo to be able to write image surfaces to 
disk and free the memory if you reach a certain memory threshold.
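
As a sketch of the spill-to-disk alternative, under the assumption that 
an image is just a block of pixel bytes: the code below uses only the C 
standard library (tmpfile, fwrite, fread), and spillable_image_t and the 
three functions are hypothetical names, not cairo API.  Which images to 
spill first is a policy question; this sketch simply goes in array order, 
and it assumes every image has a nonzero size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical image whose pixels can live in memory or in a temp file. */
typedef struct {
    unsigned char *pixels;  /* NULL while spilled to disk */
    size_t size;            /* size of the pixel data in bytes */
    FILE *spill;            /* temp file backing the pixels, or NULL */
} spillable_image_t;

/* Write the pixels to a temp file and free the memory.  Returns 1 on
 * success (or if already spilled), 0 on failure. */
static int
spillable_image_spill (spillable_image_t *img)
{
    if (img->pixels == NULL)
        return 1;
    if (img->spill == NULL) {
        img->spill = tmpfile ();
        if (img->spill == NULL)
            return 0;
    }
    rewind (img->spill);
    if (fwrite (img->pixels, 1, img->size, img->spill) != img->size)
        return 0;
    if (fflush (img->spill) != 0)
        return 0;
    free (img->pixels);
    img->pixels = NULL;
    return 1;
}

/* Make the pixels resident again, reading them back if needed. */
static unsigned char *
spillable_image_acquire (spillable_image_t *img)
{
    if (img->pixels != NULL)
        return img->pixels;
    if (img->spill == NULL)
        return NULL;
    img->pixels = malloc (img->size);
    if (img->pixels == NULL)
        return NULL;
    rewind (img->spill);
    if (fread (img->pixels, 1, img->size, img->spill) != img->size) {
        free (img->pixels);
        img->pixels = NULL;
    }
    return img->pixels;
}

/* Spill images, in array order, until the resident total is at or
 * below the threshold.  Returns the resident byte count afterwards. */
static size_t
enforce_threshold (spillable_image_t *imgs, size_t n, size_t threshold)
{
    size_t i, resident = 0;

    for (i = 0; i < n; i++)
        if (imgs[i].pixels != NULL)
            resident += imgs[i].size;
    for (i = 0; i < n && resident > threshold; i++)
        if (imgs[i].pixels != NULL && spillable_image_spill (&imgs[i]))
            resident -= imgs[i].size;
    return resident;
}
```

The point is that this needs no knowledge of the content stream: the 
image is reloaded the next time something asks for its pixels.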

cheers,
Kristian

[1] I'm not sure that it would make sense to have a PDF parser in 
cairo, not even for a small subset.  That would make more sense as a 
standalone library, similar to libsvg-cairo.  And if you're parsing and 
rendering enough of PDF to express the cairo drawing model, you might as 
well support the remaining 10%.

Keith Packard wrote:
> Around 20 o'clock on Feb 6, Owen Taylor wrote:
> 
> 
>> - Generation and parsing of a complicated binary format
> 
> 
> We already have to generate it; parsing a known subset should be 
> a lot easier than parsing general PDF.
> 
> 
>> - Conversion of trapezoid lists into paths and back
> 
> 
> The PDF file will contain only trapezoids...
> 
> 
>> - Serialization of images
> 
> 
> We already have to do this for whatever metafile format we use -- for the 
> in-memory version, we may well have to keep many copies of source surfaces 
> in different states.
> 
> 
>> - Embedding of fonts, including bitmap fonts
> 
> 
> It's not the embedding that frightens me here -- we have to do that to 
> support PDF.  It's loading them back into FreeType which seems a terrible 
> waste.  We could investigate some mechanism for mapping the font in the 
> meta file back to a font file, but that seems obtuse.
> 
> 
>>To me it's a horrifyingly complex prospect for something that really
>>should be essentially pretty simple. There are 4, count them, 4,
>>functions in the surface vtable for drawing currently.
> 
> 
> Any metafile format will need to capture all state related to external 
> objects for each rendering operation.  This seems like the hard part of 
> the metafile generation; optimizing cases where source images are re-used 
> unchanged and the like.  I agree there is a good chance to optimize this 
> into zero-copy if we have reasonable surface modification notification and 
> store the metafile purely in memory.
> 
> 
>>Can you create test cases where the in-memory buffer for rendering a 
>>single page becomes huge? Yes, you could. But in those cases I'd argue
>>that the final produced output file will also be huge.
> 
> 
> I guess I disagree that the size of the output file is relevant here.
> Imagine a 'thumbnail' preview of a set of slides -- that captures all of 
> the graphics for an entire presentation onto a single page, something 
> which seems like a common operation and which encapsulates the rendering 
> operations for an entire presentation in a single page.
> 
> I also suggest we separate the format of the meta file data from the
> storage mechanism; there's really no reason a custom metafile format need
> be storable only in memory, nor is there any reason a PDF file need be
> stored on disk.
> 
> 
>>Plus, it's very much unclear to me how you'd do the image-vs-vector 
>>separation without having the entire page parsed in memory at once
>>anyways.
> 
> 
> I don't know why you'd need to hold the whole page in memory for this;
> you're only interested in computing the portion of the page which cannot be
> drawn with the native graphics operations; that can be done by iterative
> scans of the metafile to find a fixed point for the region connected to
> undrawable objects.  That's doable in fixed storage if you didn't mind 
> the bounds being computed as a single rectangle.  Of course, optimizing 
> this to avoid quadratic behaviour would probably be a good idea, and that 
> might take some additional storage.
> 
> The idea of simply using the PDF file grew out of a desire to avoid 
> yet-another metafile format.  One of the benefits of keeping it entirely in 
> memory is that it couldn't accidentally escape and become a public part of 
> the cairo interface.  
> 
> One alternative here is to publish this metafile format and create external
> metafile->printer conversion utilities; this would reduce the code needed
> within the cairo library itself and would ease the release engineering
> problems inherent in producing a single library with many
> mutually-incompatible pieces.
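
For reference, the fixed-point scan described in the quoted message 
(repeated passes over the metafile, with the region kept as a single 
rectangle) might look roughly like the C sketch below.  It is one 
reading of that description, not Keith's code: rect_t, scan_op_t and 
fallback_bounds are hypothetical names, and it assumes each recorded 
operation carries a bounding box and a flag saying whether the native 
graphics operations can draw it.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open rectangle; empty when x0 >= x1 or y0 >= y1. */
typedef struct { int x0, y0, x1, y1; } rect_t;

/* Hypothetical recorded operation. */
typedef struct {
    rect_t bounds;
    int native;     /* nonzero if drawable with native operations */
} scan_op_t;

static int
rect_is_empty (const rect_t *r)
{
    return r->x0 >= r->x1 || r->y0 >= r->y1;
}

static int
rect_intersects (const rect_t *a, const rect_t *b)
{
    if (rect_is_empty (a) || rect_is_empty (b))
        return 0;
    return a->x0 < b->x1 && b->x0 < a->x1 &&
           a->y0 < b->y1 && b->y0 < a->y1;
}

/* Grow *a to cover *b.  Returns 1 if *a changed. */
static int
rect_union (rect_t *a, const rect_t *b)
{
    rect_t r = *a;

    if (rect_is_empty (b))
        return 0;
    if (rect_is_empty (a)) {
        *a = *b;
        return 1;
    }
    if (b->x0 < r.x0) r.x0 = b->x0;
    if (b->y0 < r.y0) r.y0 = b->y0;
    if (b->x1 > r.x1) r.x1 = b->x1;
    if (b->y1 > r.y1) r.y1 = b->y1;
    if (r.x0 == a->x0 && r.y0 == a->y0 && r.x1 == a->x1 && r.y1 == a->y1)
        return 0;
    *a = r;
    return 1;
}

/* Scan the operations repeatedly until the rectangle stops growing.
 * Every non-native op, and every op touching the current rectangle,
 * is absorbed.  Storage is fixed (one rectangle); the number of passes
 * can reach n_ops, hence the quadratic worst case. */
static rect_t
fallback_bounds (const scan_op_t *ops, size_t n_ops, int *n_passes)
{
    rect_t region = { 0, 0, 0, 0 };
    int changed;

    *n_passes = 0;
    do {
        size_t i;

        changed = 0;
        for (i = 0; i < n_ops; i++)
            if (!ops[i].native ||
                rect_intersects (&region, &ops[i].bounds))
                changed |= rect_union (&region, &ops[i].bounds);
        (*n_passes)++;
    } while (changed);
    return region;
}
```

Note that absorbing every operation that touches the rectangle is a 
simplification; a real implementation might only need the operations 
drawn underneath the undrawable ones.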