[cairo] Improving PDF output
spitzak at d2.com
Tue Jan 9 12:53:01 PST 2007
What Apple is doing is exactly what I proposed: use Unicode for the
glyph ids whenever possible, and allocate ids from the private-use area
for glyphs that have no Unicode code point. I still feel this is going
to make things a lot easier to use, and it is probably the only way
cut & paste of presentation forms is going to work at all.
It would appear Apple is allocating the id's in order as needed, which
means that decoding the result is impossible without extra information.
I think it may be possible to make decoding more likely by trying to
allocate the same id for a glyph on each render. One way to do this is
to hash together the most likely Unicode that contributed to the glyph
to generate the id, rehashing on any collision with an existing glyph
or a previous allocation.
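
To make the hashing idea concrete, here is a minimal sketch in Python.
Nothing in it is cairo code: the class name, the use of SHA-256 and the
default range (the BMP private-use area, 0xE000 through 0xF8FF) are all
assumptions picked for illustration.

```python
import hashlib

# Assumed default range: the BMP Private Use Area (inclusive bounds).
PUA_START = 0xE000
PUA_END = 0xF8FF


class GlyphIdAllocator:
    """Hypothetical allocator: derive a glyph id from the source text."""

    def __init__(self, start=PUA_START, end=PUA_END):
        self.start = start
        self.size = end - start + 1
        self.by_text = {}  # source text -> allocated id
        self.used = {}     # allocated id -> source text

    def allocate(self, text):
        # Reuse the id if this source text was seen before.
        if text in self.by_text:
            return self.by_text[text]
        if len(self.used) >= self.size:
            raise RuntimeError("no free ids left in the range")
        attempt = 0
        while True:
            # Mix the attempt counter in so a collision yields a new hash.
            data = f"{attempt}:{text}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            offset = int.from_bytes(digest[:4], "big") % self.size
            candidate = self.start + offset
            if candidate not in self.used:
                self.used[candidate] = text
                self.by_text[text] = candidate
                return candidate
            attempt += 1  # collision: rehash and try again


if __name__ == "__main__":
    allocator = GlyphIdAllocator()
    print(hex(allocator.allocate("pp")))
```

Two separate renders that meet the "pp" ligature first would get the
same id from this sketch. The id can still differ between renders if
another glyph claimed that slot earlier, so this only makes decoding
more likely, not certain.
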
My impression of the Unicode standard is that it would be pretty safe to
allocate the codes from the 0xD800 through 0xffff range, which is the
UTF-16 surrogate pairs, the private use area, and a lot of precomposed
characters. This should be large enough that collisions are avoided, so
the hashed glyph id may stay the same for quite a while. Another
possibility is to use 0xf0000 through 0x10ffff, which are private-use
planes, but then the backend must handle more than 16 bits.
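
As a rough check on the ranges mentioned above, this Python sketch
classifies a code point and counts the UTF-16 code units it needs. The
boundaries are the ones in the Unicode code charts as I understand them
(surrogates 0xD800-0xDFFF, BMP private use 0xE000-0xF8FF, planes 15 and
16 private use up to 0xFFFFD and 0x10FFFD); the function names are made
up for illustration.

```python
def classify(cp):
    """Return a coarse label for a code point (illustrative only)."""
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if 0xE000 <= cp <= 0xF8FF:
        return "private-use (BMP)"
    if 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return "private-use (plane 15/16)"
    return "other"


def utf16_units(cp):
    """Number of 16-bit code units needed to encode cp in UTF-16.

    Not valid for surrogate code points, which cannot be encoded.
    """
    return len(chr(cp).encode("utf-16-be")) // 2


if __name__ == "__main__":
    for cp in (0xE000, 0xF900, 0xF0000):
        print(hex(cp), classify(cp), utf16_units(cp))
```

Anything in planes 15 and 16 takes two UTF-16 code units, which is the
"more than 16 bits" cost for the backend.
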
> On 09/01/07, Baz <brian.ewins at gmail.com> wrote:
>> ligatures on and off, then saved it as PDF. Copy-and-paste of the text
>> only worked with ligatures off ("The fifty spiffy apples." twice came
>> out as "The fifty spiffy a The fifty spiffy apples.". The pp ligature
>> seemed to be the point of failure)
> Forgot there's fi, ff ligatures in there too. There's a Unicode code
> point for fi and ff, but not for pp. If you map glyph->text by
> inverting the font's cmap table you'd get something like 'The
> \uFB01fty spi\uFB00y a', because the pp glyph id only appears in the
> mort table. That sounds like exactly what you did, Alp? It suggests
> that Apple aren't keeping the original text around to do the
> glyph->text mapping, though, which is what we wanted to know.
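
The cmap inversion described in the quoted text can be sketched like
this in Python. The glyph ids and the toy cmap are invented, and
emitting U+FFFD for an unmapped glyph is my assumption, not what any
real PDF reader is known to do.

```python
def invert_cmap(cmap):
    """Build glyph id -> code point from a code point -> glyph id map."""
    inverse = {}
    for codepoint, glyph in sorted(cmap.items()):
        # If several code points share a glyph, keep the lowest one.
        inverse.setdefault(glyph, codepoint)
    return inverse


def glyphs_to_text(glyphs, inverse, missing="\uFFFD"):
    """Map a glyph run back to text; unmapped glyphs become `missing`."""
    return "".join(
        chr(inverse[g]) if g in inverse else missing for g in glyphs
    )


# Toy cmap with made-up glyph ids. U+FB00 is the ff ligature.
TOY_CMAP = {
    ord("a"): 1,
    ord("s"): 2,
    ord("p"): 3,
    ord("i"): 4,
    ord("y"): 5,
    0xFB00: 6,
}
# Hypothetical pp ligature glyph: reachable only through a substitution
# table, so it has no cmap entry.
PP_LIGATURE = 7

if __name__ == "__main__":
    run = [2, 3, 4, 6, 5, 1, PP_LIGATURE]  # "spi" + ff + "y" + "a" + pp
    print(ascii(glyphs_to_text(run, invert_cmap(TOY_CMAP))))
```

The ff glyph comes back as U+FB00, but the pp glyph has nothing to map
to, which matches the point of failure seen in the copy-and-paste test.
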