[cairo] non-ascii output
Bill Spitzak
spitzak at d2.com
Tue May 17 09:43:13 PDT 2005
It does seem the Cairo authors are seriously underestimating how
important the "toy" (as they call it) UTF-8 interface is. It is vital
that we be able to draw arbitrary strings as UTF-8 and see something
that approaches what you can get *today* on OS/X and Windows. If this is
missing, it will be impossible for serious software to use Cairo. Pango
is *not* an answer: without a "toy"-style interface, the migration path
to Cairo is too difficult. Even with Pango, there is still a need to
draw text as a string of glyphs, in order to duplicate the output of
other programs, or to unambiguously display and edit an email address or
a string constant in a program's source code.
DRAW ALL GLYPHS:
It MUST draw as many appropriate glyphs as it possibly can. If the
"current font" does not have a glyph, then it will select a glyph from a
set of backup fonts, including a 16x16 bitmap font covering every
assigned Unicode code point. If no font has a glyph, it should draw the
character's hex code in a box.
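The fallback order described above can be sketched in C. This is a minimal illustration under stated assumptions, not Cairo code: `font_t`, `choose_glyph` and the two toy fonts are hypothetical stand-ins for whatever a real backend uses to ask whether a font contains a character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical font handle: a name plus a "do you have this character?" test. */
typedef struct {
    const char *name;
    int (*has_glyph)(uint32_t cp);
} font_t;

typedef enum { GLYPH_FROM_FONT, GLYPH_HEX_BOX } glyph_source_t;

typedef struct {
    glyph_source_t source;
    const font_t *font;   /* NULL when source == GLYPH_HEX_BOX */
    char hex[9];          /* hex digits to draw in the box, e.g. "1F600" */
} glyph_choice_t;

/* Try the current font, then each backup font in order; if none has
   the character, fall back to drawing its hex value in a box. */
glyph_choice_t choose_glyph(uint32_t cp, const font_t *current,
                            const font_t *const *backups, size_t n_backups)
{
    glyph_choice_t g = { GLYPH_HEX_BOX, NULL, "" };
    if (current && current->has_glyph(cp)) {
        g.source = GLYPH_FROM_FONT;
        g.font = current;
        return g;
    }
    for (size_t i = 0; i < n_backups; i++) {
        if (backups[i]->has_glyph(cp)) {
            g.source = GLYPH_FROM_FONT;
            g.font = backups[i];
            return g;
        }
    }
    snprintf(g.hex, sizeof g.hex, "%04X", (unsigned)cp);
    return g;
}

/* Two toy fonts for illustration only (hypothetical coverage ranges). */
static int latin_has(uint32_t cp) { return cp < 0x250; }
static int cjk_has(uint32_t cp) { return cp >= 0x4E00 && cp <= 0x9FFF; }
static const font_t latin_font = { "Latin Toy", latin_has };
static const font_t cjk_font = { "CJK Toy", cjk_has };
```

A real implementation would end the backup list with the bitmap font the message proposes; the hex box is only reached when even that lookup fails.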
PANGO:
There is also a UCS-4 interface that takes x/y positions for each glyph;
this is what Pango uses. It should draw the exact same set of glyphs. To
avoid code duplication with Pango, I feel that the entire font-selection
mechanism from Pango must be moved into Cairo.
To allow "alternate forms", use the high byte (which must be zero for
normal Unicode) to index alternate forms of glyphs, allowing 256
varieties (if that is not enough, perhaps all codes greater than
0x10ffff can be used, allowing 0x7fff possible alternate forms). I think
this will allow Pango to use Cairo's backend without any need to switch
fonts. It does mean that Pango and Cairo have to agree on the indexes
for all possible alternate forms.
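A minimal sketch of the packing, assuming "high byte" means bits 24-31 of a 32-bit UCS-4 value (real code points never exceed 0x10FFFF, so those bits are otherwise zero). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-23 hold the Unicode code point and
   bits 24-31 select one of 256 alternate forms (0 = normal glyph). */
#define ALT_SHIFT 24u
#define CP_MASK   0x00FFFFFFu

static uint32_t make_alt_code(uint32_t cp, unsigned variant)
{
    assert(cp <= 0x10FFFF && variant <= 0xFF);
    return ((uint32_t)variant << ALT_SHIFT) | cp;
}

/* Recover the plain code point, e.g. for font lookup. */
static uint32_t alt_code_point(uint32_t code) { return code & CP_MASK; }

/* Recover which alternate form was requested. */
static unsigned alt_variant(uint32_t code) { return (unsigned)(code >> ALT_SHIFT); }
```

With variant 0 the value is plain Unicode, so existing callers are unaffected; the numbering of the nonzero variants is exactly what Pango and Cairo would have to agree on.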
Though I am not completely sure what Pango does, it seems possible to
define it so that sending Pango a series of 1-character strings to
render produces the same output as the "toy" interface.
UTF-8 ERRORS:
To allow and encourage usage of UTF-8, errors in UTF-8 must draw a glyph
for each byte in the invalid sequence *and continue parsing the string*.
If errors cause the string or the error bytes to disappear, or throw an
exception, it will kill any incentive to use UTF-8, since programs will
have to scan their strings for errors beforehand, which is just as much
work as converting from the encoding they already use.
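A sketch of a decoder with this behavior, written in C. The names `utf8_unit_t` and `utf8_decode_lenient` are hypothetical. Each undecodable byte comes back as its own unit flagged `is_error`, so the caller can draw one glyph per bad byte and keep going; overlong forms, surrogates, truncated sequences and values above U+10FFFF are all treated as errors (a choice made for this sketch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded unit: either a code point or a single undecodable byte. */
typedef struct {
    uint32_t value;   /* code point, or the raw byte if is_error */
    int is_error;
} utf8_unit_t;

/* Decode one unit at s[i] (n bytes total); returns bytes consumed (>= 1). */
static size_t utf8_next(const unsigned char *s, size_t n, size_t i,
                        utf8_unit_t *out)
{
    unsigned char b = s[i];
    size_t len;
    uint32_t cp, min;
    assert(i < n);
    if (b < 0x80) { out->value = b; out->is_error = 0; return 1; }
    else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; min = 0x80; }
    else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; min = 0x800; }
    else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; min = 0x10000; }
    else goto bad;
    if (i + len > n) goto bad;                     /* truncated sequence */
    for (size_t k = 1; k < len; k++) {
        if ((s[i + k] & 0xC0) != 0x80) goto bad;   /* not a continuation byte */
        cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) goto bad;
    out->value = cp; out->is_error = 0; return len;
bad:
    /* Report only the first byte; parsing resumes at the next byte. */
    out->value = b; out->is_error = 1; return 1;
}

/* Decode the whole buffer into out[] (capacity cap); returns units written. */
size_t utf8_decode_lenient(const unsigned char *s, size_t n,
                           utf8_unit_t *out, size_t cap)
{
    size_t i = 0, count = 0;
    while (i < n && count < cap)
        i += utf8_next(s, n, i, &out[count++]);
    return count;
}
```

Because an error consumes exactly one byte, a damaged sequence never swallows the valid text after it.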
Similarly, I recommend that all errors be drawn as though the error
bytes are in ISO-8859-1 or even the Microsoft CP1252 character set.
This will allow virtually all 8-bit text to be drawn unchanged and thus
remove the need to preserve an ASCII data path and duplicate interfaces
through the software. Without this, programs are forced to convert ASCII
data to UTF-8, and again they lose the incentive to convert to UTF-8
only. I know this runs into extreme resistance because it is considered
US/Euro-centric and thus politically incorrect, but I still want to
fight for it.
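The byte-to-character step for error bytes could look like the sketch below; the function name is hypothetical. Under ISO-8859-1 every byte maps to the code point with the same value, and CP1252 differs only in 0x80-0x9F. How to treat the five bytes CP1252 leaves unassigned is an assumption here: they fall back to their ISO-8859-1 values.

```c
#include <assert.h>
#include <stdint.h>

/* Unicode values for CP1252 bytes 0x80-0x9F; 0 marks the five bytes
   CP1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Code point to draw for a byte that failed UTF-8 decoding. */
static uint32_t error_byte_to_unicode(unsigned char b, int use_cp1252)
{
    if (use_cp1252 && b >= 0x80 && b <= 0x9F && cp1252_high[b - 0x80] != 0)
        return cp1252_high[b - 0x80];
    return b;   /* ISO-8859-1: byte value == code point */
}
```

With CP1252 enabled, "smart quotes" pasted from Windows text (bytes 0x93 and 0x94) come out as the intended curly quotes instead of invisible C1 control characters.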
If Pango has a UTF-8 interface, it must translate it exactly the same
way as Cairo.
FONT NAMES:
It appears it will be possible to use backend-specific code to select
the Cairo font. This is fine, but there must be a way to select the
exact same font using a string name.
The reason is that users want to see the same font in multiple programs.
Not all programs are going to be written to be backend-specific. However,
all of them will be capable of reading the user's preference as a string
and sending that to Cairo.
I recommend the string be something like "fontname <garbage>", where
<garbage> is backend-specific code. A different backend would ignore
<garbage> and use only the fontname to select the font; with any luck,
the font will be similar.
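One possible parse of such a string, sketched in C under an assumed convention that the backend-specific part is enclosed in angle brackets after the portable font name (the message does not fix a syntax). `font_spec_t` and `parse_font_spec` are hypothetical, and names longer than the fixed buffers are truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of "fontname <backend data>". */
typedef struct {
    char family[128];   /* portable part, usable by every backend */
    char extra[128];    /* backend-specific part, may be empty */
} font_spec_t;

/* Copy [begin, end) into dst with surrounding spaces removed. */
static void copy_trimmed(char *dst, size_t cap, const char *begin, const char *end)
{
    assert(cap > 0);
    while (begin < end && *begin == ' ') begin++;
    while (end > begin && end[-1] == ' ') end--;
    size_t len = (size_t)(end - begin);
    if (len >= cap) len = cap - 1;   /* truncate overly long names */
    memcpy(dst, begin, len);
    dst[len] = '\0';
}

/* Split the string at the first '<'; text up to '>' is backend data. */
void parse_font_spec(const char *spec, font_spec_t *out)
{
    const char *end = spec + strlen(spec);
    const char *lt = strchr(spec, '<');
    if (!lt) {
        copy_trimmed(out->family, sizeof out->family, spec, end);
        out->extra[0] = '\0';
        return;
    }
    copy_trimmed(out->family, sizeof out->family, spec, lt);
    const char *gt = strchr(lt + 1, '>');
    copy_trimmed(out->extra, sizeof out->extra, lt + 1, gt ? gt : end);
}
```

A backend that does not recognize `extra` would simply look up `family`, which is the fallback the proposal relies on.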