[cairo] non-ascii output

Tue May 17 09:43:13 PDT 2005

It does seem the Cairo authors are seriously underestimating how 
important the "toy" (as they call it) UTF-8 interface is. It is vital 
that we be able to draw arbitrary strings as UTF-8 and see something 
that approaches what you can get *today* on OS X and Windows. If this is 
missing it will be impossible for serious software to use Cairo. Pango 
is *not* an answer: without a "toy" style interface the migration path 
to using Cairo is too difficult. Even with Pango, there is still a need 
to draw text as a plain string of glyphs, in order to duplicate the 
output of other programs, or to unambiguously display and edit an email 
address or a string constant in a program's source code.

DRAW ALL GLYPHS:

It MUST draw as many appropriate glyphs as it possibly can. If the 
"current font" does not have a glyph, it must select a glyph from a 
set of backup fonts, including a 16x16 bitmapped font that covers every 
assigned Unicode code point. If no font has a glyph, it should draw the 
code point's hex number in a box.
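
A rough sketch of the lookup order described above, in Python for 
brevity. The Font tuple, the font names, and pick_glyph() are made up 
for illustration; none of this is Cairo API.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a font: a name plus the code points it covers.
Font = Tuple[str, Set[int]]

def pick_glyph(codepoint: int, current: Font, backups: List[Font]) -> Tuple[str, str]:
    """Return (source, label) describing what would be drawn for codepoint."""
    # Try the current font first, then each backup font in order.
    for name, coverage in [current] + backups:
        if codepoint in coverage:
            return (name, chr(codepoint))
    # No font has the glyph: fall back to the hex number drawn in a box.
    return ("hexbox", f"{codepoint:04X}")
```

The point is only that the caller always gets something drawable back, 
never a silent gap.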

PANGO:

There is also a UCS-4 interface that takes x/y positions for each 
glyph; this is what Pango uses. It should draw the exact same set of 
glyphs. To avoid code duplication with Pango, I feel that the entire 
font-selection mechanism from Pango must be moved into Cairo.

To allow "alternate forms", use the high byte (which must be zero for 
normal Unicode) to index alternate forms of glyphs, allowing 256 
varieties (if that is not enough, perhaps all codes greater than 
0x10ffff can be used, allowing 0x7fff possible alternate forms). This I 
think will allow Pango to use Cairo's backend without any need to switch 
fonts. It does mean that Pango and Cairo have to agree on the indexes 
for all possible alternate forms.
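
To make the high-byte idea concrete, here is a small sketch that packs 
a code point and an alternate-form index into one 32-bit value. The 
function names are hypothetical, and placing the index in bits 24-31 is 
my assumption about what "the high byte" would mean in practice.

```python
from typing import Tuple

UNICODE_MAX = 0x10FFFF  # highest code point Unicode assigns

def pack_alternate(codepoint: int, form: int) -> int:
    """Combine a code point with an alternate-form index (0 = normal form)."""
    if not 0 <= codepoint <= UNICODE_MAX:
        raise ValueError("code point out of Unicode range")
    if not 0 <= form <= 0xFF:
        raise ValueError("alternate form must fit in one byte")
    return (form << 24) | codepoint

def unpack_alternate(value: int) -> Tuple[int, int]:
    """Split a packed value back into (codepoint, form)."""
    return (value & 0x00FFFFFF, (value >> 24) & 0xFF)
```

With form 0 the packed value is just the ordinary code point, so plain 
UCS-4 text passes through unchanged.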

Though I am not completely sure what Pango does, it seems possible we 
could define it so that if you sent Pango a series of 1-character 
strings to render, the output would be the same as the "toy" interface.

UTF-8 ERRORS:

To allow and encourage usage of UTF-8, errors in UTF-8 must draw a glyph 
for each byte of the invalid sequence *and continue parsing the string*. 
If errors cause the string or the error bytes to disappear, or throw an 
exception, it will kill any incentive to use UTF-8, since the program 
will have to search the strings beforehand for errors, which is just as 
much work as converting from the encoding they are already using.
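
Here is a minimal sketch of the kind of decoder I mean: every invalid 
byte produces exactly one item and parsing resumes at the next byte. 
decode_lenient() and the "char"/"byte" tags are invented for this 
example and do not correspond to anything in Cairo or Pango.

```python
from typing import List, Tuple

def decode_lenient(data: bytes) -> List[Tuple[str, int]]:
    """Return ('char', codepoint) for valid sequences, ('byte', value) per bad byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(("char", b))
            i += 1
            continue
        # Lead byte determines trailing-byte count and smallest legal value.
        if 0xC2 <= b <= 0xDF:
            need, cp, lowest = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, lowest = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            need, cp, lowest = 3, b & 0x07, 0x10000
        else:                             # stray continuation or illegal lead
            out.append(("byte", b))
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        ok = len(tail) == need and all(0x80 <= t <= 0xBF for t in tail)
        if ok:
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            # Reject overlong forms, surrogates, and values past U+10FFFF.
            ok = lowest <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF
        if ok:
            out.append(("char", cp))
            i += 1 + need
        else:                             # report only the lead byte, then resume
            out.append(("byte", b))
            i += 1
    return out
```

Nothing is dropped and nothing is thrown, so the caller never has to 
pre-scan its strings.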

Similarly, I recommend that all errors be drawn as though the error 
bytes are in the ISO-8859-1 or even the Microsoft CP1252 character set. 
This will allow virtually all 8-bit text to be drawn unchanged and thus 
remove the need to preserve a separate 8-bit data path and duplicate 
interfaces through the software. Without this, programs are forced to 
convert 8-bit data to UTF-8, and again they lose the incentive to 
convert to UTF-8 only. I know this runs into extreme resistance because 
it is considered US/Euro-centric and thus politically incorrect, but I 
still want to try to fight for this.
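
As an illustration of the fallback, Python's codec machinery lets you 
register a custom error handler, which is enough to show the behavior. 
The handler name "cp1252_fallback" and the helper function are my own 
inventions; for the few bytes CP1252 leaves undefined, this sketch 
assumes falling back to ISO-8859-1.

```python
import codecs

def _cp1252_fallback(err):
    """Decode each offending byte as CP1252, or ISO-8859-1 if CP1252 lacks it."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # errors="ignore" yields "" for bytes CP1252 does not define, so the
    # `or chr(b)` branch supplies the ISO-8859-1 interpretation instead.
    text = "".join(
        bytes([b]).decode("cp1252", errors="ignore") or chr(b) for b in bad
    )
    return (text, err.end)  # resume decoding right after the bad bytes

codecs.register_error("cp1252_fallback", _cp1252_fallback)

def decode_utf8_or_cp1252(data: bytes) -> str:
    """Decode as UTF-8, treating every invalid byte as legacy 8-bit text."""
    return data.decode("utf-8", errors="cp1252_fallback")
```

Valid UTF-8 is untouched, while a legacy string such as b"caf\xe9" 
still comes out readable instead of failing.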

If Pango has a UTF-8 interface, it must translate the bytes exactly the 
same way Cairo does.

FONT NAMES:

It appears it will be possible to use backend-specific code to select 
the Cairo font. This is fine, but there must be a way to select the 
exact same font using a string name.

The reason is that users want to see the same font in multiple programs. 
Not all programs are going to be written to be backend-specific. However 
all of them will be capable of reading the user's preference as a string 
and sending that to Cairo.

I recommend the string be something like "fontname <garbage>" where 
<garbage> is backend-specific code. If fed to a different backend, it 
will ignore <garbage> and use the fontname only to select the font. With 
any luck the font will be similar.
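
One possible shape for such a string, purely as a sketch: the 
backend-specific part goes in square brackets and is tagged with a 
backend name, as in "Helvetica [xft:weight=200]". The bracket syntax, 
the backend tags, and parse_font_spec() are all assumptions of mine, 
not an existing Cairo convention.

```python
from typing import Optional, Tuple

def parse_font_spec(spec: str, backend: str) -> Tuple[str, Optional[str]]:
    """Split 'fontname [backend:data]' into (fontname, data or None)."""
    spec = spec.strip()
    name, bracket, rest = spec.partition("[")
    if not bracket or not rest.endswith("]"):
        # No well-formed backend part: the whole string is the font name.
        return (spec, None)
    tag, colon, data = rest[:-1].partition(":")
    if colon and tag == backend:
        return (name.strip(), data)
    # Written for some other backend: ignore it, keep only the font name.
    return (name.strip(), None)
```

A backend that does not recognize the tag still gets "Helvetica" and 
can pick the closest font it has.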