[HarfBuzz] Questions about Itemization in QT / Pango
behdad at behdad.org
Tue Jan 1 16:29:51 PST 2008
On Sun, 2007-12-30 at 14:23 -0500, Ed Trager wrote:
> Hi, Behdad, Simon, and everyone,
> I have been wondering recently a little bit about how Qt and Pango
> handle itemization:
> (1) Do Qt and Pango fully support itemization of all scripts now
> present in Unicode 5?
Yes, Pango 1.18 supports Unicode 5.0, and Pango 1.20 will support Unicode 5.1.
> In other words, while HarfBuzz perhaps does
> not yet handle OpenType layout of the N'Ko or New Tai Lue scripts,
> would the itemizers in Qt and Pango correctly identify segments of
> text in N'Ko and New Tai Lue (and other recent Unicode script
> additions) as belonging to those respective scripts?
Pango 1.18 in fact does support N'Ko. See:
> (2) What about Plane 1 CJK? If I created a text containing BMP CJK
> with a smattering of Plane 1 CJK thrown in, how will Qt and Pango
> itemize or segment that text?
> (3) What about itemization of other Plane 1 scripts in Unicode, like
> Linear B, etc.?
Pango (and I believe Qt too) uses the Script property from the Unicode
Character Database. So all characters whose Script is Han will be grouped
together into one item, whichever plane they are in.
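To illustrate the grouping described above, here is a minimal Python sketch of script-run itemization. It is not Pango's or Qt's code: the SCRIPT_RANGES table is a hypothetical, heavily abridged stand-in for the Unicode Character Database's Scripts.txt, and the sketch skips refinements real itemizers typically make, such as paired-bracket handling. Characters with Script=Common (spaces, digits, ASCII punctuation) or Script=Inherited (combining marks) simply join the run they appear in, so Latin text containing IPA letters stays one Latin item, and BMP and supplementary-plane Han characters land in one Han item. It walks code points, so a UTF-16 implementation would first have to combine surrogate pairs.

```python
from typing import List, Tuple

# Hypothetical, abridged script table: (first, last, Script value).
# Ranges are simplified to whole blocks, so a few include unassigned code
# points; a real itemizer consults the full Unicode Character Database.
SCRIPT_RANGES = [
    (0x0000, 0x0040, "Common"),     # controls, space, ASCII punctuation, digits
    (0x0041, 0x005A, "Latin"),
    (0x005B, 0x0060, "Common"),
    (0x0061, 0x007A, "Latin"),
    (0x007B, 0x007F, "Common"),
    (0x0250, 0x02AF, "Latin"),      # IPA Extensions
    (0x0300, 0x036F, "Inherited"),  # Combining Diacritical Marks
    (0x07C0, 0x07FA, "Nko"),
    (0x4E00, 0x9FFF, "Han"),        # CJK Unified Ideographs
    (0x10000, 0x1005D, "Linear_B"),
    (0x10080, 0x100FA, "Linear_B"),
    (0x20000, 0x2A6DF, "Han"),      # CJK Unified Ideographs Extension B (Plane 2)
]

NEUTRAL = {"Common", "Inherited"}


def script_of(ch: str) -> str:
    """Look up the Script value of one code point in the abridged table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"


def itemize(text: str) -> List[Tuple[str, str]]:
    """Split text into (run_text, script) items by Script property."""
    runs: List[Tuple[str, str]] = []
    start = 0
    run_script = "Common"  # a run stays Common until a real script appears
    for i, ch in enumerate(text):
        sc = script_of(ch)
        if sc in NEUTRAL or sc == run_script:
            continue  # neutral characters extend the current run
        if run_script in NEUTRAL:
            run_script = sc  # leading neutrals adopt the first real script
            continue
        runs.append((text[start:i], run_script))  # script change: close run
        start, run_script = i, sc
    if start < len(text):
        runs.append((text[start:], run_script))
    return runs
```

For example, itemize("Hi, \u4e2d\u6587\U00020000 and \u07d2\u07de\u07cf!") yields four items: Latin "Hi, ", Han (two BMP ideographs, one Extension B ideograph and the following space), Latin "and ", and Nko (three N'Ko letters plus "!"). Attaching neutrals to the surrounding run is what keeps punctuation in the same font as its text.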
> (4) How do QT and Pango handle IPA phonetic characters? Officially,
> one could consider IPA and other phonetic extensions in Unicode as
> belonging to "Latin" (latn) script. Some might say that is a bit of
> a stretch, because some IPA symbols might actually be closer to Greek
> in origin, but certainly Michael Everson, inter alia, will give IPA a
> "Latin" appellation. But when actually laying out text, a user might
> need or desire to use a special font (such as SIL Gentium, for
> example) for laying out segments of IPA phonetics. For example,
> suppose I am writing a dictionary and my words and definitions are in
> one font, while I might desire that my phonetic pronunciations are in
> a different font tailored for such things. Of course my word
> processor or page layout program will permit me to manually select
> which fonts to use for which parts of my document, and that is fine.
> I am just wondering if Qt or Pango has any special code to handle
> such things in a more automated fashion, or on a level closer to
> fontconfig's font matching attempts?
IPA Extensions are marked as script Latin in Unicode.
> (5) A similar question for mathematical, scientific, and other
> miscellaneous symbols. Unicode now contains a number of blocks which
> make up a rather extensive set of mathematical, scientific, and
> miscellaneous technical symbols. Fonts such as the STIX font set are
> now available to specifically address the needs of scientific,
> mathematical, and other technical users. So, once again I am just
> wondering how Qt and Pango handle itemization/segmentation of runs of
> text containing such symbols? Are such symbols just treated as being
> neutral? I'm just wondering if one can make an argument for defining
> a separate script category for "symbols" and then having a text
> itemizer automatically break out segments of text containing such
> symbols as separate items which can then be rendered using a font or
> set of fonts that are tailored for such things. One can imagine
> having a category for "symbol fonts" as part of the fontconfig
> pipeline, so that fontconfig could provide automatic substitution for
> such text segments. Does that make sense?
No. You don't want the periods, question marks, brackets, quotation marks,
etc. in your Latin text to be rendered using a separate font. This all
really belongs at a higher level, which should mark up the text
appropriately with the desired font.
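Marking text at a higher level can be as simple as the application wrapping the spans it already knows are phonetic in Pango markup. The minimal Python sketch below assumes the application holds its text as hypothetical (text, is_phonetic) segments and that a font family named "Gentium" is installed; it emits Pango's <span face="..."> element around the phonetic spans, and the resulting string could then be handed to pango_layout_set_markup(). The helper names are illustrative only.

```python
from typing import Iterable, Tuple


def escape_markup(text: str) -> str:
    """Escape characters that are special in Pango (GMarkup) markup."""
    # '&' must be replaced first so the other entities are not double-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&apos;"))


def to_markup(segments: Iterable[Tuple[str, bool]], phonetic_face: str) -> str:
    """Wrap segments flagged as phonetic in a span selecting phonetic_face."""
    parts = []
    for text, is_phonetic in segments:
        body = escape_markup(text)
        if is_phonetic:
            face = escape_markup(phonetic_face)
            parts.append(f'<span face="{face}">{body}</span>')
        else:
            parts.append(body)  # ordinary text keeps the default font
    return "".join(parts)
```

For instance, to_markup([("cat ", False), ("/k\u00e6t/", True)], "Gentium") returns 'cat <span face="Gentium">/k\u00e6t/</span>'. The itemizer never has to guess: the slashes and the IPA letters go to the phonetic font because the application said so, while the surrounding punctuation stays with the body text.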
> Since I don't know how Qt and Pango currently do these things, I
> thought I would ask.
> Best Wishes for a Happy and Prosperous New Year to all! -- Ed Trager
Happy New Year to all,
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759