[HarfBuzz] Questions about Itemization in QT / Pango

Ed Trager ed.trager at gmail.com
Sun Dec 30 11:23:51 PST 2007


Hi, Behdad, Simon, and everyone,

I have been wondering recently about how Qt and Pango
handle itemization:

  (1) Do Qt and Pango fully support itemization of all scripts now
present in Unicode 5?  In other words, even if HarfBuzz does not yet
handle OpenType layout of the N'Ko or New Tai Lue scripts, would the
itemizers in Qt and Pango correctly identify segments of text in N'Ko
and New Tai Lue (and other recent Unicode script additions) as
belonging to those respective scripts?

  (2) What about supplementary-plane CJK (the Extension B ideographs,
which live in Plane 2)?  If I created a text containing BMP CJK with a
smattering of Plane 2 CJK thrown in, how would Qt and Pango itemize or
segment that text?

  (3) What about itemization of Plane 1 scripts in Unicode, such as
Linear B?
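To make questions (1) through (3) concrete, here is a minimal sketch of script itemization under heavy simplifying assumptions: the `SCRIPT_RANGES` table and the `script_of`/`itemize` helpers are hypothetical, the table covers only a handful of blocks instead of the full Unicode Scripts.txt data, and every unlisted character is treated as Common (`Zyyy`) and attached to the neighbouring run.  Real itemizers also resolve Inherited characters and paired punctuation.  The tags are ISO 15924 codes.  Python strings iterate by code point; in UTF-16 storage such as Qt's QString, the Extension B ideograph U+20000 used below is a surrogate pair that an itemizer has to combine before looking it up.

```python
# Hypothetical, deliberately tiny script table: (first, last, ISO 15924 tag).
# A real itemizer would use the full Unicode Scripts.txt data instead.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),    # A-Z
    (0x0061, 0x007A, "Latn"),    # a-z
    (0x07C0, 0x07FF, "Nkoo"),    # N'Ko
    (0x1980, 0x19DF, "Talu"),    # New Tai Lue
    (0x4E00, 0x9FFF, "Hani"),    # CJK Unified Ideographs (BMP)
    (0x10000, 0x100FF, "Linb"),  # Linear B Syllabary and Ideograms (Plane 1)
    (0x20000, 0x2A6DF, "Hani"),  # CJK Unified Ideographs Extension B (Plane 2)
]


def script_of(cp):
    """Return the script tag for a code point, or "Zyyy" (Common) if unlisted."""
    for first, last, tag in SCRIPT_RANGES:
        if first <= cp <= last:
            return tag
    return "Zyyy"


def itemize(text):
    """Split text into (script, substring) runs.

    Common characters join the current run; a leading Common run takes
    the script of the first real character that follows it.
    """
    runs = []
    for ch in text:  # Python str iterates by code point, not UTF-16 unit
        script = script_of(ord(ch))
        if runs and script == "Zyyy":
            runs[-1][1] += ch
        elif runs and runs[-1][0] in (script, "Zyyy"):
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    # BMP CJK, one Plane 2 ideograph, a space, then Latin.
    print(itemize("\u6f22\u5b57\U00020000 abc"))
    # The Plane 2 ideograph occupies two UTF-16 code units (a surrogate pair).
    print(len("\U00020000".encode("utf-16-le")) // 2)
```

With a proper table, the same loop would keep BMP and Plane 2 ideographs in a single Hani run; the supplementary-plane question is mostly about whether the code walking the text decodes surrogates (UTF-16) or multi-byte sequences (UTF-8, as Pango uses) before the lookup.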

  (4) How do Qt and Pango handle IPA phonetic characters?  Officially,
one could consider IPA and the other phonetic extensions in Unicode as
belonging to the "Latin" (latn) script.  Some might say that is a bit
of a stretch, because some IPA symbols are arguably closer to Greek in
origin, but certainly Michael Everson, inter alia, will give IPA a
"Latin" appellation.  When actually laying out text, however, a user
might need or want a special font (such as SIL Gentium) for segments
of IPA phonetics.  For example, suppose I am writing a dictionary: my
words and definitions are in one font, while I want my phonetic
pronunciations in a different font tailored for such things.  Of
course my word processor or page layout program lets me manually
select which fonts to use for which parts of my document, and that is
fine.  I am just wondering whether Qt or Pango have any special code
to handle this in a more automated fashion, or at a level closer to
fontconfig's font matching?
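One way to picture the kind of automation asked about here is a per-character font preference keyed on Unicode block.  This is a hypothetical sketch, not anything in Qt, Pango or fontconfig: the block ranges are the real IPA Extensions (U+0250..U+02AF) and Phonetic Extensions (U+1D00..U+1D7F) blocks, but the `PREFERRED_FONTS` table, the family names and the `font_for`/`split_by_font` helpers are illustrative assumptions.

```python
# Hypothetical block -> preferred font family table
# (not a real Qt, Pango or fontconfig API).
PREFERRED_FONTS = [
    (0x0250, 0x02AF, "Gentium"),  # IPA Extensions block
    (0x1D00, 0x1D7F, "Gentium"),  # Phonetic Extensions block
]
DEFAULT_FONT = "Sans"


def font_for(ch):
    """Return the preferred family for one character, or the default."""
    cp = ord(ch)
    for first, last, family in PREFERRED_FONTS:
        if first <= cp <= last:
            return family
    return DEFAULT_FONT


def split_by_font(text):
    """Split text into (family, substring) runs sharing a preferred font."""
    runs = []
    for ch in text:
        family = font_for(ch)
        if runs and runs[-1][0] == family:
            runs[-1] = (family, runs[-1][1] + ch)
        else:
            runs.append((family, ch))
    return runs


if __name__ == "__main__":
    # A dictionary entry with an IPA transcription in brackets.
    print(split_by_font("ship [\u0283\u026ap]"))
```

The example also shows why a purely block-based rule is crude: the plain `p` inside the transcription is ordinary ASCII, so it falls back to the default font and splits the IPA segment in two.  Doing this well would need either explicit markup of the transcription or smarter run merging.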

  (5) A similar question for mathematical, scientific, and other
miscellaneous symbols.  Unicode now contains a number of blocks which
together make up an extensive set of mathematical, scientific, and
miscellaneous technical symbols, and fonts such as the STIX font set
are now available specifically to address the needs of scientific,
mathematical, and other technical users.  So, once again: how do Qt
and Pango handle itemization/segmentation of runs of text containing
such symbols?  Are such symbols simply treated as neutral?  I wonder
whether one could make an argument for defining a separate script
category for "symbols", and then having a text itemizer automatically
break out segments of text containing such symbols as separate items,
which could then be rendered using a font or set of fonts tailored for
such things.  One can imagine having a category for "symbol fonts" as
part of the fontconfig pipeline, so that fontconfig could provide
automatic substitution for such text segments.  Does that make sense?
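The "symbols" item proposed above could be approximated by splitting runs on the Unicode General Category.  This is a hypothetical sketch of the idea, not a description of Qt or Pango: `classify` and `itemize_symbols` are invented names, "symbol" here means General Category Sm (math symbol) or So (other symbol) as reported by Python's `unicodedata.category`, and separators (categories starting with Z) are treated as neutral and attached to the preceding run.

```python
import unicodedata

SYMBOL_CATEGORIES = {"Sm", "So"}  # math symbols and other symbols


def classify(ch):
    """Classify a character as "symbol", "neutral" (separator) or "text"."""
    cat = unicodedata.category(ch)
    if cat in SYMBOL_CATEGORIES:
        return "symbol"
    if cat.startswith("Z"):
        return "neutral"  # spaces and other separators
    return "text"


def itemize_symbols(text):
    """Split text into (kind, substring) runs; separators join the current run."""
    runs = []
    for ch in text:
        kind = classify(ch)
        if runs and kind == "neutral":
            runs[-1][1] += ch
        elif runs and runs[-1][0] in (kind, "neutral"):
            runs[-1][0] = kind  # a leading neutral run adopts the next kind
            runs[-1][1] += ch
        else:
            runs.append([kind, ch])
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    # N-ARY SUMMATION, LESS-THAN OR EQUAL TO and INFINITY are all category Sm.
    print(itemize_symbols("sum \u2211 x \u2264 \u221e"))
```

Because ASCII characters such as `+`, `<` and `=` are also category Sm, a rule this blunt would pull them out of ordinary text too; deciding which symbols deserve a separate item (and a separate font) is exactly the policy question raised above.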

Since I don't know how Qt and Pango currently do these things, I
thought I would ask.

Best Wishes for a Happy and Prosperous New Year to all! -- Ed Trager


