[HarfBuzz] Questions about Itemization in QT / Pango

Behdad Esfahbod behdad at behdad.org
Tue Jan 1 16:29:51 PST 2008

On Sun, 2007-12-30 at 14:23 -0500, Ed Trager wrote:
> Hi, Behdad, Simon, and everyone,

Hello Ed,

> I have been wondering recently a little bit about how QT and Pango
> handle itemization:
>   (1) Do QT and Pango fully support itemization of all scripts now
> present in Unicode 5 ?

Yes, Pango 1.18 supports Unicode 5.0.  1.20 will support Unicode 5.1.

> In other words, while perhaps HarfBuzz does
> not yet handle OpenType layout of N'Ko or New Tai Lue scripts, but
> would the itemizers in QT and Pango correctly identify segments of
> text in N'Ko and New Tai Lue (and other recent Unicode script
> additions) as belonging to those respective scripts?

Pango 1.18 in fact does support N'Ko.  See:


>   (2) What about Plane 1 CJK?  If I created a text containing BMP CJK
> with a smattering of Plane 1 CJK thrown in, how will QT and Pango
> itemize or segment that text ?
>   (3) What about itemization of other Plane 1 scripts in Unicode, like
> Linear B, etc.?

Pango (and I believe Qt too) uses the Unicode Character Database.  So all
the characters marked as script Han will be grouped together, whichever
plane they are in.
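
To make the idea concrete, here is a minimal sketch in Python of
script-based itemization.  It is not Pango's or Qt's actual code: the
SCRIPT_RANGES table is a hypothetical, heavily abridged stand-in for the
Script property data in the Unicode Character Database, and the rule that
Common characters simply join the preceding run is a simplifying
assumption.

```python
# Hypothetical, abridged script table: (first, last, script).
# A real itemizer would load the full Script property data from the UCD.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),      # A-Z
    (0x0061, 0x007A, "Latin"),      # a-z
    (0x0250, 0x02AF, "Latin"),      # IPA Extensions
    (0x4E00, 0x9FA5, "Han"),        # CJK Unified Ideographs (BMP)
    (0x10000, 0x1000B, "Linear_B"),  # start of the Linear B Syllabary
    (0x20000, 0x2A6D6, "Han"),      # CJK Extension B (supplementary plane)
]


def script_of(ch):
    """Return the script of one character, or "Common" if not in the table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Common"


def itemize(text):
    """Split text into (script, substring) runs.

    Assumption: a Common character (space, punctuation, ...) inherits the
    script of the run before it, so it does not start a run of its own.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if script == "Common" and runs:
            script = runs[-1][0]
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


if __name__ == "__main__":
    # One BMP ideograph followed by one from CJK Extension B.
    for script, chunk in itemize("abc \u4e2d\U00020000, ok"):
        print(script, repr(chunk))
```

With this table the BMP ideograph and the supplementary-plane ideograph
land in the same Han run, because grouping depends only on the script
value and not on the plane.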

>   (4) How do QT and Pango handle IPA phonetic characters?  Officially,
> one could consider IPA and other phonetic extensions in Unicode as
> belonging to "Latin" (latn) script.    Some might say that is a bit of
> a stretch, because some IPA symbols might actually be closer to Greek
> in origin, but certainly Michael Everson, inter alia, will give IPA a
> "Latin" appellation.  But when actually laying out text, a user might
> need or desire to use a special font (such as SIL Gentium, for
> example) for laying out segments of IPA phonetics.  For example,
> suppose I am writing a dictionary and my words and definitions are in
> one font, while I might desire that my phonetic pronunciations are in
> a different font tailored for such things.  Of course my word
> processor or page layout program will permit me to manually select
> which fonts to use for which parts of my document, and that is fine.
> I am just wondering if QT or Pango have any special code to handle
> such things in a more automated fashion, or on a level closer to
> fontconfig's font matching attempts?

IPA Extensions are marked as script Latin in Unicode.

>   (5) A similar question for mathematical, scientific, and other
> miscellaneous symbols.  Unicode now contains a number of blocks which
> make up a rather extensive set of mathematical, scientific, and
> miscellaneous technical symbols.  Fonts such as the STIX font set are
> now available to specifically address the needs of scientific,
> mathematical, and other technical users.  So, once again I am just
> wondering how QT and Pango handle itemization/segmentation of runs of
> text containing such symbols?  Are such symbols just treated as being
> neutral?  I'm just wondering if one can make an argument for defining
> a separate script category for "symbols" and then having a text
> itemizer automatically break out segments of text containing such
> symbols as separate items which can then be rendered using a font or
> set of fonts that are tailored for such things.  One can imagine
> having a category for "symbol fonts" as part of the fontconfig
> pipeline, so that fontconfig could provide automatic substitution for
> such text segments.  Does that make sense?

No.  You don't want the period, question mark, brackets, quotation marks,
etc. in your Latin text to be rendered using a separate font.  This all
really belongs at a higher level, to mark text appropriately with the desired

> Since I don't know how QT and Pango currently do these things, I
> thought I would ask.
> Best Wishes for a Happy and Prosperous New Years to all! -- Ed Trager

Happy New Year to all,


"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
