[Fontconfig] Improving Latin font selection for CJK locales

Behdad Esfahbod behdad at behdad.org
Tue Jan 29 18:06:39 PST 2008


Hi Ed,

If you are interested in the details, read the entire thread here:

  http://mail.gnome.org/archives/gtk-i18n-list/2007-December/thread.html

I'm trying to avoid repeating the same reasoning again and again, and
it's really not quite on topic for the fontconfig list anyway.

behdad


On Tue, 2008-01-29 at 20:56 -0500, Ed Trager wrote:
> Hi, Qianqian,
> 
> Latin digits are basically treated as "neutral" characters in a run of
> text -- I think that is pretty much
> "standard Unicode operating procedure" if you look at how the digits
> are categorized in UCD.
> 
> I don't know the internal details of how Pango itemizes a string of
> text, but using
> your "pngsBGtUJxMgD.png" as an example, we can see what is most likely
> occurring: First, it appears that Pango treats "1234A" as a run of "latn" text
> because of the presence of the letter "A" -- all characters
> preceding the "A" are "neutrals" which presumably don't influence the
> itemizer, but of course
> the letter "A" tells the itemizer that the current run of text is Latin script.
> Then of course the "我" starts a new run of text which gets classified as Han
> ("hani" if using the ISO 15924 code) script -- and the following
> neutrals "123" remain a part of that
> 2nd text segment. The final "ABC" however causes the itemizer to break
> out a 3rd segment -- and it is "latn".
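
The run-splitting behaviour described above can be sketched in a few lines
of Python. This is only an illustration of the described outcome, not
Pango's actual itemizer: the names script_of and itemize are hypothetical,
and the script lookup is a crude stand-in (ASCII letters and one CJK block)
for a real Unicode Script property table.

```python
def script_of(ch):
    """Rough script lookup; a stand-in for a real Unicode Script table."""
    if ch.isascii() and ch.isalpha():
        return "latn"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block only
        return "hani"
    return None  # "neutral" here: digits, spaces, punctuation


def itemize(text):
    """Split text into (script, substring) runs.

    Neutral characters never start a new run: they join the current run,
    and a run made only of neutrals takes the script of the first
    non-neutral character that follows it.
    """
    runs = []  # each entry is [script, chars]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            if runs[-1][0] is None:
                runs[-1][0] = script  # resolve a neutral-only run
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    # "zyyy" (ISO 15924 Common) marks a run that was never resolved
    return [(script or "zyyy", chars) for script, chars in runs]


print(itemize("1234A我123ABC"))
# [('latn', '1234A'), ('hani', '我123'), ('latn', 'ABC')]
```

With the sample string from the screenshot, the digits "123" end up in the
"hani" run, which is why they are handed to the CJK font.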
> 
> Pango presumably then talks to fontconfig to get the font assignments
> for each of the three segments.
> Behdad can confirm if this is in fact how the itemizer works or not.
> 
> So fixing this kind of "bug" or "feature" may require changing how the
> itemizer works.
> For example, what if digits were not categorized as "neutrals" but
> were instead assigned their own
> category of "Latin Digits"?
> 
> Then a text itemizer could break out "latin digits" into separate segments.
> 
> For a document with Latin script, maybe these "latin digit" segments
> eventually get merged back into
> the "latn" segments because it is not necessary to treat them any
> differently from how the "latn" segments
> are treated.
> 
> But if the main script is not Latin, then there may be some advantage
> to treating "latin digits" segments separately.
> 
> For example, it would allow your Chinese text to have latin digits
> rendered in DejaVu Sans because the "latin digits" segments could
> simply be treated as another special kind of "latn" segment.
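
A minimal sketch of this proposal, under stated assumptions: the function
names, the "digits" label and the main_script parameter are all
hypothetical, and the category lookup is deliberately crude. ASCII digits
get their own category; they are folded back into "latn" only when the
main script of the document is Latin.

```python
def category_of(ch):
    """Crude lookup: ASCII digits get their own (hypothetical) category."""
    if ch.isascii() and ch.isdigit():
        return "digits"
    if ch.isascii() and ch.isalpha():
        return "latn"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block only
        return "hani"
    return None  # remaining neutrals: spaces, punctuation


def itemize_with_digits(text, main_script):
    """Split text into runs, keeping ASCII digits as separate segments
    unless the document's main script is Latin."""
    runs = []  # each entry is [category, chars]
    for ch in text:
        cat = category_of(ch)
        if cat == "digits" and main_script == "latn":
            cat = "latn"  # merge digit segments back into Latin runs
        if runs and (cat is None or runs[-1][0] in (None, cat)):
            if runs[-1][0] is None:
                runs[-1][0] = cat
            runs[-1][1] += ch
        else:
            runs.append([cat, ch])
    return [(cat or "zyyy", chars) for cat, chars in runs]


print(itemize_with_digits("1234A我123ABC", main_script="hani"))
# [('digits', '1234'), ('latn', 'A'), ('hani', '我'), ('digits', '123'), ('latn', 'ABC')]
print(itemize_with_digits("1234A我123ABC", main_script="latn"))
# [('latn', '1234A'), ('hani', '我'), ('latn', '123ABC')]
```

In the non-Latin case a later font-selection step could map "digits"
segments to the same font as "latn" segments, which is the effect
described above.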
> 
> There might also be some benefit to doing this in Arabic texts since
> the "latin digits" and even the "Arabic digits" need to be rendered as
> runs of LTR text embedded in surrounding RTL text.
> 
> Of course there may be other issues and cases which I have not thought
> of yet, but this is not the first time that I have thought about
> treating segments of "latin digits" as some non-neutral category for
> the purposes of enhanced itemization.
> 
> (I am actually currently working on writing some C++ UnicodeText
> classes of my own -- and just recently was playing around with these
> issues of text itemization, so I am very interested to learn what
> people *really* want to have).  Is it possible that what people really
> want may *differ* in some details from the status-quo standard Unicode
> practices?
> 
> Best Wishes - Ed
> 
> >
> > the second point currently is not possible, because Pango labels
> > Common-script characters (digits) near Chinese text as Chinese, and in
> > fontconfig we never know whether a character is Common script or a
> > Chinese Hanzi. This caused problems like this:
> >
> > https://www.redhat.com/archives/fedora-fonts-list/2007-December/pngsBGtUJxMgD.png
> >
> > Seems to me that the proposed methods will still assign lang=zh to
> > Common-script characters between Chinese Hanzi if locale=zh. So it is
> > still unlikely that we can force smooth Latin fonts for Common-script
> > characters via fontconfig. Is my understanding correct?
> >
-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759


