[Fribidi-discuss] Re: [bidi] fribidi and arabic joining

Doug Felt dougfelt at us.ibm.com
Mon Mar 25 22:29:01 EST 2002


I've always used the first technique, and this is what Java uses for
Arabic.  We run bidi, shape the text (using font tables if available),
record the glyph-to-char mapping, and use this information to assign
advances to each character.  We then use these advances and line break
rules to break the text logically (to keep logically contiguous text on the
same line).  We then apply the bidi line rules to adjust the levels of
trailing counterdirectional white space, and order the final runs on each
line.  We don't shape across level boundaries; in fact, we've always assumed
that level boundaries mark glyph processing boundaries.  While it's
possible of course to create additional boundaries that are not needed, I'm
not aware of situations where anyone needs to force a level change and
still require glyph processing across the level.
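
For readers following along, here is a minimal Python sketch of the
advance-assignment and logical line-breaking steps described above.  It
is an illustration under stated assumptions, not Java's actual code: the
(advance, first_char, char_count) glyph records, the even split of a
ligature's advance over its characters, and the rule of breaking only
after spaces are all hypothetical simplifications.

```python
def char_advances(num_chars, glyphs):
    """Assign an advance to every character from a glyph-to-char mapping.

    glyphs is a list of (advance, first_char, char_count) records
    (a hypothetical format).  A glyph covering several characters,
    such as a ligature, has its advance split evenly among them
    (an assumption made for this sketch).
    """
    adv = [0.0] * num_chars
    for advance, first, count in glyphs:
        for i in range(first, first + count):
            adv[i] += advance / count
    return adv


def break_lines(text, adv, max_width):
    """Greedy line breaking in logical order, breaking only after spaces.

    Returns (start, end) index pairs into the logical text.  A word
    wider than max_width is left on a line of its own.
    """
    lines = []
    start = 0          # first character of the current line
    width = 0.0        # accumulated width of the current line
    last_break = None  # index just after the last space on this line
    for i, ch in enumerate(text):
        width += adv[i]
        if width > max_width and last_break is not None:
            lines.append((start, last_break))
            start = last_break
            width = sum(adv[start:i + 1])
            last_break = None
        if ch == " ":
            last_break = i + 1
    lines.append((start, len(text)))
    return lines
```

Each (start, end) pair is a logically contiguous run; reordering for
display would then be done per line, after the breaks are chosen.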

I'd be interested to hear of cases where shaping/mark placement/ligature
formation was affected by the line break portion of bidi.  Offhand I can't
imagine any.

Other than having a 'tail' glyph for monospaced Arabic, is there anything
else special about a monospaced Arabic font?  Perhaps ligature forms across
multiple character cells (more precisely, more than one and fewer than the
number of characters composing the ligature)?

(BTW, the thought of yet another character-based rendering engine designed
around custom fonts makes me queasy, but it's not my business...)

Doug



                                                                                                                                       
Behdad Esfahbod <behdad at bamdad.org>
Sent by: bidi-bounce at unicode.org
03/25/2002 07:01 PM

To:       Fribidi Discussion List <fribidi-discuss at lists.sourceforge.net>
cc:       bidi at unicode.org
Subject:  [bidi] fribidi and arabic joining
                                                                                                                                       
                                                                                                                                       




[I've put bidi at unicode.org in CC; please remove it when
replying if it's a fribidi-specific reply.]

Hi all,

Sorry for being silent, and thanks for raising the discussion.
Both Roozbeh and I are on vacation, so neither of us replied
before.

A. First, I should show what I mean by Arabic joining, and why
we need it.  You can skip this part if you are not interested in
Arabic-specific matters.  As you know, fribidi is a lightweight,
portable library implementing the Unicode BiDi Algorithm.  The
command line tool is much more useful to the Hebrew community
than to the Arab (and Iranian) ones; the reason is what the
Unicode Standard calls the Arabic Joining Algorithm.  The Arabic
Joining Algorithm determines which of the glyphs of each Arabic
letter should be used, depending on the surrounding letters.  The
behaviour of this algorithm is well-defined in the Unicode
Standard, but its interaction with the BiDi Algorithm is not.
The following discussion shows that the Arabic Joining Algorithm
cannot run after the BiDi Algorithm, at least when using
non-fixed-width fonts:

  * The BiDi algorithm has two parts:

      1. Determining the embedding levels.
      2. Reordering.

  * UAX#9 says that the first part should be applied to each
    paragraph, but the second part should be applied to each line
    of text; it asks the higher protocol to break the lines between
    these two parts (I do not have the spec here, but look for it
    before the L1 rule).

  * The line breaking algorithm should therefore be applied to the
    logical text (not the visual text), that is, before reordering;
    but to break the lines you need to know the final glyphs, as
    the various glyphs of an Arabic letter differ a lot in width.

  * To determine the final glyph for an Arabic letter, you need the
    Arabic Joining Algorithm.

  * So the Arabic Joining Algorithm should be applied before the
    Reordering part of the BiDi Algorithm.

  * The other reason that the Arabic Joining Algorithm cannot be
    applied after the BiDi Algorithm is that the behaviour of the
    BiDi algorithm is not well-defined on BN characters like Zero
    Width Joiner U+200D and Zero Width Non-Joiner U+200C, which
    have an important role in the Arabic Joining Algorithm, so two
    different implementations may lead to different final glyphs.
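
To make the role of the neighbouring letters and of ZWJ/ZWNJ concrete,
here is a minimal Python sketch of form selection.  It is only an
illustration, not fribidi's code: the joining-type table is a tiny
hand-written subset, transparent characters (combining marks) and
ligatures are ignored, and it assumes plain right-to-left text with no
overrides, so that the logically previous character is the visual
Right neighbour.

```python
# Hand-written joining types for a few characters (illustrative subset;
# a real implementation would use the full Unicode data).
# D = dual-joining, R = right-joining, C = join-causing, U = non-joining.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u200D": "C",  # ZERO WIDTH JOINER
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
}


def joining_forms(text):
    """Return one form name per character of text (logical order).

    Letters get "isolated", "initial", "medial" or "final";
    everything else gets "-".
    """
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t not in ("D", "R"):
            forms.append("-")
            continue
        prev_t = types[i - 1] if i > 0 else "U"
        next_t = types[i + 1] if i + 1 < len(types) else "U"
        # Joins to the previous character (its Right side) if that
        # character can connect on its Left side.
        joins_prev = prev_t in ("D", "C")
        # Only dual-joining letters can connect to the next character.
        joins_next = t == "D" and next_t in ("D", "R", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

Calling joining_forms() on BEH LAM ALEF with and without a ZWNJ after
the BEH gives different forms for the BEH and the LAM, which is why an
implementation-dependent treatment of ZWJ/ZWNJ changes the final glyphs
and hence their widths.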

You may think that the Arabic Joining Algorithm can be applied
before the BiDi algorithm.  But things are not so easy: the Arabic
Joining Algorithm itself needs the "Left" and "Right" neighbours
of a character, and Left and Right are defined in the visual text,
not the logical text.  The left and right characters cannot be
found easily from the next and previous characters in logical
order, because of the override marks (LRO and RLO).  So to run the
Arabic Joining Algorithm you need the visual ordering, which can
be determined by the BiDi Algorithm!!  Roozbeh and I are working
on a proposal for the UTC discussing the interaction of the two
algorithms (in fact, the first idea of adding joining to fribidi
came from here, to test my own ideas).  But for now, two methods
can be used to solve this circular dependency:

  * Arabic Joining and Line Breaking independence:  If we can prove
    that the Arabic Joining Algorithm is independent of the Line
    Breaking Algorithm, then do this:

      1. Reorder the text without line breaking, which gives us
         the visual order.

      2. Run the Arabic Joining Algorithm to find the final glyph
         of each character.

      3. With the final glyphs in hand, we can break the lines and
         reorder the text correctly.

  * The other idea is to extract the meaning of the Left and Right
    characters somehow before reordering the text, just from the
    embedding levels found in the first part of the BiDi Algorithm,
    or from some other data which can be extracted from the bidi
    marks.  After playing with counter-examples for various cases,
    I found this algorithm (read: no counter-examples found yet).
    It uses the fact that all the letters that are subject to
    change under the Arabic Joining Algorithm are right-to-left
    letters under the BiDi Algorithm:

      1.  Reverse the text between each LRO and its corresponding
          PDF.  Then reverse the text in each explicit embedding or
          override inside this text again.  Call this ordering of
          the text the RtLFriendly order.

          Example:
                   <LRO> a b C D <RLE> f g H <PDF> x Y z <PDF>
            =>     <LRO> z Y x <RLE> f g H <PDF> D C b a <PDF>
            also   <LRO> a b <RLO> f g <PDF> h <LRE> x y <PDF> Z <PDF>
            =>     <LRO> Z <LRE> x y <PDF> h <RLO> f g <PDF> b a <PDF>

      2.  Now apply the Arabic Joining Algorithm on the RtLFriendly
          order, with next as the Left and previous as the Right
          character, and find the final glyphs.

      3.  With the final glyphs, find the embedding levels, break
          the lines and reorder the text.
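
Step 1 above can be sketched in Python as follows.  This is one
reading of the rule, not a tested implementation: it assumes the marks
are well nested, treats every nested embedding or override as an atomic
unit while the contents of an LRO are reversed (which has the same
effect as reversing it twice), and applies the same rule recursively to
an LRO found at any depth.  The token format and function names are
made up for the example.

```python
OPENERS = {"<LRE>", "<RLE>", "<LRO>", "<RLO>"}
CLOSER = "<PDF>"


def parse(tokens):
    """Build a tree from a flat token list.

    A nested span becomes a two-item list [opener, children].
    Assumes well-formed input; an unmatched <PDF> is ignored.
    """
    root = []
    stack = [root]
    for tok in tokens:
        if tok in OPENERS:
            node = [tok, []]
            stack[-1].append(node)
            stack.append(node[1])
        elif tok == CLOSER:
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1].append(tok)
    return root


def flatten(children, reverse=False):
    """Emit tokens, reversing the direct children of every LRO span."""
    out = []
    items = reversed(children) if reverse else children
    for item in items:
        if isinstance(item, list):
            opener, inner = item
            out.append(opener)
            # Nested spans keep their own internal order unless they
            # are themselves LRO spans.
            out.extend(flatten(inner, reverse=(opener == "<LRO>")))
            out.append(CLOSER)
        else:
            out.append(item)
    return out


def to_rtl_friendly(text):
    """Convert a space-separated token string to RtLFriendly order."""
    return " ".join(flatten(parse(text.split())))
```

With this reading, to_rtl_friendly() reproduces both "=>" lines of the
example in step 1 from their inputs.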

The first idea needs some work to prove the independence (which
may not be true).  The second one is a bit complex, but it seems
to produce the desired result.  I will provide test cases for the
different cases in another mail.

[End of BiDi vs. Arabic Joining interaction material, the rest is
fribidi related.]

B. Our implementation of the Arabic Joining Algorithm is quite
small and light; it will not harm the objectives of fribidi at
all, but makes both the command line tool (which can be used to
cat right-to-left files) and the library much more useful.
Many applications that use fribidi do not support Arabic joining,
as there is no lightweight implementation of it available, or the
author just wanted it to work for Hebrew.  With Arabic joining in
fribidi, the developer can simply turn it on to work well for
Arabic too.

C. Pango is not a real solution for the audience of fribidi:
fribidi has been ported to some mobile devices.  fribidi has also
been used on the Linux console and in xterm, where it is not a
good idea to use Pango for Arabic joining.  fribidi is mostly used
for the Hebrew and Arabic scripts, whose rendering is complete
with the Arabic joining algorithm, so we need not worry about
other shaping matters; when shaping of all Unicode characters is
needed, the fribidi feature can be turned off.

D.  Using the Unicode Arabic Presentation Forms is also essential
on the Linux console, as the kernel maps Unicode code points to
glyphs.  For other scripts like Syriac, which do not have
presentation forms in Unicode, the presentation forms should be
registered in the private use area of Unicode (H. Peter Anvin is
responsible for registering them in Linux) to be able to show
them on the Linux console.

E.  About its overhead on fribidi: I believe that the Hebrew
community will not be so happy, but:

  1.  It can be fully turned off with a configure-time option.
  2.  When compiled with Arabic joining, it is off by default;
      the developer should turn it on if needed.
  3.  I will try to put it in a separate binary to save
      resources.


I hope that with the above discussion there will be enough
reasons for all of you to put it in fribidi.

Yours,
-- Behdad Esfahbod                    6 Farvardin 1381, 2002 Mar 26
<behdad at bamdad dot org>            [Finger for Geek Code]










