[Fribidi-discuss] Record Separator in BiDi

Ilya Konstantinov future at shiny.co.il
Tue Aug 13 08:37:04 EST 2002


Hi,

(Please CC your responses to me, since I'm not subscribed to 
fribidi-discuss. Thanks.)

While working with Windows NT, I noticed two interesting control
characters in its Insert Control Characters popup -- RS (Record
Separator) and US (Unit Separator). On NT, they seem to break the
sequence and reset the directionality to that of the paragraph. That is,
"one<RS>two" in RTL would be rendered as "twoone" instead of "onetwo"
(RS itself seems to be zero-width).

According to the Unicode code charts, RS is 0x1E and US is 0x1F -- both
are displayed as "Missing-Character" squares in Mozilla (I guess it
simply passes them to DrawString, instead of treating them as zero-width
control characters).

Do we handle those characters in any way in FriBidi?
According to Simon Montagu (the Mozilla BiDi guy), those characters are 
defined by the standard: "0x1E is defined as B (Paragraph separator) and 
of 0x1f as S (Segment separator), and the behaviour of
these types is described in Unicode TR9". Also, according to Simon and
my tests, Mozilla currently doesn't handle these characters. Nor does
Trolltech Qt 3.x.
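
The classes Simon quotes can be double-checked against the Unicode
Character Database. Here is a minimal sketch using Python's standard
unicodedata module; the helper name bidi_class is only for illustration
and has nothing to do with FriBidi's own API:

```python
import unicodedata

def bidi_class(code_point):
    """Return the Unicode bidirectional class of a code point."""
    return unicodedata.bidirectional(chr(code_point))

# RS (0x1E) and US (0x1F), the two characters discussed above.
for cp in (0x1E, 0x1F):
    print(f"U+{cp:04X} {bidi_class(cp)}")
```

This should report B for 0x1E and S for 0x1F, matching what Simon says.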

All in all, these are important and convenient characters for
preventing unwanted BiDi behavior (e.g. on web pages, where two
unrelated fields get mixed together by BiDi processing).
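
To illustrate the idea, here is a rough sketch of how a renderer might
keep fields apart, assuming it follows TR9 and treats every class-B
character as a paragraph boundary. The function split_bidi_paragraphs
is hypothetical (it is not a FriBidi, Mozilla or Qt function), and a
real implementation would then run the BiDi algorithm on each piece
separately:

```python
import unicodedata

def split_bidi_paragraphs(text):
    """Split text at every character whose bidi class is B.

    Each returned piece would be reordered independently, so text on
    one side of a separator cannot get mixed with the other side.
    The separators themselves are dropped here for simplicity.
    """
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs

# Two unrelated fields joined with RS (0x1E) stay separate paragraphs.
fields = split_bidi_paragraphs("one\x1etwo")
print(fields)
```

Note that US (0x1F) is class S rather than B, so under this sketch it
would not start a new paragraph; TR9 handles segment separators by
different rules.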




