[Fribidi-discuss] Re: BiDi WINE status and fribidi
Behdad Esfahbod
behdad at bamdad.org
Mon Aug 26 10:30:04 EST 2002
On Sun, 25 Aug 2002, Shachar Shemesh wrote:
> Behdad Esfahbod wrote:
> >You are simplifying things too much. The two easy steps are what
> >you described. But there are some harder ones too:
> >
> > * Rule W5: States that if there is just *one* char of
> >type .... Then you should be aware that a surrogate pair is
> >just one character, not two.
> >
> > * Rule L3: NSMs in RTL levels should be reordered to
> >come after their base. Now the problem is that the NSM can
> >be a surrogate pair, and the base can be a surrogate pair too.
> >Headache...
> >
> >
> So that's why we pay you - to know those things (what do you mean you
> haven't gotten the cheque. I mailed it myself yesterday!)
>
> >So please please don't talk about UTF-8, that's already enough.
> >
> The voice of reason. Ok, you are, of course, right.
>
> >Yours,
>
> Ok, let's see.
> Since we have accepted my proposal of marking both chars of the
> surrogate with the codepoint's type, only rules that apply to a single
> letter need any special processing at all.
Yes.
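
To make the agreed marking concrete, here is a minimal sketch in C, not FriBidi code. It assumes UTF-16 input and gives both code units of a surrogate pair the type of the code point they encode. The names mark_type, mark_toy_type and mark_types_utf16 are made up for this example, and the classifier is a toy that knows only a few ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny subset of the bidi types. */
typedef enum { MARK_L, MARK_R, MARK_EN } mark_type;

/* Toy classifier, for illustration only; a real one reads the Unicode tables. */
static mark_type mark_toy_type(uint32_t cp)
{
    if (cp >= 0x30 && cp <= 0x39) return MARK_EN;        /* ASCII digits */
    if (cp >= 0x05D0 && cp <= 0x05EA) return MARK_R;     /* Hebrew letters */
    if (cp >= 0x1D7CE && cp <= 0x1D7FF) return MARK_EN;  /* mathematical digits */
    return MARK_L;                                       /* everything else, in this toy */
}

/* Fill types[0..n) with one type per UTF-16 code unit; both halves of a
 * surrogate pair get the type of the whole code point.
 * Returns the number of code points seen. */
static size_t mark_types_utf16(const uint16_t *s, size_t n, mark_type *types)
{
    size_t i = 0, count = 0;
    assert(s != NULL && types != NULL);
    while (i < n) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            /* Well-formed pair: decode it and mark both units. */
            uint32_t cp = 0x10000 + (((uint32_t)s[i] - 0xD800) << 10)
                                  + ((uint32_t)s[i + 1] - 0xDC00);
            types[i] = types[i + 1] = mark_toy_type(cp);
            i += 2;
        } else {
            /* BMP character, or an unpaired surrogate treated as one unit. */
            types[i] = mark_toy_type(s[i]);
            i += 1;
        }
        count++;
    }
    return count;
}
```

For the four units 05D0, D835, DFCE, 0061 (alef, U+1D7CE, "a") this marks R, EN, EN, L and reports three characters.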
> Let's review them, then:
> Rule W4 - European separator between European numbers. Only the
> separator is affected.
> (Rule W5 discusses a sequence of characters of the same type. Are you
> sure it's relevant?).
No, of course I meant W4 (What do you mean you have not gotten the
errata? I sent it myself just after my post).
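
A rough sketch of why W4 needs care, assuming UTF-16 text with both units of a pair already marked with the same type. Only the "European separator between European numbers" half of the rule is shown. The names w4_type, w4_cp_len and apply_w4_es are invented for this example, and a separator encoded as a surrogate pair is purely hypothetical here; the caller supplies the types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of types, just enough for this part of W4. */
typedef enum { W4_L, W4_EN, W4_ES } w4_type;

/* Length in code units (1 or 2) of the code point starting at s[i]. */
static size_t w4_cp_len(const uint16_t *s, size_t n, size_t i)
{
    int high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
    int low_next = i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
    return (high && low_next) ? 2 : 1;
}

/* A *single* ES character between two ENs becomes EN. "Single" is counted
 * in code points, so a surrogate-pair separator still qualifies even though
 * it occupies two ES slots in the per-unit type array. */
static void apply_w4_es(const uint16_t *s, size_t n, w4_type *t)
{
    size_t i = 0;
    assert(s != NULL && t != NULL);
    while (i < n) {
        size_t len = w4_cp_len(s, n, i);
        if (t[i] == W4_ES && i > 0 && i + len < n &&
            t[i - 1] == W4_EN && t[i + len] == W4_EN) {
            size_t k;
            for (k = 0; k < len; k++)
                t[i + k] = W4_EN;   /* retype every unit of the separator */
        }
        i += len;
    }
}
```

With types EN ES ES EN, the two ES slots are rewritten to EN when the text says they are one surrogate pair, and left alone when they are two separate BMP separators.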
> I have seen no more rules that seem to apply (rule L3 doesn't seem
> related to the rule Behdad quoted, and the rule Behdad quoted seems, it
> appears, to be covered by the second assumption I originally took. I
> suspect I am misunderstanding here).
Rule L3 IS affected, as the NSMs should move to the other side of
their base character, and when the base character is a surrogate,
we should take care of it.
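
One possible way to take care of it, sketched under assumptions: the run is UTF-16, it is entirely RTL, and the caller supplies a per-unit is_nsm flag (set on both units of a pair). Instead of reversing code units, the sketch reverses whole clusters of "base plus following NSMs", so the marks stay after their base and surrogate pairs keep their order. The names l3_cp_start_before and l3_reverse_rtl_run are hypothetical, and this is not how any particular library does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index where the code point ending just before `end` starts (end > 0). */
static size_t l3_cp_start_before(const uint16_t *s, size_t end)
{
    if (end >= 2 &&
        s[end - 1] >= 0xDC00 && s[end - 1] <= 0xDFFF &&
        s[end - 2] >= 0xD800 && s[end - 2] <= 0xDBFF)
        return end - 2;             /* well-formed surrogate pair */
    return end - 1;                 /* BMP character or unpaired surrogate */
}

/* Write the visual order of an RTL run into out[0..n): clusters are emitted
 * last to first, and each cluster is copied unchanged. */
static void l3_reverse_rtl_run(const uint16_t *s, const int *is_nsm,
                               size_t n, uint16_t *out)
{
    size_t o = 0, end = n;
    assert(s != NULL && is_nsm != NULL && out != NULL);
    while (end > 0) {
        size_t start = l3_cp_start_before(s, end);
        /* Walk back over NSMs until the base character is reached. */
        while (start > 0 && is_nsm[start])
            start = l3_cp_start_before(s, start);
        memcpy(out + o, s + start, (end - start) * sizeof *s);
        o += end - start;
        end = start;
    }
}
```

Given a BMP base with a BMP mark followed by a surrogate-pair base with a surrogate-pair mark, the output is the second cluster and then the first, each still in base-then-mark order.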
> If we add to that the fact that ALL surrogated characters (i.e. - all
> characters whose code point is higher than 0xFFFF) are L (table at
> http://www.unicode.org/unicode/reports/tr9/#Bidirectional_Character_Types),
No, you are simply wrong: there are already EN and BN characters
there, and other types may well get encoded. The page you are
referring to states that all unassigned chars in this region are L.
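
As an illustration only, a few supplementary ranges that, as far as I know, are already assigned non-L types: the mathematical digits (EN), the musical format controls and the tag characters (BN), and some musical combining marks (NSM). The table is a hand-picked sample, not a complete list, and supp_type, supp_ranges and supp_bidi_type are names invented for this sketch. Falling back to L for everything else is a simplification of the sketch, not a statement about the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SUPP_L, SUPP_EN, SUPP_BN, SUPP_NSM } supp_type;

struct supp_range { uint32_t first, last; supp_type type; };

/* Hand-picked sample of supplementary ranges whose type is not L. */
static const struct supp_range supp_ranges[] = {
    { 0x1D167, 0x1D169, SUPP_NSM }, /* musical combining tremolos */
    { 0x1D173, 0x1D17A, SUPP_BN  }, /* musical format controls */
    { 0x1D7CE, 0x1D7FF, SUPP_EN  }, /* mathematical digits */
    { 0xE0001, 0xE0001, SUPP_BN  }, /* language tag */
    { 0xE0020, 0xE007F, SUPP_BN  }, /* tag characters */
};

/* Look up a code point above 0xFFFF in the sample table. */
static supp_type supp_bidi_type(uint32_t cp)
{
    size_t i;
    assert(cp > 0xFFFF && cp <= 0x10FFFF);
    for (i = 0; i < sizeof supp_ranges / sizeof supp_ranges[0]; i++)
        if (cp >= supp_ranges[i].first && cp <= supp_ranges[i].last)
            return supp_ranges[i].type;
    return SUPP_L;                  /* simplification: default for this sketch */
}
```

So code that assumes "above 0xFFFF means L" would already mistype U+1D7CE, which is a digit.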
> I don't think my original suggestion of a change needs amendment
> (barring the warning at the bottom: "Unassigned characters are given
> strong types in the algorithm. This is an explicit exception to the
> general Unicode conformance requirements with respect to unassigned
> characters. As characters become assigned in the future, these
> bidirectional types may change.").
I can't really understand what you mean. The types of unassigned
chars may change, well, once they get assigned in the future.
> Behdad, I think I'm missing something here. I was using version 10 of
> the 3.2 standard
> (http://www.unicode.org/unicode/reports/tr9/tr9-10.html). The rule
> numbers seem a bit wrong, and the quotes you give do not appear at all.
Ok, see above. I prefer the latest version, always available at:
http://www.unicode.org/unicode/reports/tr9
which happens to be the same as yours ;).
> Shachar
--
Behdad Esfahbod 4 Shahrivar 1381, 2002 Aug 26
http://behdad.org/ [Finger for Geek Code]
#define is_persian_leap(y) ((((y)-474)%2820+2820)%2820*31%128<31)