[Fribidi-discuss] Re: BiDi WINE status and fribidi

Mon Aug 26 10:30:04 EST 2002

On Sun, 25 Aug 2002, Shachar Shemesh wrote:
> Behdad Esfahbod wrote:
> >You are simplifying things too much.  The two easy steps are the ones 
> >you described.  But there are some harder ones too:
> >
> >	* Rule W5:  States that if there is just *one* char of 
> >type ....   Then you should be aware that a surrogate pair is 
> >just one character, not two.
> >
> >	* Rule L3:  NSMs in RTL levels should be reordered to 
> >come after their base.  The problem is that both the NSM and 
> >the base can be a surrogate pair.  
> >Headache...
> >  
> >
> So that's why we pay you - to know those things (what do you mean you 
> haven't gotten the cheque? I mailed it myself yesterday!)
> 
> >So please please don't talk about UTF-8, that's already enough.
> >
> The voice of reason. Ok, you are, of course, right.
> 
> >Yours,
> 
> Ok, let's see.
> Since we have accepted my proposal of marking both chars of the 
> surrogate with the codepoint's type, only rules that apply to a single 
> letter need any special processing at all.

Yes.
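
For illustration, a minimal sketch of that marking over UTF-16 code
units.  BidiType, toy_lookup() and mark_types() are made-up names for
this example, not fribidi or WINE code, and the lookup only knows two
EN ranges; a real one would consult the full Unicode tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch (not fribidi's real types). */
typedef enum { T_L, T_EN } BidiType;

static int is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Toy lookup: ASCII digits and U+1D7CE..U+1D7FF are EN, all else L. */
static BidiType toy_lookup(uint32_t cp)
{
    if (cp >= 0x30 && cp <= 0x39) return T_EN;
    if (cp >= 0x1D7CE && cp <= 0x1D7FF) return T_EN;
    return T_L;
}

/* Give every UTF-16 unit a type; both halves of a surrogate pair
   receive the type of the code point they encode. */
static void mark_types(const uint16_t *s, size_t n, BidiType *types)
{
    size_t i = 0;
    while (i < n) {
        if (is_high(s[i]) && i + 1 < n && is_low(s[i + 1])) {
            uint32_t cp = 0x10000
                + ((uint32_t)(s[i] - 0xD800) << 10)
                + (uint32_t)(s[i + 1] - 0xDC00);
            types[i] = types[i + 1] = toy_lookup(cp);
            i += 2;
        } else {
            types[i] = toy_lookup(s[i]);  /* BMP char or lone surrogate */
            i++;
        }
    }
}
```

With this, rules that only look at runs of equal types need no further
change; only rules that count single characters have to know that two
units may be one character.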

> Let's review them, then:
> Rule W4 - European separator between European numbers. Only the 
> separator is affected.
> (Rule W5 discusses a sequence of characters of the same type. Are you 
> sure it's relevant?).

No, of course I meant W4 (what do you mean you have not gotten the 
errata?  I sent it myself just after my post).
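
A rough sketch of why W4 cares, covering only the European-separator
half of the rule.  It assumes the types array already marks both units
of a surrogate pair with the same type; BidiType, char_len() and
apply_w4_es() are hypothetical names, not fribidi's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch. */
typedef enum { T_L, T_EN, T_ES } BidiType;

/* Length in UTF-16 units (1 or 2) of the character starting at i. */
static size_t char_len(const uint16_t *s, size_t n, size_t i)
{
    if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        return 2;
    return 1;
}

/* W4, ES part: a single ES between two ENs becomes EN.  "Single"
   means one character, which may span two code units. */
static void apply_w4_es(const uint16_t *s, size_t n, BidiType *types)
{
    size_t i = 0;
    while (i < n) {
        size_t len = char_len(s, n, i);
        if (types[i] == T_ES && i > 0 && i + len < n
            && types[i - 1] == T_EN && types[i + len] == T_EN) {
            for (size_t k = 0; k < len; k++)
                types[i + k] = T_EN;
        }
        i += len;
    }
}
```

A separator encoded as a pair shows up as ES ES in the array and is
still converted, while two real separators in a row (also ES ES) are
left alone, which is the distinction a unit-by-unit check would miss.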

> I have seen no more rules that seem to apply (rule L3 doesn't seem 
> related to the rule Behdad quoted, and the rule Behdad quoted seems, it 
> appears, to be covered by the second assumption I originally took. I 
> suspect I am misunderstanding here).

Rule L3 IS affected, as the NSMs should move to the other side of 
their base character, and when the base character is a surrogate, 
we should take care of it.
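
One possible way to handle this, sketched under assumptions: reverse an
RTL run cluster by cluster instead of unit by unit, so each base keeps
its NSMs after it and no surrogate pair gets split.  The names
(BidiType, reverse_rtl_run()) are hypothetical, and the types array is
assumed to mark both units of a pair with the same type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags for this sketch. */
typedef enum { T_R, T_NSM } BidiType;

static int is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Copy the run s[0..n) into out[0..n) with clusters in reverse order.
   A cluster is a base character plus the NSMs that follow it in
   logical order; units inside a cluster are not reordered. */
static void reverse_rtl_run(const uint16_t *s, const BidiType *types,
                            size_t n, uint16_t *out)
{
    size_t end = n;  /* one past the cluster being copied */
    size_t o = 0;    /* next free slot in out */
    while (end > 0) {
        size_t start = end;
        /* Step back one character (1 or 2 units) at a time until
           the character at start is not an NSM. */
        do {
            if (start >= 2 && is_low(s[start - 1]) && is_high(s[start - 2]))
                start -= 2;
            else
                start -= 1;
        } while (start > 0 && types[start] == T_NSM);
        memcpy(out + o, s + start, (end - start) * sizeof *s);
        o += end - start;
        end = start;
    }
}
```

The same loop covers both cases mentioned above: a base that is a
surrogate pair and an NSM that is a surrogate pair.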

> If we add to that the fact that ALL surrogated characters (i.e. - all 
> characters whose code point is higher than 0xFFFF) are L (table at 
> http://www.unicode.org/unicode/reports/tr9/#Bidirectional_Character_Types),

No, you are simply wrong: there are already EN and BN characters 
there, and other types may well get encoded.  The page you are 
referring to states that all unassigned chars in this region are L.
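
To make that concrete, a small sketch with a few sample ranges above
0xFFFF that are not L.  The ranges are a hand-picked sample, not a
complete table, and decode_pair() and sample_type() are made-up helper
names for this example.

```c
#include <stdint.h>

/* Hypothetical type tags; T_UNKNOWN means "not in this sample". */
typedef enum { T_EN, T_BN, T_UNKNOWN } BidiType;

/* Combine a high and a low surrogate into one code point. */
static uint32_t decode_pair(uint16_t hi, uint16_t lo)
{
    return 0x10000 + ((uint32_t)(hi - 0xD800) << 10)
                   + (uint32_t)(lo - 0xDC00);
}

/* A few supplementary-plane ranges whose type is not L. */
static BidiType sample_type(uint32_t cp)
{
    if (cp >= 0x1D7CE && cp <= 0x1D7FF)   /* mathematical digits */
        return T_EN;
    if (cp >= 0x1D173 && cp <= 0x1D17A)   /* musical format controls */
        return T_BN;
    if (cp == 0xE0001 || (cp >= 0xE0020 && cp <= 0xE007F))  /* tags */
        return T_BN;
    return T_UNKNOWN;                     /* needs the real tables */
}
```

So a lookup that returns L for every surrogate pair would already
misclassify these, before any newly assigned characters are considered.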

> I don't think my original suggestion of a change needs amendment 
> (barring the warning at the bottom: "Unassigned characters are given 
> strong types in the algorithm. This is an explicit exception to the 
> general Unicode conformance requirements with respect to unassigned 
> characters. As characters become assigned in the future, these 
> bidirectional types may change.").

I can't really understand what you mean.  The unassigned chars' 
types may change, well, when they get assigned in the future.

> Behdad, I think I'm missing something here. I was using version 10 of 
> the 3.2 standard 
> (http://www.unicode.org/unicode/reports/tr9/tr9-10.html). The rule 
> numbers seem a bit wrong, and the quotes you give do not appear at all.

Ok, see above.  I prefer the latest version, always available at:
http://www.unicode.org/unicode/reports/tr9
which is the same as yours, accidentally ;).

>                     Shachar

-- 
Behdad Esfahbod		4 Shahrivar 1381, 2002 Aug 26 
http://behdad.org/	[Finger for Geek Code]

#define is_persian_leap(y) ((((y)-474)%2820+2820)%2820*31%128<31)