[Fribidi-discuss] Re: BiDi WINE status and fribidi
Shachar Shemesh
fribidi-discuss at sun.consumer.org.il
Sun Aug 25 11:25:02 EST 2002
Behdad Esfahbod wrote:
>On Sun, 25 Aug 2002, Shachar Shemesh wrote:
>
>
>
>>Not looking at the source yet, I may be talking utter bullshit here.
>>
>>What I think will be the proper thing to do is to change two locations:
>>1. When classifying the characters, do the surrogate unification, lookup
>>the combined code point, and then mark both parts of the surrogate as
>>the same type.
>>2. When reordering, if the level is odd (right to left), and the char is
>>a surrogate, don't change the order of the pair.
>>
>>As far as I can see, these are the only changes required in order to
>>support UTF-16. They can be relatively trivially extended to support UTF-8.
>>
>>If anyone who has actually had a look at the code has any corrections,
>>please do say so.
>>
>> Shachar
>>
>>
>
>You are simplifying things too much. The two easy steps are the ones
>you described. But there are some harder ones too:
>
> * Rule W5: States that if there is just *one* char of
>type .... Then you should be aware that a surrogate pair is
>just one character, not two.
>
> * Rule L3: NSMs in RTL levels should be reordered to
>come after their base. Now the problem is that both the NSM and
>the base can be surrogate pairs.
>Headache...
>
>
So that's why we pay you - to know those things (what do you mean you
haven't gotten the cheque? I mailed it myself yesterday!)
>So please please don't talk about UTF-8, that's already enough.
>
The voice of reason. Ok, you are, of course, right.
>
>Yours,
>
>
Ok, let's see.
Since we have accepted my proposal of marking both chars of the
surrogate with the codepoint's type, only rules that apply to a single
letter need any special processing at all.
Let's review them, then:
Rule W4 - European separator between European numbers. Only the
separator is affected.
(Rule W5 discusses a sequence of characters of the same type. Are you
sure it's relevant?).
I have seen no other rules that seem to apply (rule L3 doesn't seem
related to the rule Behdad quoted, and the rule Behdad quoted seems
to be covered by the second assumption I originally made. I
suspect I am misunderstanding here).
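
To make the second assumption concrete, here is a minimal sketch in C of
reversing a right-to-left run of UTF-16 code units without splitting
surrogate pairs. It is not FriBidi code: the function names are
hypothetical, and it assumes the input is well-formed UTF-16 (every high
surrogate is followed by a low surrogate).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of FriBidi. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Reverse s[0..len) in place, keeping each surrogate pair in
 * high, low order. Assumes well-formed UTF-16 input. */
static void reverse_run_utf16(uint16_t *s, size_t len)
{
    size_t i, j;

    if (len < 2)
        return;

    /* Pass 1: plain reversal of the code units. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        uint16_t t = s[i];
        s[i] = s[j];
        s[j] = t;
    }

    /* Pass 2: every pair now reads low, high; swap each one back. */
    for (i = 0; i + 1 < len; i++) {
        if (is_low_surrogate(s[i]) && is_high_surrogate(s[i + 1])) {
            uint16_t t = s[i];
            s[i] = s[i + 1];
            s[i + 1] = t;
            i++; /* skip the low half we just restored */
        }
    }
}
```

Doing the fix-up as a second pass keeps the reversal loop itself
unchanged, which may make it easier to bolt onto an existing reordering
routine. It does nothing for the combining-mark reordering Behdad
describes.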
If we add to that the fact that ALL surrogated characters (i.e. - all
characters whose code point is higher than 0xFFFF) are L (table at
http://www.unicode.org/unicode/reports/tr9/#Bidirectional_Character_Types),
I don't think my original suggestion of a change needs amendment
(barring the warning at the bottom: "Unassigned characters are given
strong types in the algorithm. This is an explicit exception to the
general Unicode conformance requirements with respect to unassigned
characters. As characters become assigned in the future, these
bidirectional types may change.").
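
The first step of the original suggestion (look up the combined code
point and give both halves of the surrogate the same type) could be
sketched roughly as follows. This is an illustration only: `bidi_type`,
`type_lookup_fn`, `toy_lookup` and the other names are made up for the
example and are not FriBidi's API, and the toy lookup table is a
stand-in for the real character-type data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; a real implementation has many more. */
typedef int bidi_type;
enum { TOY_L = 0, TOY_R = 1 };

/* Hypothetical callback: maps a full code point to its bidi type. */
typedef bidi_type (*type_lookup_fn)(uint32_t code_point);

/* Stand-in lookup: Hebrew block is R, everything else is L. */
static bidi_type toy_lookup(uint32_t cp)
{
    return (cp >= 0x0590 && cp <= 0x05FF) ? TOY_R : TOY_L;
}

/* Combine a high/low surrogate pair into one code point. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                    + ((uint32_t)lo - 0xDC00u);
}

/* Fill types[i] for every UTF-16 code unit in s[0..len).
 * Both units of a surrogate pair get the type of the combined
 * code point; an unpaired surrogate is looked up as-is. */
static void classify_utf16(const uint16_t *s, size_t len,
                           bidi_type *types, type_lookup_fn lookup)
{
    size_t i = 0;

    while (i < len) {
        uint16_t u = s[i];

        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bidi_type t = lookup(combine_surrogates(u, s[i + 1]));
            types[i] = t;
            types[i + 1] = t;
            i += 2;
        } else {
            types[i] = lookup(u);
            i++;
        }
    }
}
```

With both halves carrying the same type, rules that look at runs of one
type see a pair as a slightly longer run, so only rules that count
individual characters would need to know that a pair is a single
character.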
Behdad, I think I'm missing something here. I was using version 10 of
the 3.2 standard
(http://www.unicode.org/unicode/reports/tr9/tr9-10.html). The rule
numbers seem a bit off, and the quotes you give do not appear in it at all.
Shachar