[Libreoffice] [PATCH] refactoring gendict

Norbert Thiebaud nthiebaud at gmail.com
Sun Jan 30 06:15:13 PST 2011


On Sun, Jan 30, 2011 at 7:35 AM, Kenneth Venken
<kenneth.venken at gmail.com> wrote:
>> what's the point of passing cont as a parameter if you are going to
>> override its value right away? (note: ok so, patch 0005 actually fixes
>> that...)
>
> these patches should be viewed as a whole. The refactoring was a process.
> But you're right.
Well, you broke the refactoring down into multiple steps, which is very good
since it makes reviewing so much easier, but then I reviewed them one at a
time, not as a whole.
In a perfect world (and in reality, in the Linux kernel world) each
successive patch should
yield buildable and functional code. You almost did that with this
patch series :-)

>>
>> luckily this is not possible since utf16 characters in the range
>> D800-DBFF are not supported/valid
>> see http://en.wikipedia.org/wiki/UTF-16/UCS-2 for a reason why.
>>
>> so you will have, at least, 4 ranges without any hit.
>>
> So "It is not possible to encode these code points in UTF-16. The Unicode
> standard permanently reserves these values for UTF-16 encoding only, so a
> single 16-bit code unit in the range 0xD800 to 0xDFFF never represents a
> character in Plane 0."
> But since we are using the UTF-16 encoding, it could be possible that a
> sal_Unicode value (a Unicode code unit) is in this range.
> See section about Code points U+10000..U+10FFFF in the link you sent me.

Yeah, but the truth is that we do not _really_ support UTF-16. For instance,
the length of an OUString is the number of 16-bit values, not the number of
'characters'.
IOW, a surrogate pair is 1 character but would have a length of 2 in OUString.
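
To illustrate the difference, here is a rough sketch only. It uses
std::u16string as a stand-in for OUString, and the helper names
(isHighSurrogate, isLowSurrogate, countCodePoints) are made up for this
example; they are not LibreOffice API.

```cpp
#include <cstddef>
#include <string>

// True if the 16-bit code unit is a high (lead) surrogate, 0xD800..0xDBFF.
bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// True if the 16-bit code unit is a low (trail) surrogate, 0xDC00..0xDFFF.
bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Counts code points rather than 16-bit code units: a well-formed
// surrogate pair counts once, an unpaired surrogate counts as one unit.
std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (isHighSurrogate(s[i]) && i + 1 < s.size()
            && isLowSurrogate(s[i + 1]))
        {
            ++i; // skip the trail unit of the pair
        }
        ++n;
    }
    return n;
}
```

With a string holding 'a' followed by U+1D11E (encoded as the pair
D834 DD1E), size() reports 3 code units while countCodePoints() reports 2.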

All that being said, yes, the case you mentioned is a bug that
should probably be fixed.

Norbert

