Renaming sal_Unicode to a less misleading name?

Norbert Thiebaud nthiebaud at gmail.com
Sat Feb 13 15:21:05 UTC 2016


On Sat, Feb 13, 2016 at 6:17 AM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> I’m wondering if it is possible to rename sal_Unicode, which is actually
> an unsigned 16-bit integer and thus can’t fit any Unicode character

s/any/every/ ?

>, to
> some less confusing name like sal_Ucs2 or even just use sal_uInt16 (but
> please not sal_Utf16 which would give the illusion that surrogate pairs
> are handled in a special way, which I don’t think is true).

Actually surrogate pairs are supported (or at least the intent is there :-) ).

include/rtl/character.hxx:inline bool isHighSurrogate(sal_uInt32 code) {
include/rtl/character.hxx:inline bool isLowSurrogate(sal_uInt32 code) {
include/rtl/character.hxx:inline sal_Unicode getHighSurrogate(sal_uInt32 code) {
include/rtl/character.hxx:inline sal_Unicode getLowSurrogate(sal_uInt32 code) {
include/rtl/character.hxx:inline sal_uInt32 combineSurrogates(sal_uInt32 high, sal_uInt32 low) {
etc...
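
For readers who have not looked at the arithmetic behind such helpers, here
is a minimal standalone sketch of how a code point above U+FFFF is split into
a surrogate pair and put back together. It is not the code from
include/rtl/character.hxx; the names in the `sketch` namespace and the use of
char16_t and std::uint32_t are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for illustration; not the LibreOffice implementation.

// High (lead) surrogates occupy U+D800..U+DBFF.
inline bool isHighSurrogate(std::uint32_t code) {
    return code >= 0xD800 && code <= 0xDBFF;
}

// Low (trail) surrogates occupy U+DC00..U+DFFF.
inline bool isLowSurrogate(std::uint32_t code) {
    return code >= 0xDC00 && code <= 0xDFFF;
}

// Upper 10 bits of (cp - 0x10000), offset into the high surrogate range.
// Only meaningful for code points in U+10000..U+10FFFF.
inline char16_t getHighSurrogate(std::uint32_t cp) {
    return static_cast<char16_t>(((cp - 0x10000) >> 10) | 0xD800);
}

// Lower 10 bits of (cp - 0x10000), offset into the low surrogate range.
inline char16_t getLowSurrogate(std::uint32_t cp) {
    return static_cast<char16_t>(((cp - 0x10000) & 0x3FF) | 0xDC00);
}

// Inverse of the two functions above.
inline std::uint32_t combineSurrogates(std::uint32_t high, std::uint32_t low) {
    return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}

// Round-trip check for one code point outside the BMP.
inline void roundTripCheck(std::uint32_t cp) {
    const char16_t hi = getHighSurrogate(cp);
    const char16_t lo = getLowSurrogate(cp);
    assert(isHighSurrogate(hi) && isLowSurrogate(lo));
    assert(combineSurrogates(hi, lo) == cp);
}

} // namespace sketch
```

For example, U+1F600 does not fit in one 16-bit unit; it becomes the pair
0xD83D 0xDE00, and combining that pair yields U+1F600 again.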

So yes, sal_Unicode is a UTF-16 code unit, and UTF-16 is quite commonly
(and wrongly) called 'Unicode'.
'Unicode' itself does not denote any specific encoding form,
hence the names UTF-8, UTF-16 and UTF-32, the latter two coming in BE and LE
flavours.
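
To make the BE/LE distinction concrete: it only shows up once a 16-bit code
unit is written out as bytes. A minimal sketch, with hypothetical helper names
(toUtf16BE, toUtf16LE) that are not part of any LibreOffice header:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical helpers for illustration only.

// Big-endian: most significant byte first.
inline std::array<std::uint8_t, 2> toUtf16BE(char16_t unit) {
    return {{ static_cast<std::uint8_t>(unit >> 8),
              static_cast<std::uint8_t>(unit & 0xFF) }};
}

// Little-endian: least significant byte first.
inline std::array<std::uint8_t, 2> toUtf16LE(char16_t unit) {
    return {{ static_cast<std::uint8_t>(unit & 0xFF),
              static_cast<std::uint8_t>(unit >> 8) }};
}

// The two serializations of one unit are byte-reversed copies of each other.
inline void byteOrderCheck(char16_t unit) {
    const std::array<std::uint8_t, 2> be = toUtf16BE(unit);
    const std::array<std::uint8_t, 2> le = toUtf16LE(unit);
    assert(be[0] == le[1] && be[1] == le[0]);
}

} // namespace sketch
```

The same code unit, say 0x20AC, is the byte sequence 20 AC in UTF-16BE and
AC 20 in UTF-16LE; in memory it is just one 16-bit value either way.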

>
> I count only ~7000 usages across the code base, so that is not such a
> huge task.
Internally it is doable; externally it is more of a problem, since
sal_Unicode is part of the stable external API.
The best you can do is to have an internal 'alias' for it.
It may indeed be useful, for clarity, to have typedefs that are
explicit about the encoding: sal_utf8, sal_utf16, sal_utf16be, sal_utf16le,
sal_utf32, sal_utf32be, sal_utf32le.
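
A rough sketch of what such internal aliases could look like. Everything here
is hypothetical: the `sketch` namespace, the stand-in definition of
sal_Unicode as a 16-bit unsigned integer (as described earlier in the
thread), and the underlying types chosen for the other aliases.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Stand-in for the public typedef; assumed to be an unsigned 16-bit integer.
typedef std::uint16_t sal_Unicode;

// Hypothetical internal aliases that name the encoding explicitly.
typedef unsigned char sal_utf8;   // one UTF-8 code unit
typedef sal_Unicode   sal_utf16;  // one UTF-16 code unit, same type as sal_Unicode
typedef std::uint32_t sal_utf32;  // one UTF-32 code unit, i.e. a full code point

// A typedef is only another name, not a new type, so the public API is
// unaffected, but overloads cannot tell sal_utf16 from sal_Unicode either.
static_assert(std::is_same<sal_utf16, sal_Unicode>::value,
              "alias must stay interchangeable with the public type");
static_assert(sizeof(sal_utf16) == 2, "UTF-16 code units are 16 bits wide");
static_assert(sizeof(sal_utf32) == 4, "UTF-32 code units are 32 bits wide");

// Example signature that documents its intent through the alias names.
inline sal_utf32 widen(sal_utf16 unit) {
    return static_cast<sal_utf32>(unit);
}

} // namespace sketch
```

Because the alias and sal_Unicode are the same type, existing external code
keeps compiling unchanged; the gain is purely in readability of internal
signatures.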

Norbert

