About Strings

Michael Stahl mstahl at redhat.com
Mon Mar 19 07:05:45 PDT 2012


On 19/03/12 09:50, Enrico Weigelt wrote:
> Hi folks,
> 
> 
> I just went into this list and wonder why lo has its own string
> implementation instead of using std::string. Could anyone
> please give me some insight ?

the historical reason is that std::string is part of the C++ standard
library, and implementations of that generally only started to become
actually usable on all platforms in the early 2000s.

the other reason is that std::string isn't particularly well suited to
storing Unicode, because all of the operations on it are defined in
terms of elements of an array.  so when using std::string, accessing
UTF-8 characters is both cumbersome and error-prone, while using
std::wstring is not an option at all because it is based on wchar_t, and
the geniuses didn't define whether that is 16-bit or 32-bit, so it's
_completely_ useless in practice.  the only option really would be to
use something like std::basic_string<uint32_t>, but that wastes a lot of
space in the common case, or std::basic_string<uint16_t>, which is just
as stupid as our existing rtl::OUString.
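
to make the "elements of an array" point concrete, here is a minimal
sketch in C++. it assumes well-formed UTF-8 input; count_code_points and
demo are hypothetical helper names used only for illustration, not part
of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Illustrates the mismatch between array elements and characters.
void demo() {
    // "naive" spelled with U+00EF, which takes the two bytes 0xC3 0xAF.
    const std::string s = "na\xC3\xAFve";

    assert(s.size() == 6);              // size() counts bytes
    assert(count_code_points(s) == 5);  // the text has five characters

    // operator[] hands back one byte, here only the first half of U+00EF.
    assert(static_cast<unsigned char>(s[2]) == 0xC3);
}
```

every index-based operation (size, operator[], substr, find) has the
same property, so code that slices a std::string at an arbitrary index
can cut a character in half.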

hmm... what i'd really like from a string class is that it has no
operator[] at all (who needs that anyway), just an iterator interface
that returns characters as uint32_t, and another interface to write the
string UTF-8 encoded into some buffer, thus allowing for picking
whatever internal implementation is most suitable.
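
a rough sketch of what such an interface could look like, in C++. the
class name UnicodeString and its members from_utf8 and append_utf8 are
made up for illustration, and the sketch assumes well-formed UTF-8 input
and happens to store UTF-8 internally; it is not an existing LibreOffice
or standard library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical string class: no operator[], only code point iteration
// and UTF-8 output. Internal storage is UTF-8 in this sketch.
class UnicodeString {
public:
    // Forward iterator that decodes one code point per step.
    class const_iterator {
    public:
        const_iterator(const std::string* s, std::size_t pos)
            : s_(s), pos_(pos) {}

        std::uint32_t operator*() const {
            const std::size_t len = length_at(pos_);
            const unsigned char lead = byte(pos_);
            // Strip the length marker bits from the lead byte.
            std::uint32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
            // Each continuation byte contributes its low six bits.
            for (std::size_t i = 1; i < len; ++i) {
                cp = (cp << 6) | (byte(pos_ + i) & 0x3Fu);
            }
            return cp;
        }

        const_iterator& operator++() {
            pos_ += length_at(pos_);
            return *this;
        }

        bool operator==(const const_iterator& o) const { return pos_ == o.pos_; }
        bool operator!=(const const_iterator& o) const { return pos_ != o.pos_; }

    private:
        unsigned char byte(std::size_t i) const {
            return static_cast<unsigned char>((*s_)[i]);
        }

        // Sequence length implied by the lead byte at position i.
        std::size_t length_at(std::size_t i) const {
            const unsigned char b = byte(i);
            if (b < 0x80) return 1;
            if ((b >> 5) == 0x06) return 2;
            if ((b >> 4) == 0x0E) return 3;
            assert((b >> 3) == 0x1E);  // anything else is not a lead byte
            return 4;
        }

        const std::string* s_;
        std::size_t pos_;
    };

    // Assumes utf8 is well-formed; a real class would validate here.
    static UnicodeString from_utf8(std::string utf8) {
        UnicodeString s;
        s.data_ = std::move(utf8);
        return s;
    }

    const_iterator begin() const { return const_iterator(&data_, 0); }
    const_iterator end() const { return const_iterator(&data_, data_.size()); }

    // Writes the string UTF-8 encoded into the caller's buffer.
    void append_utf8(std::string& out) const { out += data_; }

private:
    std::string data_;
};
```

callers can then write `for (std::uint32_t cp : s)` and never see the
storage. because nothing exposes an index, the same interface could sit
on top of UTF-16 or UTF-32 storage, with append_utf8 doing the encoding
on the way out.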


