String literals, ASCII vs UTF-8
Lubos Lunak
l.lunak at suse.cz
Tue Feb 28 03:30:00 PST 2012
I'd like to revisit the choice of considering string literals to be either
ASCII or UTF-8, as discussed in the thread about removing
RTL_CONSTASCII_USTRINGPARAM. While I was ambivalent about it, I now think we
should go with ASCII only, unless explicitly marked otherwise.
The reason is that I have patches adding more functions that take string
literals, and for those it makes much more sense to require only ASCII. For
example, OUString::operator== can simply call OUString::equalsAsciiL() for
ASCII, but for UTF-8 it requires a conversion and a Unicode comparison.
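To illustrate why the ASCII case is cheap, here is a minimal sketch. It uses
std::u16string as a stand-in for OUString's UTF-16 buffer, and
equalsAsciiLiteral is a hypothetical name, not the real equalsAsciiL(). Each
ASCII byte corresponds to exactly one UTF-16 code unit with the same value, so
the comparison is a length check plus a per-unit loop, with no decoding step.
A UTF-8 literal would have to be decoded first, because one character can span
up to four bytes and may end up as a surrogate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the LibreOffice API): compares a UTF-16 string
// with an ASCII string literal. std::u16string stands in for OUString.
template <std::size_t N>
bool equalsAsciiLiteral(const std::u16string& s, const char (&literal)[N])
{
    // N includes the terminating '\0', so the literal has N - 1 characters
    // and its length is known at compile time (no strlen() needed).
    if (s.size() != N - 1)
        return false;
    for (std::size_t i = 0; i < N - 1; ++i)
    {
        unsigned char c = static_cast<unsigned char>(literal[i]);
        // Range check: in debug builds, a non-ASCII byte aborts here.
        assert(c < 0x80 && "string literal must be ASCII");
        // ASCII byte == UTF-16 code unit of the same value, so no decoding.
        if (s[i] != static_cast<char16_t>(c))
            return false;
    }
    return true;
}
```

Taking the literal as an array reference is what exposes its length as the
template parameter N, so the call costs no more than an equalsAsciiL()-style
call with an explicit length.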
We are talking about this as a follow-up to the removal of
RTL_CONSTASCII_USTRINGPARAM, which requires those string literals to be ASCII
anyway, and non-ASCII string literals should be fairly rare because LO's UI
strings are not written directly in the C++ source files. So this should be
fairly safe and worth it. Additionally, all *ascii*() functions can check that
their input is within the allowed range, and it's also fairly easy to check
all .cxx files for non-ASCII characters. UTF-8 literals, if needed, can still
be converted using a simple function (I don't know if something like that
already exists, but e.g. OUString::fromUtf8() could easily be added if
needed).
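A helper along the lines of the suggested OUString::fromUtf8() could look
roughly like the following sketch. It is a hypothetical free function working
on std::string and std::u16string rather than OUString, and it is deliberately
lenient: invalid bytes become U+FFFD, and overlong encodings and encoded
surrogates are not rejected. Real LibreOffice code would more likely reuse the
existing text-encoding conversion routines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of a fromUtf8()-style conversion: UTF-8 bytes in,
// UTF-16 code units out. Lenient: bad input becomes U+FFFD; overlong forms
// and encoded surrogates are NOT rejected.
std::u16string fromUtf8(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;     // decoded code point
        std::size_t len; // total bytes in this sequence
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; } // invalid lead byte

        // Truncated sequence at the end of input: emit one U+FFFD and stop.
        if (i + len > in.size()) { out.push_back(u'\uFFFD'); break; }

        // Accumulate 6 payload bits from each continuation byte (10xxxxxx).
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(u'\uFFFD'); ++i; continue; }
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp)); // BMP: one code unit
        }
        else
        {
            // Supplementary plane: split into a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```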
For these reasons I'd like to push the attached patches (changing the OUString
docs to require ASCII in plain string literals, adding ASCII range checks,
adding OUString operator overloads for string literals, and cleaning up usage
of RTL_CONSTASCII_USTRINGPARAM in sw's ww8 filter as a test).
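The attached patches were scrubbed from the archive, so the following is only a
guess at the general shape of string-literal operator overloads combined with
an ASCII range check. UString is a hypothetical stand-in for OUString, and none
of the names below are taken from the patches. The template parameter N
captures the literal's length at compile time, and assert() plays the role of
the range check, firing in debug builds if a non-ASCII byte slips into a
literal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for OUString (not the LibreOffice API): a UTF-16
// string with constructor and comparison overloads for ASCII literals.
class UString
{
public:
    explicit UString(std::u16string s) : buf_(std::move(s)) {}

    // Non-explicit: allows "UString s = "foo";". Assumes a real string
    // literal, so the length is N - 1 (N counts the terminating '\0').
    template <std::size_t N>
    UString(const char (&literal)[N])
    {
        buf_.reserve(N - 1);
        for (std::size_t i = 0; i < N - 1; ++i)
            buf_.push_back(asciiUnit(literal[i]));
    }

    const std::u16string& str() const { return buf_; }

    // Compare directly against the literal: no temporary UString, no decoding.
    template <std::size_t N>
    friend bool operator==(const UString& s, const char (&literal)[N])
    {
        if (s.buf_.size() != N - 1)
            return false;
        for (std::size_t i = 0; i < N - 1; ++i)
            if (s.buf_[i] != asciiUnit(literal[i]))
                return false;
        return true;
    }
    template <std::size_t N>
    friend bool operator==(const char (&literal)[N], const UString& s)
    { return s == literal; }
    template <std::size_t N>
    friend bool operator!=(const UString& s, const char (&literal)[N])
    { return !(s == literal); }
    template <std::size_t N>
    friend bool operator!=(const char (&literal)[N], const UString& s)
    { return !(s == literal); }

private:
    // ASCII range check: debug builds abort on a byte >= 0x80.
    static char16_t asciiUnit(char c)
    {
        unsigned char u = static_cast<unsigned char>(c);
        assert(u < 0x80 && "string literal must be ASCII");
        return static_cast<char16_t>(u);
    }

    std::u16string buf_;
};
```

With overloads of this kind, UString s = "foo"; and if (s == "foo") need
neither the macro nor an explicit conversion.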
PS: Any idea why 'OUString foo() { return "foo"; }' does not work, even
though the constructor is not explicit? I can't think of a reason why a return
value would need to be treated differently from the other cases.
--
Lubos Lunak
l.lunak at suse.cz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-assume-string-literals-to-be-ASCII-only-rather-than-.patch
Type: text/x-diff
Size: 2533 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0005.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-add-since-to-new-OUString-ctors.patch
Type: text/x-diff
Size: 946 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0006.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-check-that-ascii-string-functions-are-really-passed-.patch
Type: text/x-diff
Size: 6262 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0007.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-remove-RTL_CONSTASCII_USTRINGPARAM-usage.patch
Type: text/x-diff
Size: 74366 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0008.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-optimized-OUString-operators-for-string-literals.patch
Type: text/x-diff
Size: 4567 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0009.patch>