String literals, ASCII vs UTF-8

Lubos Lunak l.lunak at suse.cz
Tue Feb 28 03:30:00 PST 2012


 I'd like to revisit the choice of considering string literals to be either 
ASCII or UTF-8, as discussed in the thread about removing 
RTL_CONSTASCII_USTRINGPARAM. While I was ambivalent about it, I now think we 
should go with ASCII only, unless explicitly marked otherwise.

 The reason for this is that I have patches adding more functions taking 
string literals, and there it makes much more sense to require only ASCII. For 
example, OUString::operator== can simply be a call to OUString::equalsAsciiL() 
for ASCII, but for UTF-8 it requires a conversion and a Unicode comparison.
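
 To illustrate why the ASCII case is cheap, here is a minimal sketch. It is 
not the actual OUString code: it uses std::u16string as a stand-in for the 
UTF-16 buffer, and equalsAsciiLiteral() is a hypothetical name. Since every 
ASCII byte corresponds to exactly one UTF-16 code unit with the same value, 
the lengths can be compared up front and the contents compared unit by unit, 
without creating a temporary string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the real OUString API: compares UTF-16
// text against a string literal that is assumed to contain only ASCII.
template <std::size_t N>
bool equalsAsciiLiteral(const std::u16string& text, const char (&literal)[N])
{
    // N includes the terminating '\0', so the literal holds N - 1 characters.
    if (text.size() != N - 1)
        return false;
    for (std::size_t i = 0; i < N - 1; ++i)
    {
        // One ASCII byte maps to one UTF-16 code unit with the same value.
        const char16_t unit =
            static_cast<char16_t>(static_cast<unsigned char>(literal[i]));
        if (text[i] != unit)
            return false;
    }
    return true;
}
```

 With UTF-8 neither shortcut holds: a character may take several bytes, so 
the byte count of the literal says nothing about the UTF-16 length, and the 
bytes have to be decoded before they can be compared.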

 We are talking about this as a followup to the removal of 
RTL_CONSTASCII_USTRINGPARAM, which requires those string literals to be ASCII 
anyway. Non-ASCII string literals should also be fairly rare, given that LO's 
UI strings are not written directly in the C++ source files. So this should 
be fairly safe and worth it. Additionally, all *ascii*() functions can check 
that their contents are in the allowed range, and it's also fairly easy 
to check all .cxx files for non-ASCII characters. UTF-8 literals, if needed, 
can still be converted using a simple function (I don't know if there's 
already something, but e.g. OUString::fromUtf8() could easily be added if 
needed).
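
 The kind of range check meant here could look roughly like the sketch below. 
Both helper names are hypothetical and this is not what the attached patches 
contain; it only shows that the test is a single pass over the bytes. The 
same loop works for an *ascii*() function argument and for the contents of a 
.cxx file read into a string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: offset of the first byte outside the 7-bit ASCII
// range, or std::string::npos if there is none.
std::size_t firstNonAscii(const std::string& bytes)
{
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not seen as negative.
        if (static_cast<unsigned char>(bytes[i]) > 0x7F)
            return i;
    }
    return std::string::npos;
}

// Hypothetical helper: true if the data contains only ASCII bytes.
bool isAsciiOnly(const std::string& bytes)
{
    return firstNonAscii(bytes) == std::string::npos;
}
```

 Returning the offset rather than just a flag makes it easy to report where 
in a source file the offending byte is.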

 For these reasons I'd like to push the attached patches (changing OUString 
docs to require ASCII in plain string literals, adding ASCII range checks, 
adding OUString operator overloads for string literals and cleaning up usage 
of the macro in sw's ww8 filter as a test).

PS: Any idea why ' OUString foo() { return "foo"; } ' does not work, even 
though the ctor is not explicit? I can't think of a reason why a return value 
would need to be handled differently from the other cases.

-- 
 Lubos Lunak
 l.lunak at suse.cz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-assume-string-literals-to-be-ASCII-only-rather-than-.patch
Type: text/x-diff
Size: 2533 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0005.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-add-since-to-new-OUString-ctors.patch
Type: text/x-diff
Size: 946 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0006.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-check-that-ascii-string-functions-are-really-passed-.patch
Type: text/x-diff
Size: 6262 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0007.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-remove-RTL_CONSTASCII_USTRINGPARAM-usage.patch
Type: text/x-diff
Size: 74366 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0008.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-optimized-OUString-operators-for-string-literals.patch
Type: text/x-diff
Size: 4567 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120228/ea781d92/attachment-0009.patch>