RTL_CONSTASCII_USTRINGPARAM: cleanup wanted?

Wed Feb 22 04:42:54 PST 2012

On 02/22/2012 11:25 AM, Michael Meeks wrote:
> 	Great ! :-) incidentally, I had one minor point around the ASCII vs.
> UTF-8 side; the rtl_string2UString (cf. sal/rtl/source/string.cxx) does
> a typically slower UTF-8 length counting loop; I suggest that we could
> do better performance wise (and we do create a biggish scad of these
> strings) by sticking with ascii, and doing a single, simple copy/expand
> of the string. Perhaps in a new rtl_uString_newFromAsciiL method.

Thinking about it again, the restriction to ASCII could become a 
hindrance in the longer run.  C++11 has provision for UTF-8 string 
literals (u8"..."), but they still have type char const[], so are not 
distinguishable from traditional plain "..." literals via function 
overloading.  So, if we ever wanted to extend the new facilities to also 
support UTF-8 string literals, but would want to keep the performance 
benefit for the ASCII-only case, we could not offer the same simple syntax

   rtl::OUString("foo");
   rtl::OUString(u8"I\u2764C++");

for both.  One solution might be to go via an indirection

   template<std::size_t N> struct A { char const s[N]; };
   template<std::size_t N> struct U { char const s[N]; };

that encodes whether the given string literal is ASCII or 
UTF-8, and have rtl::OUString ctors overloaded on those.  Of course, 
this would bring back ugly warts into client code

   rtl::OUString(rtl::A("foo"));
   rtl::OUString(rtl::U(u8"I\u2764C++"));
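
To make that a bit more concrete, here is a rough, self-contained sketch of 
the indirection.  Everything in it is made up for illustration: the 
namespace sketch, the class UString (a stand-in backed by std::u16string, 
not the real rtl::OUString), and the helper functions ascii() and utf8().  
The helpers are there because C++11 cannot deduce class template arguments 
from a constructor call, so a bare A("foo") would not compile.  The 
wrappers hold a reference to the literal instead of a copy, and the UTF-8 
path assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Tag wrappers: identical payload, different static type, so that
// constructors can be overloaded on the encoding.
template<std::size_t N> struct A { char const (&s)[N]; };
template<std::size_t N> struct U { char const (&s)[N]; };

// Helper functions that deduce N from the literal (hypothetical names).
template<std::size_t N> A<N> ascii(char const (&s)[N]) { return A<N>{s}; }
template<std::size_t N> U<N> utf8(char const (&s)[N]) { return U<N>{s}; }

// Stand-in for a UTF-16 string class; NOT the real rtl::OUString.
class UString {
public:
    // ASCII path: length is known at compile time, each char is widened.
    template<std::size_t N> explicit UString(A<N> a) {
        data_.reserve(N - 1);
        for (std::size_t i = 0; i != N - 1; ++i) {
            assert(static_cast<unsigned char>(a.s[i]) < 0x80);
            data_.push_back(static_cast<char16_t>(a.s[i]));
        }
    }

    // UTF-8 path: decode byte sequences (assumes well-formed input).
    template<std::size_t N> explicit UString(U<N> u) {
        std::size_t i = 0;
        while (i != N - 1) {
            unsigned char b = static_cast<unsigned char>(u.s[i++]);
            // Number of continuation bytes implied by the lead byte.
            int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
            char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
            for (; extra != 0 && i != N - 1; --extra)
                cp = (cp << 6) | (static_cast<unsigned char>(u.s[i++]) & 0x3F);
            append(cp);
        }
    }

    std::u16string const & str() const { return data_; }

private:
    // Append one code point, using a surrogate pair above U+FFFF.
    void append(char32_t cp) {
        if (cp < 0x10000) {
            data_.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            data_.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            data_.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }

    std::u16string data_;
};

}
```

Client code would then read sketch::UString(sketch::ascii("foo")) or 
sketch::UString(sketch::utf8(u8"I\u2764C++")), which is exactly the kind of 
wart mentioned above.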

And of course it would also work to syntactically optimize the ASCII 
case (as we would do now) and add the indirection only for the UTF-8 
case (at the expense of some ugly asymmetry).
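
A minimal sketch of that asymmetric variant, again with made-up names 
(namespace sketch_asym, class UString, helper utf8(), and the Path enum 
that only records which constructor was picked).  It shows nothing but the 
overload dispatch; the actual copying and decoding are left out.

```cpp
#include <cstddef>

namespace sketch_asym {

// Records which constructor overload resolution selected.
enum class Path { AsciiFast, Utf8Decode };

// Only the UTF-8 case needs a wrapper (hypothetical names).
template<std::size_t N> struct U { char const (&s)[N]; };
template<std::size_t N> U<N> utf8(char const (&s)[N]) { return U<N>{s}; }

// Stand-in class; NOT the real rtl::OUString.
struct UString {
    Path path;

    // Plain "..." literal: assumed ASCII-only, would take the fast path.
    template<std::size_t N>
    explicit UString(char const (&)[N]) : path(Path::AsciiFast) {}

    // Wrapped literal: would go through the UTF-8 decoder.
    template<std::size_t N>
    explicit UString(U<N>) : path(Path::Utf8Decode) {}
};

}
```

With this, sketch_asym::UString("foo") keeps the simple syntax, and only 
the UTF-8 case has to be spelled sketch_asym::UString(sketch_asym::utf8(...)).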

Just some thoughts,
Stephan