Obsoleting RTL_CONSTASCII_USTRINGPARAM

Fri Feb 10 02:30:56 PST 2012

On Friday 10 of February 2012, Stephan Bergmann wrote:
> On 02/09/2012 06:16 PM, Lubos Lunak wrote:
> >   Attached is a patch that removes the need for
> > RTL_CONSTASCII_USTRINGPARAM (henceafter called 'macro', because it annoys
> > me to write this even in its death warrant) when using the OUString ctor.
> > Assuming this code compiles also with MSVC[*], the result should be at
> > least as good as when using the macro, except that it is not needed to
> > use the macro.
>
> We tried this back in the day, like maybe ca. 2000, and it did not work
> back then.  Can't remember the details---hopefully it was just old MSVC
> that failed to get it and the MSVCs we rely on today have become better.

 I know for sure that Qt in the past didn't use member templates because of 
still supporting MSVC versions that couldn't handle this, but I think that 
does not apply anymore. Still, a change like this should definitely be 
checked.

> > - OString is not included in the patch, as that one does have "const
> > char*" ctor. Since it is inline, it actually does not matter for binary
> > compatibility, so I can change this, but I didn't want to do this now
> > before I get some feedback. It should be later changed as well.
>
> I would assume there's a large number of implicit and explicit uses of
> OString(char const *) with non-literal arguments.

 I can detect the difference, that's not a big problem (the "const char*" ctor 
can be replaced by "const char* (&)" and necessary variants).

> >   I however do not see anything wrong with automatic string literal to
> > OUString conversions. UI strings, AFAICS, are not included in .cxx files
> > themselves, but come from elsewhere, so sources contain only technical
> > strings. Therefore it is a very fair assumption that all string literals
> > are ASCII, since anything outside of ASCII would presumably need
> > translation anyway, and the few legal cases (math symbols not in ASCII
> > maybe) can simply be explicit about it (there will be a runtime warning
> > if this is not the case, and since it is a string literal, it should be
> > noticed by the developer).
>
> We could change the ctor from using _ASCII_US to _UTF8.  That way, the
> occasional non-ASCII literal could be included as \xXX\xXX\xXX.

 I think there are only very few cases of UTF8 literals, if any, and those 
could be handled by something like OUString::fromUtf8() and the already 
existing check for the contents really being in the ASCII range. Using ASCII 
here is optimizing for the common case.

 But if it is agreed that string literals can normally be UTF8, the conversion 
function for UTF8 conversions checks the contents for being actually ASCII, 
so I have no problem trading that little overhead for safety if that's the 
consensus.

> This is IMO orthogonal to the question of explicit-ness.

 Right.

> >   Moreover in fact the explicit OUString doesn't really buy us anything,
> > as nobody says foo's arguments or 'str' really are OUString, so this is
> > about as good as Hungarian notation.
>
> Don't understand the above paragraph.

 I was in advance countering the expected argument that

 foo( OUString( "bar" ))

 makes it obvious what is going on. It does not, as foo() can e.g. take an 
argument that has implicit OUString conversion, so this is just like 
Hungarian notation, more typing for the false sense of knowing what actually 
happens.

> >   So while I can be talked into adding the explicit, I can't find any
> > good reason for doing that.
>
> Apart from the argument given by Caolán (which suggests we'd probably
> need to go with explicit, anyway), my gut feeling is that it would be
> better to keep things explicit here.  There's already too many function
> overloads and implicit conversions around wrt to strings to not get the
> bad feeling that each additional one will subtly change some glorious
> code somewhere.

 I've already fixed the technical need for explicit in that case, and I do not 
see other technical reasons for this. The conversion is lossless, so there 
shouldn't be any problem of data corruption. As for conversions happening, 
they will happen anyway, regardless of whether done explicitly or not, and 
code without them being explicit may in fact have less of them. Can you think 
of some specific example of what might go wrong with the conversions being 
implicit?

 I'd like to also point out that Qt's QString has always had implicit ctors, 
and I don't think there's ever been any trouble with that besides broken 
encodings (which are not going to be the case here), so I think that QString 
shows that it can work.

> > +    template< int N >
>
> nit: having to pick among int, std::size_t, sal_Int32 for the type of N,
> I would probably use std::size_t

 I build a nice small collection of various examples showing how unsigned and 
sal_intXX types just cause more problems than they solve, for the time when 
I'll feel like bringing that topic up again, and I already have specific 
cases of some experienced and respected LO developers getting caught by that. 
So although I'd feel honored to be in such a good company :), I'll stick with 
my argument that plain int in general just does the job.

> > +#ifndef RTL_STRING_UNITTEST
> > +        error_non_const_char_to_OUString_conversion_is_not_automatic(
> > value );
> > +#else 
>
> I would turn this into just a declaration, with missing definition, for
> the non-RTL_STRING_UNITTEST case.  (Unless we make the other ctor
> explicit, in which case I would think this one can go away completely.)

 Must be a leftover from when I was fixing the unittest to build, thanks. 
Fixed in the updated version in the reply to Caolan's mail.

-- 
 Lubos Lunak
 l.lunak at suse.cz