Obsoleting RTL_CONSTASCII_USTRINGPARAM

Lubos Lunak l.lunak at suse.cz
Thu Feb 9 09:16:46 PST 2012


On Thursday 09 of February 2012, Lubos Lunak wrote:
> On Thursday 09 of February 2012, Michael Meeks wrote:
> > 	Personally, I'd -love- someone to rename ~all of these
> > RTL_USTR() or RTL_STR() or somesuch - we have enough pointlessly hard to
> > read and indent coding around the place. Hey - we could even have an:
> >
> > 	RTL_USTRING("foo")
> >
> > 	that hid all the:
> >
> > 	rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("foo"))
> >
> > 	madness from sight ;-)
>
>  It'd be probably almost the same amount of work to get rid of the macro
> completely and add a special ctor to handle string literals (can you
> arrange a HackWeek for me soon ;) ? ).

 Actually, no need for a week, this one is reasonably simple. Therefore I 
challenge RTL_CONSTASCII_USTRINGPARAM to a fight to the death. Its death, 
that is.

 Attached is a patch that removes the need for RTL_CONSTASCII_USTRINGPARAM 
(henceafter called 'macro', because it annoys me to write this even in its 
death warrant) when using the OUString ctor. Assuming this code compiles also 
with MSVC[*], the result should be at least as good as when using the macro, 
except that it is not needed to use the macro.

 Technically, the trick is to use "const char (&)[ N ]' , which ensures that 
the parameter passed really will be a const char array, which either will be 
a string literal, or a real const char array, which in turn is unlikely to be 
anything else than a string literal. If OUString had also "const char*" ctor, 
then compiler would actually prefer that one to instantiating the template, 
but luckily that's not the case.

 Note that the patch only removes the need of the macro for ctor calls, but 
there can be further changes that allow string literals for other methods. 
That would allow deprecating the C-style "overloads" fooAsciiL() and using 
proper overloads.

 There are few questions about this:

- Technical flaws: As can be seen in the unittest, the only problem is 
having "const char [][ N ]" array, where the length for all literals will be 
considered to be N-1 and therefore possibly wrong. This is however not used, 
and SAL_N_ELEMENTS() (and thus the macro) has the same flaw, so I don't 
consider this to be a problem in practice.

- OString is not included in the patch, as that one does have "const char*" 
ctor. Since it is inline, it actually does not matter for binary 
compatibility, so I can change this, but I didn't want to do this now before 
I get some feedback. It should be later changed as well.

- The String class in tools is obsolete AFAIK, and so I do not see the need to 
bother with that one.

- I added an explicit check that when converting from an 8-bit string there is 
no \0 inside the string (i.e. that the length really matches), but the check 
actually triggers a couple of times. One case is editeng's ImpEditEngine 
ctor, where I'd expect the problem is real and somebody thought "\0xff" 
was "\xff", but I don't know for sure. Other case is sal's 
test_oustring_endswith.cxx , which intentionally creates OStrings with \0's 
in them and even operates on them, which suggests that OString in fact does 
not stop at \0's when handling strings. I'm kind of baffled by this, I 
personally consider it a flaw that a string class does this, but the unittest 
is very explicit about this. Does somebody know?

- The biggest question probably : I have not made the ctor explicit. That 
means that the string literal to OUString conversion is automatic.

 I however do not see anything wrong with automatic string literal to OUString 
conversions. UI strings, AFAICS, are not included in .cxx files themselves, 
but come from elsewhere, so sources contain only technical strings. Therefore 
it is a very fair assumption that all string literals are ASCII, since 
anything outside of ASCII would presumably need translation anyway, and the 
few legal cases (math symbols not in ASCII maybe) can simply be explicit 
about it (there will be a runtime warning if this is not the case, and since 
it is a string literal, it should be noticed by the developer). Since 
non-const data will not trigger this automatic conversion, the room for any 
problems caused by the automatic conversion should be minimal and far 
outweighted by the convenience. I mean, this way it can be possible to just 
do things like

 foo( "test", 10, "bar" )  or str = "<tag>"

 instead of 

 foo( OUString( "test" ), 10, OUString( "bar" )) or str = OUString( "<tag>" )

 Moreover in fact the explicit OUString doesn't really buy us anything, as 
nobody says foo's arguments or 'str' really are OUString, so this is about as 
good as Hungarian notation.

 So while I can be talked into adding the explicit, I can't find any good 
reason for doing that.


 Comments?


[*] Could somebody please try if the patch compiles with MSVC? Just building 
sal/ including the unittest should be enough.

-- 
 Lubos Lunak
 l.lunak at suse.cz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: constascii_deprecate.patch
Type: text/x-diff
Size: 8303 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20120209/b85e17e0/attachment-0001.patch>


More information about the LibreOffice mailing list