Possible extensions to OUString class

Thu Jan 31 07:04:49 UTC 2019

Hi Stephan,

On 1/30/19 10:40 PM, Stephan Bergmann wrote:
> On 30/01/2019 22:17, Matteo Casalin wrote:
>>      I'm working on improving code that calls getToken (e.g. using its 
>> version with index, or using other OUString functions in its place 
>> when possible).
>> One thing that I noticed is that there are a lot of calls in the form 
>> getToken().toInt# which require memory management just to obtain a 
>> value that could be generated by the original OUString. Similarly (but 
>> less frequently), some tokens are extracted just to compare them 
>> against a string, which again requires memory management that is 
>> really not needed.
>>
>> I was wondering if extending O(U)String with functions like:
>>
>> * getTokenAs[U]Int#(token, sep, index)
>> * matchToken(token, sep, index, string)
>>
>> would be accepted/appreciated or not. At the moment I already 
>> submitted to gerrit a patch [1] which adds 
>> comphelper::string::matchToken but I think that adding such 
>> functionality to OUString directly would be nicer. Also, introducing 
>> getTokenAsInt in OUString would likely allow to reuse its toInt code.
> 
> Sounds a bit too special-purpose to be worth adding, IMO.  Would those 
> optimizations really make a measurable difference?

I don't have real measurements to provide, but a very rough check of 
getToken usage gives the following numbers:

git grep -w getToken > getToken.txt
grep -wc getToken getToken.txt ==> 1646
grep -wc toInt32 getToken.txt ==> 218
grep -wc toInt64 getToken.txt ==> 8
grep -wc toUInt32 getToken.txt ==> 0
grep -wc toUInt64 getToken.txt ==> 8

The number of getToken occurrences is higher than the number of real 
OUString::getToken calls (it includes comments, header files, definitions 
and getToken functions that do not belong to OUString), and I am missing 
places in which the conversion to integer is done on a following line. As 
a result, this pattern accounts for more than 14.2% (234 of 1646) of all 
getToken occurrences. I cannot say whether this is frequently called code 
or not.

About matchToken, this seems to be a much less frequent pattern, and at 
the moment the comphelper function is a viable solution, so I 
would go this way (and will take care of reviewing some older getToken 
optimizations that I implemented).
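
To illustrate the allocation-free comparison discussed above, here is a 
minimal sketch in standard C++ rather than against the real OUString or 
comphelper code. The names tokenView and matchToken and their signatures 
are assumptions made for this example; they are not the API of the gerrit 
patch.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not LibreOffice API): returns a view of field number
// `token` (0-based) of `str`, where fields are separated by `sep`.
// Returns an empty view if `str` has fewer fields. No characters are copied.
inline std::u16string_view tokenView(std::u16string_view str,
                                     std::size_t token, char16_t sep)
{
    constexpr std::size_t npos = std::u16string_view::npos;
    std::size_t begin = 0;
    // Skip `token` separators to reach the start of the requested field.
    for (std::size_t i = 0; i < token; ++i)
    {
        const std::size_t pos = str.find(sep, begin);
        if (pos == npos)
            return {};
        begin = pos + 1;
    }
    const std::size_t end = str.find(sep, begin);
    return str.substr(begin, end == npos ? npos : end - begin);
}

// Hypothetical helper: compares a field against `expected` in place.
// Note that a missing field and an empty field both match an empty `expected`.
inline bool matchToken(std::u16string_view str, std::size_t token,
                       char16_t sep, std::u16string_view expected)
{
    return tokenView(str, token, sep) == expected;
}
```

With this, a check such as matchToken(u"a;bc;d", 1, u';', u"bc") compares 
the field where it sits instead of building a temporary string first.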

> Also, a better approach overall would probably be some string_view-based 
> getToken functionality (converting from an OUString to a string_view is 
> cheap), and then string_view-based toInt etc. functions.

At the moment I plan to just go through all the getToken uses and do some 
minor local optimizations, then I might have a look at the string_view 
approach (unless the numbers above make the OUString one look not too 
specialised).
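
As a rough idea of what the string_view approach could look like, here is 
a sketch in standard C++17 on plain char data (closer to OString than to 
OUString). The names nextToken and toInt are hypothetical, and reporting 
failure through std::optional is a choice made for this example only, not 
existing LibreOffice behaviour.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the field of `str` that starts at `index`
// and advances `index` past the following separator. After the last field,
// `index` is set to npos. No characters are copied.
inline std::string_view nextToken(std::string_view str, char sep,
                                  std::size_t& index)
{
    constexpr std::size_t npos = std::string_view::npos;
    if (index > str.size())
    {
        index = npos;
        return {};
    }
    const std::size_t end = str.find(sep, index);
    const std::string_view token
        = str.substr(index, end == npos ? npos : end - index);
    index = (end == npos) ? npos : end + 1;
    return token;
}

// Hypothetical helper: parses the whole view as a decimal int.
// Returns std::nullopt on empty input, trailing characters or overflow.
inline std::optional<int> toInt(std::string_view s)
{
    int value = 0;
    const char* first = s.data();
    const char* last = first + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last)
        return std::nullopt;
    return value;
}
```

A call like toInt(nextToken(str, ',', index)) then replaces the 
getToken().toInt32() pattern without creating an intermediate string.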

Many thanks for your comments
Kind regards
Matteo
