Possible extensions to OUString class
Matteo Casalin
matteo.casalin at libreoffice.it
Thu Jan 31 07:04:49 UTC 2019
Hi Stephan,
On 1/30/19 10:40 PM, Stephan Bergmann wrote:
> On 30/01/2019 22:17, Matteo Casalin wrote:
>> I'm working on improving code that calls getToken (e.g. using its
>> version with index, or using other OUString functions in its place
>> when possible).
>> One thing that I noticed is that there are a lot of calls in the form
>> getToken().toInt# which require memory management just to obtain a
>> value that could be generated by the original OUString. Similarly (but
>> less frequently), some tokens are extracted just to compare them
>> against a string, which again requires memory management that is
>> really not needed.
>>
>> I was wondering if extending O(U)String with functions like:
>>
>> * getTokenAs[U]Int#(token, sep, index)
>> * matchToken(token, sep, index, string)
>>
>> would be accepted/appreciated or not. At the moment I already
>> submitted to gerrit a patch [1] which adds
>> comphelper::string::matchToken but I think that adding such
>> functionality to OUString directly would be nicer. Also, introducing
>> getTokenAsInt in OUString would likely allow to reuse its toInt code.
>
> Sounds a bit too special-purpose to be worth adding, IMO. Would those
> optimizations really make a measurable difference?
I don't have real measurements to provide, but a very rough check on
getToken gives the following counts:
git grep -w getToken > getToken.txt
grep -wc getToken getToken.txt ==> 1646
grep -wc toInt32 getToken.txt ==> 218
grep -wc toInt64 getToken.txt ==> 8
grep -wc toUInt32 getToken.txt ==> 0
grep -wc toUInt64 getToken.txt ==> 8
The number of getToken occurrences is higher than the number of real
OUString::getToken calls (it includes comments, header files,
definitions and also getToken functions that do not belong to OUString),
and I am missing the places in which the conversion to integer is done
on a following line. As a result, this pattern accounts for roughly
14.2% of all getToken occurrences (218 + 8 + 0 + 8 = 234 out of 1646),
and probably more. I cannot say whether this is frequently called code
or not.
As for matchToken, this seems to be a much less frequent pattern and at
the moment the comphelper function is a viable solution, so I
would go this way (and will take care of reviewing some older getToken
optimizations that I implemented).
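As a rough illustration of what a matchToken-style helper does, here is
a minimal sketch. It uses std::string_view in place of OUString so that
it is self-contained; the name matchTokenSketch and its signature are
hypothetical and are not taken from the gerrit patch. It also assumes
that a missing token should behave like an empty one, as it would when
comparing the result of getToken against a string.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch, not the comphelper implementation: returns true if
// the n-th (0-based) token of `s`, split on `sep`, equals `expected`.
// Only views into `s` are compared, so no temporary string is allocated.
inline bool matchTokenSketch(std::string_view s, std::size_t n, char sep,
                             std::string_view expected)
{
    std::size_t start = 0;
    for (; n > 0; --n)
    {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string_view::npos)
            return expected.empty(); // assumption: a missing token counts as empty
        start = pos + 1;
    }
    const std::size_t end = s.find(sep, start);
    // substr clamps the length, so end == npos selects the rest of the string
    return s.substr(start, end - start) == expected;
}
```

With this sketch, matchTokenSketch("text/plain;charset=utf-8", 0, ';',
"text/plain") compares the first token in place instead of extracting it
first.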
> Also, a better approach overall would probably be some string_view-based
> getToken functionality (converting from an OUString to a string_view is
> cheap), and then string_view-based toInt etc. functions.
At the moment I plan to just go through all the getToken uses and do
some minor local optimizations, then I might have a look at the
string_view approach (unless the numbers above make the OUString
approach look less special-purpose).
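To make the string_view idea more concrete, here is a minimal sketch of
a view-based token lookup followed by an integer conversion. It is
written against std::string_view and std::from_chars (C++17) rather than
OUString, so it handles plain char data only; tokenView and tokenToInt32
are hypothetical names, and the parsing rules are those of
std::from_chars (for example, no leading '+' or whitespace), which are
not guaranteed to match OUString::toInt32.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns a view of the n-th (0-based) token of `s`,
// split on `sep`, without copying. Returns an empty view if `s` has fewer
// than n + 1 tokens.
inline std::string_view tokenView(std::string_view s, std::size_t n, char sep)
{
    std::size_t start = 0;
    for (; n > 0; --n)
    {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string_view::npos)
            return {};
        start = pos + 1;
    }
    // substr clamps the length, so a missing separator selects the rest of `s`
    return s.substr(start, s.find(sep, start) - start);
}

// Hypothetical helper: parses the n-th token as a decimal 32-bit integer.
// Assumption for this sketch: 0 is returned when nothing can be parsed.
inline std::int32_t tokenToInt32(std::string_view s, std::size_t n, char sep)
{
    const std::string_view tok = tokenView(s, n, sep);
    if (tok.empty())
        return 0;
    std::int32_t value = 0;
    const auto res = std::from_chars(tok.data(), tok.data() + tok.size(), value);
    return res.ec == std::errc() ? value : 0;
}
```

In this form a call such as tokenToInt32("10;-42;x", 1, ';') reads the
number directly from the original buffer, which is the allocation that
the getToken().toInt32() pattern currently pays for.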
Many thanks for your comments
Kind regards
Matteo