optimising OUString for space
Stephan Bergmann
sbergman at redhat.com
Mon Oct 1 07:05:03 PDT 2012
On 10/01/2012 02:45 PM, Michael Stahl wrote:
> On 01/10/12 14:23, Noel Grandin wrote:
>> On 2012-10-01 13:58, Michael Stahl wrote:
>>> The only problem with a change there is our ABI - which explicitly
>>> exposes the encoding of that.
>>> The right time to do it is for LO4. Sadly, nobody has signed up for that
>>> yet :( ... (while there are volunteers for far sillier proposals, like
>>> getting rid of com.sun.star...)
>>
>> Perhaps we need to split out some preparatory tasks?
>> For example
>> - fix code that directly accesses the underlying buffer
>> - create an external iterator class (which would currently be a thin
>> wrapper around int) for looping over the buffer and indexing into it
>> - fix code that indexes into an OUString to use the new external iterator
>
> there already exists a method, iterateCodePoints, which uses the index of
> the next code unit (passed by pointer) as the iterator (note that this
> interface depends on immutability of the buffer):
>
>> inline sal_uInt32 iterateCodePoints(
>> sal_Int32 * indexUtf16, sal_Int32 incrementCodePoints = 1) const
>
> problem is, nobody is using it...
>
> guess you could comment out operator[], that should find lots of
> convertible call sites :)
Note that in the common case of accessing 7-bit ASCII content in a string
(searching for it, etc.), going through an operator[] interface that
operates directly on the string object's internals might be more efficient
than going through an iterator interface, regardless of whether the string
is internally represented as UTF-8 or UTF-16. (An iterator interface is, of
course, necessary when the content being accessed may be non-ASCII.)
What an ideal string abstraction would look like is not clear to me at all.
Stephan