optimising OUString for space

Mon Oct 1 07:05:03 PDT 2012

On 10/01/2012 02:45 PM, Michael Stahl wrote:
> On 01/10/12 14:23, Noel Grandin wrote:
>> On 2012-10-01 13:58, Michael Stahl wrote:
>>> 	The only problem with a change there is our ABI - which explicitly
>>> exposes the encoding of that.
>>> the right time to do it is for LO4.  sadly nobody has signed up for that
>>> yet :( ... (while there are volunteers for far sillier proposals, like
>>> getting rid of com.sun.star...)
>>
>> Perhaps we need to split out some preparatory tasks?
>> For example
>>    - fix code that directly accesses the underlying buffer
>>    - create an external iterator class (which would currently be a thin
>> wrapper around int) for looping over the buffer and indexing into it
>> -  fix code that indexes into an OUString to use the new external iterator
>
> there already exists method iterateCodePoints, using a pointer to the
> next code unit as the iterator (note that this interface depends on
> immutability of the buffer):
>
>>      inline sal_uInt32 iterateCodePoints(
>>          sal_Int32 * indexUtf16, sal_Int32 incrementCodePoints = 1) const
>
> problem is, nobody is using it...
>
> guess you could comment out operator[], that should find lots of
> convertible call sites :)

Note that in the common case of accessing (e.g., searching for) 7-bit
ASCII content in a string, regardless of whether it is internally
represented as UTF-8 or UTF-16, going via an operator[] interface that
operates directly on the string object's innards might be more efficient
than going via an iterator interface (which is, of course, necessary
when potentially accessing non-ASCII content).
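
To make the distinction concrete, here is a rough sketch, not OUString
code. It assumes a plain std::u16string as a stand-in for the string
buffer, and the names Utf16, indexOfAscii, nextCodePoint and
indexOfCodePoint are made up for illustration. The ASCII search can
compare code units directly, because a surrogate code unit
(0xD800..0xDFFF) never equals a 7-bit ASCII value; the code point search
has to decode as it goes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the string type: a buffer of UTF-16 code units.
using Utf16 = std::u16string;

// Direct indexing: compare each code unit against a 7-bit ASCII character.
// No decoding is needed, since surrogate code units (0xD800..0xDFFF) can
// never collide with an ASCII value.
inline std::ptrdiff_t indexOfAscii(const Utf16& s, char asciiChar) {
    const char16_t target =
        static_cast<char16_t>(static_cast<unsigned char>(asciiChar));
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == target) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Iterator-style access: decode the code point starting at *index and
// advance *index past it. Precondition: *index < s.size().
// An unpaired surrogate is returned as-is.
inline char32_t nextCodePoint(const Utf16& s, std::size_t* index) {
    const char16_t hi = s[(*index)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *index < s.size()) {
        const char16_t lo = s[*index];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++*index;
            return 0x10000
                + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                + (static_cast<char32_t>(lo) - 0xDC00);
        }
    }
    return hi;
}

// Search for an arbitrary code point; returns the index of the code unit
// at which the match starts, or -1 if there is none.
inline std::ptrdiff_t indexOfCodePoint(const Utf16& s, char32_t cp) {
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t start = i;
        if (nextCodePoint(s, &i) == cp) {
            return static_cast<std::ptrdiff_t>(start);
        }
    }
    return -1;
}
```

For a '/' in a string that also contains a non-BMP character, both
searches report the same code unit index; the first one just does less
work per step. The same argument would hold for a UTF-8 buffer, where
every byte of a multi-byte sequence has its high bit set.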

What an ideal string abstraction would look like is not clear to me at all.

Stephan