optimising OUString for space
sbergman at redhat.com
Mon Oct 1 07:05:03 PDT 2012
On 10/01/2012 02:45 PM, Michael Stahl wrote:
> On 01/10/12 14:23, Noel Grandin wrote:
>> On 2012-10-01 13:58, Michael Stahl wrote:
>>> The only problem with a change there is our ABI - which explicitly
>>> exposes the encoding of that.
>>> the right time to do it is for LO4. sadly nobody has signed up for that
>>> yet :( ... (while there are volunteers for far sillier proposals, like
>>> getting rid of com.sun.star...)
>> Perhaps we need to split out some preparatory tasks?
>> For example
>> - fix code that directly accesses the underlying buffer
>> - create an external iterator class (which would currently be a thin
>> wrapper around int) for looping over the buffer and indexing into it
>> - fix code that indexes into an OUString to use the new external iterator
> there already exists method iterateCodePoints, using a pointer to the
> next code unit as the iterator (note that this interface depends on
> immutability of the buffer):
>> inline sal_uInt32 iterateCodePoints(
>> sal_Int32 * indexUtf16, sal_Int32 incrementCodePoints = 1) const
> problem is, nobody is using it...
> guess you could comment out operator[], that should find lots of
> convertible call sites :)
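For reference, a minimal sketch of what iterating by code point over a
UTF-16 buffer looks like. It is only loosely modeled on the
iterateCodePoints signature quoted above: it uses std::u16string instead
of OUString, supports only a forward step of one, and the names
nextCodePoint and countCodePoints are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for iterateCodePoints with an increment of 1.
// Precondition: *indexUtf16 < s.size().
// Returns the code point starting at *indexUtf16 and advances the index
// past it (by one code unit, or by two for a valid surrogate pair).
inline std::uint32_t nextCodePoint(const std::u16string& s,
                                   std::size_t* indexUtf16)
{
    const char16_t hi = s[(*indexUtf16)++];
    const bool isHigh = hi >= 0xD800 && hi <= 0xDBFF;
    if (isHigh && *indexUtf16 < s.size()) {
        const char16_t lo = s[*indexUtf16];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++*indexUtf16;
            return 0x10000
                + ((static_cast<std::uint32_t>(hi) - 0xD800) << 10)
                + (static_cast<std::uint32_t>(lo) - 0xDC00);
        }
    }
    // BMP code point, or an unpaired surrogate passed through as-is.
    return hi;
}

// Counts code points rather than UTF-16 code units.
inline std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        nextCodePoint(s, &i);
    }
    return n;
}
```

The index stays a plain integer into the code-unit buffer, which is why
this style of iteration relies on the buffer not changing underneath it.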
Note that in the common case of accessing (i.e., searching for, etc.)
7-bit ASCII content in a string, regardless of whether it is internally
represented as UTF-8 or UTF-16, going via an operator[] interface that
operates directly on the string object's innards might be more efficient
than going via an iterator interface (which is, of course, necessary
when potentially accessing non-ASCII content).
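To illustrate the ASCII case, here is a rough sketch using std::string
(assumed to hold UTF-8) and std::u16string (UTF-16) rather than OUString;
the helper name indexOfAscii is hypothetical. It works on raw code units
because neither encoding uses a value below 0x80 as part of a
multi-unit sequence, so a 7-bit needle cannot match in the middle of a
non-ASCII character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: finds the first occurrence of a 7-bit ASCII
// character by indexing code units directly, with no decoding step.
// Returns the code-unit index, or -1 if the character is not present.
template <typename CharT>
std::ptrdiff_t indexOfAscii(const std::basic_string<CharT>& s, char ascii)
{
    const CharT needle = static_cast<CharT>(ascii);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == needle) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

The returned index counts code units, so the same text can yield
different indices depending on the internal representation; that is
fine for splitting or slicing the same string, but not for reporting
character positions.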
What an ideal string abstraction would look like is not clear to me at all.