optimising OUString for space
Michael Meeks
michael.meeks at suse.com
Mon Oct 1 04:25:04 PDT 2012
On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
> That was something I was thinking about the other day - given that the
> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile
> optimisation to store a bit that says "this string is 7-bit ASCII", and
> then store the string as a sequence of bytes.
Optimisation ? :-) IMHO the ideal is to store all strings as UTF-8
underneath the hatches anyway. Everyone I've discussed this with who
objected turned out (after some discussion) to have a weak
understanding of UTF-8, UTF-16 and of rendering complex text ;-) Of
course, perhaps I should discuss with more people.
The only problem with a change there is our ABI - which explicitly
exposes the UTF-16 encoding of the string buffer.
> The latest Java VM does this trick internally - it pretends that String
> is stored with an array of 16-bit values, but actually it stores them as
> UTF-8.
Interesting - for all strings ? Is there a pointer to the code / docs
for that detail somewhere ? :-) Last I looked Java also stored partial
strings chained to its parent; so 'substring' takes a reference on the
parent (be it ever so large), and can return a single character string
out of it without re-allocation. IIRC that can cause huge grief when
parsing big files into little ones ;-)
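To make the substring problem concrete, here is a minimal sketch in C++ of a
string slice that shares its parent's buffer. SharedSlice and all of its
members are hypothetical names made up for illustration; this is neither
Java's implementation nor anything in OUString, only the general technique
as described above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type: a substring that shares (and pins) its parent's buffer.
class SharedSlice {
public:
    explicit SharedSlice(std::string text)
        : buf_(std::make_shared<const std::string>(std::move(text))),
          off_(0), len_(buf_->size()) {}

    // No re-allocation: the new slice takes a reference on the whole parent
    // buffer, however large it is. Bounds are not checked in this sketch.
    SharedSlice substring(std::size_t off, std::size_t len) const {
        return SharedSlice(buf_, off_ + off, len);
    }

    // Copies only the viewed characters into a fresh buffer, so the
    // result no longer keeps the parent alive.
    SharedSlice compact() const { return SharedSlice(str()); }

    std::string str() const { return buf_->substr(off_, len_); }
    std::size_t size() const { return len_; }

    // Size of the buffer this slice keeps alive, not of the slice itself.
    std::size_t retainedBytes() const { return buf_->size(); }

private:
    SharedSlice(std::shared_ptr<const std::string> buf,
                std::size_t off, std::size_t len)
        : buf_(std::move(buf)), off_(off), len_(len) {}

    std::shared_ptr<const std::string> buf_;
    std::size_t off_;
    std::size_t len_;
};
```

A one-character slice of a megabyte buffer reports size() == 1 but
retainedBytes() == 1000000 until compact() is called on it, which is the
grief: many small tokens can each pin a large parent in memory.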
> Even in an app running in a language other than US-English, strings are
> used for so many internal things that >90% of the strings are 7-bit ASCII.
Sure - so define the define, see what it prints, and do the quick
calculation of how much time/space we save by doing it :-)
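A rough sketch of that quick calculation, assuming we can call a hook each
time a string is created. AsciiStats, isAscii7 and record are hypothetical
names, and std::u16string stands in for the real string buffer here; where
the hook would live in the real code is left open.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally; not part of the real OUString API.
struct AsciiStats {
    std::size_t strings = 0;       // strings seen
    std::size_t asciiStrings = 0;  // strings with every code unit < 0x80
    std::size_t utf16Bytes = 0;    // payload bytes at 2 bytes per code unit
    std::size_t savedBytes = 0;    // bytes saved at 1 byte per ASCII unit
};

// True if every UTF-16 code unit is 7-bit ASCII (vacuously true for "").
inline bool isAscii7(const std::u16string& s) {
    for (char16_t c : s) {
        if (c >= 0x80) {
            return false;
        }
    }
    return true;
}

// Record one string in the tally. Only the character payload is counted;
// per-string header overhead (refcount, length) is ignored.
inline void record(AsciiStats& st, const std::u16string& s) {
    ++st.strings;
    st.utf16Bytes += s.size() * sizeof(char16_t);
    if (isAscii7(s)) {
        ++st.asciiStrings;
        st.savedBytes += s.size();  // one byte saved per code unit
    }
}
```

Dividing asciiStrings by strings gives the share of pure-ASCII strings, and
savedBytes against utf16Bytes gives the payload saving. Neither says
anything about the time cost of converting at the ABI boundary.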
Then again - last I looked we still had some real dumbness that needed
hunting down relating to many (tens of?) thousands of allocations and
frees of the "/" string at startup ;-)
ATB,
Michael.
--
michael.meeks at suse.com <><, Pseudo Engineer, itinerant idiot