optimising OUString for space

Noel Grandin noel at peralex.com
Mon Oct 1 04:02:11 PDT 2012


On 2012-10-01 12:38, Michael Meeks wrote:
> We could do some magic there; of course - space is a bit of an issue - 
> we already pointlessly bloat bazillions of ascii strings into UCS-2 
> (nominally UTF-16) representations and nail a ref-count and length on 
> the beginning. If you turn on the lifecycle diagnostics in 
> sal/rtl/source/strimp.hxx with the #ifdef and re-build sal, you can 
> start to see the scale of the problem when you launch libreoffice ;-)

Changing subject because I'm changing the topic.

That was something I was thinking about the other day - given that the 
bulk of our strings are pure 7-bit ASCII, it might be a worthwhile 
optimisation to store a bit that says "this string is 7-bit ASCII", and 
then store the string as a sequence of bytes.
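
To make the idea concrete, here is a minimal sketch. Everything in it is an 
assumption for illustration: CompactString, allAscii and payloadBytes are 
hypothetical names, and this is not the real OUString / rtl_uString layout 
(there is no ref-count here, for instance). It only shows the "one flag plus 
a byte array" storage, with callers still seeing 16-bit code units.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch only: NOT the real rtl_uString / OUString layout.
class CompactString {
public:
    explicit CompactString(const std::u16string& s)
        : ascii_(allAscii(s)), length_(s.size())
    {
        if (ascii_) {
            // Pure 7-bit ASCII: one byte per character.
            data_.reserve(s.size());
            for (char16_t c : s)
                data_.push_back(static_cast<std::uint8_t>(c));
        } else {
            // Anything else: keep two bytes per UTF-16 code unit.
            data_.resize(s.size() * sizeof(char16_t));
            if (!s.empty())
                std::memcpy(data_.data(), s.data(), data_.size());
        }
    }

    bool isAscii() const { return ascii_; }
    std::size_t length() const { return length_; }         // in code units
    std::size_t payloadBytes() const { return data_.size(); }

    // Callers always get a UTF-16 code unit, whatever the storage is.
    // No bounds checking: i must be less than length().
    char16_t at(std::size_t i) const {
        if (ascii_)
            return static_cast<char16_t>(data_[i]);
        char16_t c;
        std::memcpy(&c, data_.data() + i * sizeof(char16_t), sizeof c);
        return c;
    }

private:
    // True if every code unit fits in 7 bits.
    static bool allAscii(const std::u16string& s) {
        for (char16_t c : s)
            if (c > 0x7F)
                return false;
        return true;
    }

    bool ascii_;                      // the "this string is 7-bit ASCII" bit
    std::size_t length_;              // length in UTF-16 code units
    std::vector<std::uint8_t> data_;  // narrow or wide payload
};
```

In this sketch an ASCII string's payload takes half the bytes of the UTF-16 
form, at the cost of a branch on every character access. Any code that wants 
a raw pointer to 16-bit data would need a conversion step, which is probably 
where most of the real work would be.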

A recent Java VM can do this trick internally - it pretends that String 
is stored as an array of 16-bit values, but actually stores pure-ASCII 
strings as bytes.

Even in an app running in a language other than US-English, strings are 
used for so many internal things that >90% of the strings are 7-bit ASCII.

