OUString is mutable?

Dennis E. Hamilton dennis.hamilton at acm.org
Mon Oct 1 09:05:53 PDT 2012

In the WinRT APIs (not to be confused with Windows RT, the ARM edition of Windows) introduced with Windows 8, strings are immutable.  I suspect that hardware features may be exploited to ensure that they stay that way.

I haven't got my head around how that works with the still-supported BSTR at the native level.  Some simple testing could show whether access violations are triggered even when the data structure is accessed internally via native code.

In addition to the considerations already listed, there are also security considerations applicable to strings, having to do with their use in native code exploits to access other heap or code storage by addressing beyond or before the string location.  I suppose that genuinely-immutable strings might provide some safeguard against exploits of that nature.

The mention that the latest Java VM is using UTF-8 internally instead of unsigned short arrays is rather daunting.  There is an easy way to test it -- see if char values that are not admissible UTF-16 codes (unpaired surrogates, for example) can be used in construction of a string and then extracted correctly.  If they can, there is no way that transformation to and from UTF-8 occurred.  If they can't, it is an interesting breaking change in Java.  With regard to string literals, it would be interesting to see what can be introduced into those via escape codes too.
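
A minimal sketch of that test, in case it is useful.  The class and method names are my own invention, and the choice of unpaired surrogates as the "not admissible" values is an assumption about what would be the most telling input.  The last check is only there for contrast: String.getBytes is documented to substitute a replacement for malformed input, so an explicit trip through UTF-8 should come back changed.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateRoundTrip {
    // True if every 16-bit unit stored in a String is read back unchanged.
    static boolean roundTrips(char[] units) {
        String s = new String(units);
        if (s.length() != units.length) {
            return false;
        }
        for (int i = 0; i < units.length; i++) {
            if (s.charAt(i) != units[i]) {
                return false;
            }
        }
        return true;
    }

    // True if the string is unchanged by an explicit encode/decode through UTF-8.
    static boolean survivesUtf8(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return s.equals(new String(encoded, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Unpaired surrogates: legal char values, but not well-formed UTF-16.
        char[] illFormed = { 'a', '\uD800', 'b', '\uDC00', 'c' };
        System.out.println("char[] round trip intact: " + roundTrips(illFormed));

        // The same kind of value introduced through an escape in a literal.
        String literal = "x\uD800y";
        System.out.println("literal keeps lone surrogate: " + (literal.charAt(1) == 0xD800));
        System.out.println("explicit UTF-8 round trip intact: " + survivesUtf8(literal));
    }
}
```

If the first two lines report true on a given VM, then whatever the internal storage is, it is not a lossy UTF-8 conversion as far as the char-level API can tell.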
 - Dennis

-----Original Message-----
From: libreoffice-bounces+dennis.hamilton=acm.org at lists.freedesktop.org [mailto:libreoffice-bounces+dennis.hamilton=acm.org at lists.freedesktop.org] On Behalf Of Michael Stahl
Sent: Monday, October 01, 2012 06:51
To: libreoffice at lists.freedesktop.org
Subject: Re: OUString is mutable?

[ ... ]

it appears that there are people who do see good reasons for immutable
strings :)

[ ... ]

from what i can see the advantages of immutable strings include:

- somebody reading the code can rely on the fact that the buffer in the
  string is not mutated
- immutable data types are much less error prone as hashtable/map keys
  (you don't want to modify a key after it has been inserted, because
  that is practically guaranteed to violate container's invariants)
- the buffer can be shared across threads (but not the string wrapper
  itself, which makes this .. less of an advantage)
- there are potential space and allocation savings with sharing
  sub-string representations (though OUString doesn't do that)
- it allows for caching the hash value, though OUString doesn't do that
  (and i don't know if space overhead is worth it...)
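
to illustrate the hashtable/map point above, here is a small sketch in Java
(not OUString code; MutableKey is a hypothetical mutable string type made up
for this example).  on a typical HashMap implementation the entry stays in
the map after the key is mutated, but can no longer be looked up under the
old content, the new content, or the key object itself:

```java
import java.util.HashMap;
import java.util.Map;

final class MutableKeyDemo {
    // Hypothetical mutable string type; equality and hash follow the content.
    static final class MutableKey {
        private final StringBuilder content;

        MutableKey(String initial) {
            content = new StringBuilder(initial);
        }

        void setCharAt(int index, char c) {
            content.setCharAt(index, c);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof MutableKey
                && content.toString().equals(((MutableKey) other).content.toString());
        }

        @Override
        public int hashCode() {
            return content.toString().hashCode();
        }
    }

    // True if the entry can no longer be found after its key is mutated.
    static boolean entryLostAfterMutation() {
        Map<MutableKey, Integer> map = new HashMap<>();
        MutableKey key = new MutableKey("abc");
        map.put(key, 1);

        key.setCharAt(2, 'd'); // content is now "abd"; the map is not told

        boolean foundByOld = map.containsKey(new MutableKey("abc"));
        boolean foundByNew = map.containsKey(new MutableKey("abd"));
        boolean foundBySame = map.containsKey(key);
        return map.size() == 1 && !foundByOld && !foundByNew && !foundBySame;
    }

    public static void main(String[] args) {
        System.out.println("entry stranded: " + entryLostAfterMutation());
    }
}
```

an immutable key type rules this out by construction, which is also what
makes caching the hash value safe.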

[ ... ]
