How to measure effects of OUString::intern?

Mark Wielaard mark at klomp.org
Mon Jul 29 14:59:55 PDT 2013


On Mon, Jul 29, 2013 at 05:53:21PM +0100, Michael Meeks wrote:
> > I couldn't immediately find the duplication of the names.
> > In this case the strings are the full zip file entry paths. e.g.
> > "sw/res/sidebar/pageproppanel/portraitcopy_24x24.png"
> 
> 	Riight - that's interesting :-) IIRC in the past there were two chunks
> of code in package/ that duplicated those names (I think). The fragment
> from the (AMD) report from December 2006 shows:
> 
> 	'package' zip code 
> 		-1022k
> 		+500k
> 	reading the large images.zip file creates a huge hash 
>         table with lots of duplicated string stems – 3 days
> 
> 	Of course, I couldn't tell you if this is still the case; possibly
> we're no longer duplicating those strings in that way. The problem was
> around 'images.zip' - the archive that has all of our icons in it for
> the UI - at least back ~7 years ago ;-)

It makes sense that this is about image paths. Most paths seem to come
from opt/share/config/images.zip. But that file contains 3800+ entries
and only a few of them seem to be reused later.

> > And as far as I can see all the full path names are unique, so no
> > actual sharing is taking place here. But is there a place where these
> > strings are reused (and also interned)?
> 
> 	Interesting; of course - we can dump the contents of the interned table
> to see if they have ref-count 1 quite simply (?). 
> 
> > Replacing the intern with a normal OUString constructor like:
> ...
> > Seems to save ~200K of memory at least for a quick:
> 
> 	Nice :-) well - we should just do that then :-)

I am tempted to. Will do some more testing first to make sure I am not
missing something.

> > But that might be too quick to see any effects of this intern action.
> 
> 	The reason it was added was for images.zip - if the package code has
> improved then we should take & save that space/time.

I haven't yet found the code which references the images/resources;
maybe it needs interning itself. But it certainly looks like the current
code is a bit too eager, interning everything.
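
To make the concern concrete, here is a toy model of an intern table in
Python. It is not the sal implementation; the class and method names
(InternPool, intern, unshared) are invented for illustration, and the
per-entry request count only stands in for the ref-count mentioned in
this thread. The point it shows: an entry that is only ever requested
once buys no sharing, so the table slot for it is pure overhead.

```python
class InternPool:
    """Toy intern table: maps string contents to one canonical object."""

    def __init__(self):
        # contents -> [canonical string object, number of intern() requests]
        self._table = {}

    def intern(self, s):
        """Return the canonical object for s, registering it if needed."""
        entry = self._table.get(s)
        if entry is None:
            entry = [s, 0]
            self._table[s] = entry
        entry[1] += 1
        return entry[0]

    def dump(self):
        """(request count, contents) pairs, most requested first."""
        return sorted(((n, s) for s, (_, n) in self._table.items()),
                      reverse=True)

    def unshared(self):
        """Entries requested exactly once, i.e. interned but never shared."""
        return [s for s, (_, n) in self._table.items() if n == 1]
```

Feeding such a pool a few thousand unique zip entry paths would leave
nearly everything in unshared(), which is the situation the path names
above appear to be in.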
 
> > So I guess my general question is how to measure the effects of
> > OUString::intern?
> 
> 	I'd dump the ref-count + string contents of the intern table to see if
> there is more wastage.

I'll try that next. For now I used systemtap, which happens to have utf16
user string support. It looks like all interned strings go through the
function rtl_ustring_intern_internal. So probing that and printing the
string gives an interesting overview.

$ stap -e 'probe process("./solver/unxlngx6.pro/lib/libuno_sal.so").function("rtl_ustring_intern_internal") { log("interning: ". $str$$ . " " . user_string_utf16($str->buffer)); }' -c ./install/program/soffice

interning: {.refCount=1, .length=9, .buffer=[108, ...]} links.txt
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_16.png
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_32.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/sx03251.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/lx03251.png
interning: {.refCount=1, .length=18, .buffer=[99, ...]} cmd/lc_openurl.png
interning: {.refCount=1, .length=20, .buffer=[99, ...]} cmd/lc_adddirect.png
interning: {.refCount=1, .length=17, .buffer=[99, ...]} cmd/lc_newdoc.png
[...]

That shows (full output attached, if the mailing list allows that) that
interning is done 4192 times, at least during startup. Only 128 strings
are reused, and only 6 are interned 5 times or more:

    115  Regular
     57  Bold
     14  Bold Italic
     13  Italic
      5  Light
      5  Book

> 	You saw the OUString debugging code: RTL_LOG_STRING_NEW /
> _STRING_DELETE etc. that can produce a long but crunch-able set of
> printfs on stdout: many of which are sadly not that useful due to
> OUStringBuffer mutation (IIRC - but presumably some more work could
> clean that up).

I hadn't seen that yet, but that might be useful to see which strings
are recreated multiple times and so are candidates for interning.
Is there already code to enable/trigger RTL_LOG_STRING_NEW?
Or should I just write my own hooks?

Thanks,

Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: interned.out.bz2
Type: application/x-bzip2
Size: 20373 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20130729/af91ca15/attachment.bin>

