How to measure effects of OUString::intern ?

Mark Wielaard mark at klomp.org
Tue Aug 6 05:11:02 PDT 2013


On Sat, Aug 03, 2013 at 09:35:30AM +0200, Stephan Bergmann wrote:
> >I decided to add some SDT probes at the RTL_LOG_STRING_NEW/DELETE
> >locations. With a little preprocessor magic you can easily split it
> >in 8bit and 16bit variants (so this introduces probes for new_string_8,
> >new_string_16, delete_string_8, delete_string_16 with as arguments the
> >string, the refcount, length and buffer). The SDT probes have zero
> >overhead (just adds a NOP and a special ELF NOTES section to the binary)
> >and can be used with gdb or systemtap to collect some statistics
> >(or simply just log/print all strings created/deleted). They are also
> >source compatible with dtrace for systems that supports that (I have never
> >used dtrace so I hope someone could test that). Could you have a look if
> >that looks fine to integrate: https://gerrit.libreoffice.org/5256
> 
> Cool.  Pushed.  Thanks,

Thanks for integrating it. With a bit more scripting it can give some
hints of which strings are duplicated a lot. Take the following systemtap
script:

=== dups.stp ===

global strings[50000], dups[20000];

probe process("install/ure/lib/libuno_sal.so.3").mark("new_string_16")
{
  /* If ref count is 1, then it is a non-shared new string. */
  if ($arg2 == 1)
    {
      str = user_string_utf16($arg4);

      /* Element not found in array returns zero. */
      if (strings[str] != 0)
        dups[str] += $arg3; /* Add size as "weight". */

      strings[str]++;
    }
}

probe process("install/ure/lib/libuno_sal.so.3").mark("delete_string_16")
{
  /* deleting probe is only called when ref count goes to zero. */
  str = user_string_utf16($arg4);
  strings[str]--;
  if (strings[str] == 0)
    {
      delete strings[str]; /* Remove from array to save space. */
      // delete dups[str]; /* Remove if we want to "start over". */
    }
}


probe end
{
  foreach(nr=[str] in dups- limit 30)
    printf("'%s': %d (%d * %d)\n",
           text_str(str), nr, nr / strlen(str), strlen(str));
}

===

That counts the strings (and how much chars are duplicated) for which
a new string (ref count == 1) is created when the same string already
exists. You can count the dups over the whole lifetime (even if at some
point all duplicates are deleted) or only the duplicates that survive
till the end (see the commented out line // delete dups[str];)

The first gives:
$ stap dups.stp -c "install/program/soffice /home/mark/Documents/tables/tables.*"

'en-US': 82620 (16524 * 5)
'com.sun.star.uno.RuntimeException': 41217 (1249 * 33)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/..': 40662 (502 * 81)
'en': 36272 (18136 * 2)
'com.sun.star.uno.XInterface': 35775 (1325 * 27)
'com.sun.star.lang.IllegalArgumentException': 34776 (828 * 42)
'file://': 34069 (4867 * 7)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/sofficerc': 33352 (379 * 88)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/../program/bootstraprc': 31411 (311 * 101)
'.uno:': 30070 (6014 * 5)
'/usr/local': 29280 (2928 * 10)
'US': 29272 (14636 * 2)
'com.sun.star.uno.XInterface::queryInterface': 29154 (678 * 43)
'com.sun.star.util.URLTransformer': 25408 (794 * 32)
'com.sun.star.lang.WrappedTargetException': 24920 (623 * 40)
'com.sun.star.uno.XInterface::release': 24480 (680 * 36)
'com.sun.star.uno.XInterface::acquire': 24408 (678 * 36)
'DocumentService': 23055 (1537 * 15)
'/usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/sofficerc': 22842 (282 * 81)
'queryInterface': 21812 (1558 * 14)
'void': 21028 (5257 * 4)
'com.sun.star.ucb.UniversalContentBroker': 20982 (538 * 39)
'com.sun.star.container.NoSuchElementException': 18675 (415 * 45)
'DetectService': 18031 (1387 * 13)
'com.sun.star.frame.ModuleManager': 17792 (556 * 32)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/unorc': 17640 (210 * 84)
'/usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/../program/bootstraprc': 17484 (186 * 94)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/../program': 17088 (192 * 89)
'com.sun.star.graphic.GraphicProvider': 16416 (456 * 36)
'liberation Serif': 16352 (1022 * 16)

The second gives:

$ stap dups.stp -c "install/program/soffice /home/mark/Documents/tables/tables.*"

'com.sun.star.uno.RuntimeException': 41217 (1249 * 33)
'com.sun.star.uno.XInterface': 35775 (1325 * 27)
'com.sun.star.lang.IllegalArgumentException': 34776 (828 * 42)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/sofficerc': 33352 (379 * 88)
'/usr/local': 28500 (2850 * 10)
'com.sun.star.lang.WrappedTargetException': 24920 (623 * 40)
'queryInterface': 21812 (1558 * 14)
'com.sun.star.uno.XInterface::queryInterface': 21586 (502 * 43)
'void': 21028 (5257 * 4)
'com.sun.star.container.NoSuchElementException': 18675 (415 * 45)
'com.sun.star.uno.XInterface::release': 18144 (504 * 36)
'com.sun.star.uno.XInterface::acquire': 18072 (502 * 36)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/../program': 16465 (185 * 89)
'liberation Serif': 16352 (1022 * 16)
'file:///usr/local/src/libreoffice/solver/unxlngx6.pro/installation/opt/program/unorc': 15372 (183 * 84)
'liberationSerif': 15330 (1022 * 15)
'org.openoffice.Office.UI.WindowState:WindowStateType': 15236 (293 * 52)
'string': 14268 (2378 * 6)
'Liberation Serif': 12736 (796 * 16)
'org.openoffice.Office.Common': 11900 (425 * 28)
'com.sun.star.beans.UnknownPropertyException': 11739 (273 * 43)
'acquire': 10892 (1556 * 7)
'release': 10892 (1556 * 7)
'[]com.sun.star.beans.PropertyValue': 10132 (298 * 34)
'org.openoffice.Office.Histories:HistoryInfo': 9933 (231 * 43)
'org.openoffice.Setup': 9800 (490 * 20)
'com.sun.star.lang.XEventListener': 9600 (300 * 32)
'Name': 9280 (2320 * 4)
'Type': 8804 (2201 * 4)
'com.sun.star.io.IOException': 8721 (323 * 27)

As you can see the "locale" duplicate strings ('en-US', 'en', 'US') are
only duplicated (a lot of times) temporarily. Reducing those will make
top-memory-usage lower. But some of the files/urls and "types" are
duplicated over the whole lifetime of the process. And so should lower
overall memory usage. Also note the font names (used in the test documents)
being duplicated a lot.

The issue with the files/urls seems to be that they get translated back
and forth all the time. Causing new strings to be created. And to make
matters worse rtl_uriEncode and rtl_uriDecode will always create a new
string (even if en/decoding results in the same string).

Most other duplicate strings seem to be duplicate types.
Detecting where these come from is most easily done with gdb (systemtap
can also do userspace backtracing, but for libreoffice with lots of
separately loaded modules/shared libraries it is somewhat harder).

$ make debugrun
(gdb) start
(gdb) break -probe new_string_16
Breakpoint 2 at 0x2aaaaaadec18 (21 locations)
(gdb) condition 2 $_probe_arg2 == 1 && $_probe_arg2 == 33

The above puts a condition on the ref count and length of the string.
Note that confusingly gdb starts counting probe arguments at zero
while stap starts counting at one...
You might also be able to use the python translators to just
match the whole string. But using probes (and conditions) in gdb
seems to make gdb incredibly slow. So a little experimenting
to keep things fast enough might be needed. Put a break near where
you expect the duplicates are created and only then put a break -probe.

A lot of the duplicated "type strings" seem to come from the typelib
static_types.

I haven't actually done anything to reduce the duplications.
But hopefully the analysis is interesting.

Cheers,

Mark


More information about the LibreOffice mailing list