pre-init / strings ...

Michael Meeks michael.meeks at collabora.com
Thu Dec 7 21:53:31 UTC 2017


Hi there,

	Its been a long while since I discussed a patch on the dev list - perhaps gerrit killed this pleasure for the most part. But - lets try it. Please reply-to-all, so I can respond in a timely way :-)

	Anyhow - the appended is perhaps worth some discussion. I just fwd. ported it blind from our 5.3 branch; I hope the intention is at least reasonably clear (although the extern "C" stuff is a mess to get around some hard to explain internal linking oddness).

	The basic problem is this - LOOL pre-initializes LibreOffice before forking to ensure that lots of the memory is shared copy-on-write between it and the child processes. That works reasonably well, but unfortunately - when we re-use strings from that corpus - we tend to touch the ref-counts, which causes lots of page dirtying.

	I've written a tool 'loolmap' that shows that - and dumps the memory that is not-shared with the parent process loading a trivial 'hello world' document into writer (from my master). The patch turns our base-line memory use for that from 17.6Mb to 14.1Mb - so saving around 3.5Mb per document; quite a win over a hundred documents eg. So its reasonably useful.

	There are however some questions:

	* a new sal entry point [!] - I guess it is hard to avoid this.

	* should rtl_arena_foreach be public ?
		+ perhaps - but it should be precise first; that would
		  be far easier if the slab allocator was in fact a slab
		  allocator - but could be done anyway.
		+ currently it mashes the free-lists that are stored inside
		  free blocks - so should be the last thing called before not
		  using that arena anymore (matching the pre-init use-case nicely)

	* are rtl_uString's -really- -that- big that they're worth saving ?
		+ here is a loolmap dump of just two:

0x0000  20 00 00 00 00 00 00 00  02 00 00 00 05 00 00 00  4b 00 45 00 59 00 5f 00  55 00 00 00 6f 72 6b 63  |  ...............K.E.Y._.U...orkc
0x0020  20 00 00 00 00 00 00 00  02 00 00 00 05 00 00 00  4b 00 45 00 59 00 5f 00  56 00 00 00 05 00 03 6a  |  ...............K.E.Y._.V......j
     A. 64 bits of memory 'size'| B. ref-cnt| C. str-len | K    E     Y     _      V       | random memory  |

		+ so yes, we really do waste a chunk of RAM here, hence the TODO - in the patch, but then a slow but more efficient packing will save only a few Mb of RAM per-server rather than per-process: where the real wins are.

	* can we do better on strings ?
		+ it seems to me that storing the length of the memory block in 64bits before, and then the length in chars in the immutable string in the middle is rather unfortunate.
		+ unfortunately - OUStringBuffer mutates that length, and the current patch serves both OString and OUString with the same code.
		+ we -could- use rtl_uStringBuffer_makeStringAndClear's inspired pCapacity pointer to get the previous size, and do something clever
		=> that's more work though.

	* can we do better with the mhu slab allocator ?
		+ yes, make it a real slab allocator; if we have the size at free time, we should be able to have a simple bit-mask for the allocated blocks at an offset defined by the size of the objects in the slab.
		=> this would also help for the rtl_arena_foreach being precise & perhaps usefully public.
		+ it would mean we could allocate smaller than 32byte blocks of memory too.

	Anyhow - thoughts appreciated; I'd like to get something that does something like this into master at some stage =)

	ATB,

		Michael.

>From d59e7909d15a150756e134719045e4fca07a48e7 Mon Sep 17 00:00:00 2001
From: Michael Meeks <michael.meeks at collabora.com>
Date: Thu, 7 Dec 2017 21:19:09 +0000
Subject: [PATCH] sal: add pre-initialization scheme for allocations.

This saves several megabytes of dirtied pages for each LOK
client of Online.

Change-Id: I425a2e7896879f0a64d71fcc0655e9e1fa1256aa
---
 desktop/source/lib/init.cxx    |  7 +++++
 include/rtl/alloc.h            | 17 ++++++++++
 sal/qa/rtl/alloc/rtl_alloc.cxx | 70 ++++++++++++++++++++++++++++++++++++++++++
 sal/rtl/alloc_arena.cxx        | 30 ++++++++++++++++++
 sal/rtl/alloc_arena.hxx        |  8 +++++
 sal/rtl/alloc_cache.cxx        | 20 ++++++++++++
 sal/rtl/strimp.cxx             | 68 ++++++++++++++++++++++++++++++++++++++++
 sal/rtl/strimp.hxx             |  6 ++++
 sal/rtl/string.cxx             |  6 ++--
 sal/rtl/strtmpl.cxx            |  4 +--
 sal/rtl/ustring.cxx            |  8 ++---
 sal/util/sal.map               |  5 +++
 12 files changed, 240 insertions(+), 9 deletions(-)

diff --git a/desktop/source/lib/init.cxx b/desktop/source/lib/init.cxx
index c4b922c75332..ae59c4b87d07 100644
--- a/desktop/source/lib/init.cxx
+++ b/desktop/source/lib/init.cxx
@@ -3564,6 +3564,9 @@ static int lo_initialize(LibreOfficeKit* pThis, const char* pAppPath, const char
     if (bInitialized)
         return 1;
 
+    if (eStage == PRE_INIT)
+        rtl_alloc_preInit(sal_True);
+
     if (eStage != SECOND_INIT)
         comphelper::LibreOfficeKit::setActive();
 
@@ -3715,6 +3718,10 @@ static int lo_initialize(LibreOfficeKit* pThis, const char* pAppPath, const char
         fprintf(stderr, "Bootstrapping exception '%s'\n",
                  OUStringToOString(exception.Message, RTL_TEXTENCODING_UTF8).getStr());
     }
+
+    if (eStage == PRE_INIT)
+        rtl_alloc_preInit(sal_False);
+
     return bInitialized;
 }
 
diff --git a/include/rtl/alloc.h b/include/rtl/alloc.h
index bd3149fece87..7257fe445c4a 100644
--- a/include/rtl/alloc.h
+++ b/include/rtl/alloc.h
@@ -286,6 +286,23 @@ SAL_DLLPUBLIC void SAL_CALL rtl_cache_free (
 ) SAL_THROW_EXTERN_C();
 
 
+#ifdef LIBO_INTERNAL_ONLY
+
+/** rtl_alloc_preInit
+ *
+ * This function, is called at the beginning and again
+ * at the end of LibreOfficeKit pre-initialization to enable
+ * various optimizations. It is almost certainly not the method
+ * that you want.
+ *
+ * @since LibreOffice 6.1
+ */
+SAL_DLLPUBLIC void SAL_CALL rtl_alloc_preInit (
+    sal_Bool start
+) SAL_THROW_EXTERN_C();
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/sal/qa/rtl/alloc/rtl_alloc.cxx b/sal/qa/rtl/alloc/rtl_alloc.cxx
index 30aadcc94f69..ac68af5bc4a5 100644
--- a/sal/qa/rtl/alloc/rtl_alloc.cxx
+++ b/sal/qa/rtl/alloc/rtl_alloc.cxx
@@ -18,11 +18,14 @@
  */
 
 #include <rtl/alloc.h>
+#include <rtl/ustrbuf.hxx>
 #include <sal/types.h>
 #include <cppunit/TestFixture.h>
 #include <cppunit/extensions/HelperMacros.h>
 #include <cppunit/plugin/TestPlugIn.h>
 
+#define SAL_STRING_STATIC_FLAG 0x40000000
+
 #include <memory.h>
 
 namespace rtl_alloc
@@ -132,8 +135,75 @@ public:
     CPPUNIT_TEST_SUITE_END();
 };
 
+class TestPreinit : public CppUnit::TestFixture
+{
+public:
+    TestPreinit()
+    {
+    }
+
+    // initialise your test code values here.
+    void setUp() override
+    {
+    }
+
+    void tearDown() override
+    {
+    }
+
+    // insert your test code here.
+
+    void test()
+    {
+        const char *sample = "Hello World";
+        std::vector<OUString> aStrings;
+
+        rtl_alloc_preInit(true);
+
+        OUString aFoo("foo");
+
+        // fill some cache bits
+        for (int iter = 0; iter < 4; iter++)
+        {
+            for (int i = 1; i < 4096; i += 8)
+            {
+                OUStringBuffer aBuf(i);
+                aBuf.appendAscii(sample, (i/8) % (sizeof(sample)-1));
+                OUString aStr = aBuf.makeStringAndClear();
+                aStrings.push_back(aStr);
+            }
+            // free some pieces to make holes
+            for (size_t i = iter; i < aStrings.size(); i += 17)
+                aStrings[i] = aFoo;
+        }
+
+        for (size_t i = 0; i < aStrings.size(); ++i)
+        {
+            CPPUNIT_ASSERT_MESSAGE( "not static before.", !(aStrings[i].pData->refCount & SAL_STRING_STATIC_FLAG) );
+        }
+
+        // should static-ize all the strings.
+        rtl_alloc_preInit(false);
+
+        for (size_t i = 0; i < aStrings.size(); ++i)
+            CPPUNIT_ASSERT_MESSAGE( "static after.", (aStrings[i].pData->refCount & SAL_STRING_STATIC_FLAG) );
+    }
+
+    void test2()
+    {
+        // should never happen but lets try it again.
+        test();
+    }
+
+    CPPUNIT_TEST_SUITE(TestPreinit);
+    CPPUNIT_TEST(test);
+    CPPUNIT_TEST(test2);
+    CPPUNIT_TEST_SUITE_END();
+};
+
 CPPUNIT_TEST_SUITE_REGISTRATION(rtl_alloc::Memory);
 CPPUNIT_TEST_SUITE_REGISTRATION(rtl_alloc::TestZeroMemory);
+CPPUNIT_TEST_SUITE_REGISTRATION(rtl_alloc::TestPreinit);
 } // namespace rtl_alloc
 
 CPPUNIT_PLUGIN_IMPLEMENT();
diff --git a/sal/rtl/alloc_arena.cxx b/sal/rtl/alloc_arena.cxx
index 8f9c64e0919e..4400528fa91d 100644
--- a/sal/rtl/alloc_arena.cxx
+++ b/sal/rtl/alloc_arena.cxx
@@ -17,6 +17,8 @@
  *   the License at http://www.apache.org/licenses/LICENSE-2.0 .
  */
 
+#include "sal/config.h"
+
 #include "alloc_arena.hxx"
 
 #include "alloc_impl.hxx"
@@ -664,6 +666,34 @@ rtl_arena_type * rtl_arena_activate(
     return arena;
 }
 
+extern "C" {
+void rtl_arena_foreach(rtl_arena_type *arena, ArenaForeachFn foreachFn, void *user_data) SAL_THROW_EXTERN_C()
+{
+    // quantum caches
+    if ((arena->m_qcache_max > 0) && (arena->m_qcache_ptr != nullptr))
+    {
+        int i, n = (arena->m_qcache_max >> arena->m_quantum_shift);
+        for (i = 1; i <= n; i++)
+        {
+            if (arena->m_qcache_ptr[i - 1] != nullptr)
+                rtl_cache_foreach (arena->m_qcache_ptr[i - 1],
+                                   foreachFn, user_data);
+        }
+    }
+
+    /* used segments */
+    for (int i = 0, n = arena->m_hash_size; i < n; i++)
+    {
+        for (rtl_arena_segment_type *segment = arena->m_hash_table[i];
+             segment != nullptr; segment = segment->m_fnext)
+        {
+            foreachFn(reinterpret_cast<void *>(segment->m_addr),
+                      segment->m_size, user_data);
+        }
+    }
+}
+} // extern "C"
+
 void rtl_arena_deactivate(rtl_arena_type * arena)
 {
     rtl_arena_segment_type * head, * segment;
diff --git a/sal/rtl/alloc_arena.hxx b/sal/rtl/alloc_arena.hxx
index 6a846876ff17..de1767ae226c 100644
--- a/sal/rtl/alloc_arena.hxx
+++ b/sal/rtl/alloc_arena.hxx
@@ -112,6 +112,14 @@ struct rtl_arena_st
  */
 extern rtl_arena_type * gp_default_arena;
 
+extern "C" {
+
+typedef void (*ArenaForeachFn)(void *addr, sal_Size size, void *user_data);
+void SAL_CALL rtl_arena_foreach(rtl_arena_type *arena, ArenaForeachFn fn, void *user_data) SAL_THROW_EXTERN_C();
+void SAL_CALL rtl_cache_foreach(rtl_cache_type *arena, ArenaForeachFn foreachFn, void *user_data) SAL_THROW_EXTERN_C();
+
+}
+
 #endif // INCLUDED_SAL_RTL_ALLOC_ARENA_HXX
 
 /* vim:set shiftwidth=4 softtabstop=4 expandtab: */
diff --git a/sal/rtl/alloc_cache.cxx b/sal/rtl/alloc_cache.cxx
index 091a5bf774ee..b419c30f3a1d 100644
--- a/sal/rtl/alloc_cache.cxx
+++ b/sal/rtl/alloc_cache.cxx
@@ -1153,6 +1153,26 @@ void SAL_CALL rtl_cache_free(
     }
 }
 
+// FIXME: foreachFn called for free'd blocks and will break free-chains.
+extern "C" {
+void SAL_CALL rtl_cache_foreach(rtl_cache_type *cache, ArenaForeachFn foreachFn, void *user_data) SAL_THROW_EXTERN_C()
+{
+    for (rtl_cache_slab_type *cur = &(cache->m_used_head);
+         cur && cur->m_slab_next != &(cache->m_used_head);
+         cur = cur->m_slab_next)
+    {
+        for (char *item = reinterpret_cast<char *>(cur->m_data);
+             item < reinterpret_cast<char *>(cur->m_bp);
+             item += cache->m_type_size)
+        {
+            foreachFn(item, cache->m_type_size, user_data);
+        }
+    }
+
+    RTL_MEMORY_LOCK_RELEASE(&(cache->m_slab_lock));
+}
+} // extern "C"
+
 #if defined(SAL_UNX)
 
 void SAL_CALL rtl_secureZeroMemory(void *Ptr, sal_Size Bytes) SAL_THROW_EXTERN_C()
diff --git a/sal/rtl/strimp.cxx b/sal/rtl/strimp.cxx
index db554644f6a9..39b08a4ff2c3 100644
--- a/sal/rtl/strimp.cxx
+++ b/sal/rtl/strimp.cxx
@@ -16,8 +16,15 @@
  *   except in compliance with the License. You may obtain a copy of
  *   the License at http://www.apache.org/licenses/LICENSE-2.0 .
  */
+#include "sal/config.h"
+
+#include <assert.h>
+#include <rtl/alloc.h>
+#include <rtl/ustring.h>
 
 #include "strimp.hxx"
+#include "alloc_impl.hxx"
+#include "alloc_arena.hxx"
 
 sal_Int16 rtl_ImplGetDigit( sal_Unicode ch, sal_Int16 nRadix )
 {
@@ -49,4 +56,65 @@ bool rtl_ImplIsWhitespace( sal_Unicode c )
     return false;
 }
 
+/*
+ * TODO: add a slower, more awful, but more space efficient
+ * custom allocator for the pre-init phase. Existing slab
+ * allocator's minimum alloc size is 24bytes, and by default
+ * is 32bits.
+ */
+static rtl_arena_type *pre_arena = nullptr;
+
+rtl_allocateStringFn rtl_allocateString = rtl_allocateMemory;
+rtl_freeStringFn rtl_freeString = rtl_freeMemory;
+
+extern "C" {
+static void *pre_allocateStringFn(sal_Size n)
+{
+    sal_Size size = RTL_MEMORY_ALIGN(n + 4, 4);
+    char *addr = static_cast<char*>(rtl_arena_alloc(pre_arena, &size));
+    assert(size>= 12);
+    reinterpret_cast<sal_uInt32*>(addr)[0] = size - 12;
+    return addr + 4;
+}
+
+static void pre_freeStringFn(void *data)
+{
+    char    *addr = static_cast<char*>(data) - 4;
+    sal_Size size = reinterpret_cast<sal_uInt32*>(addr)[0] + 12;
+
+    rtl_arena_free(pre_arena, addr, size);
+}
+
+static void mark_static(void *addr, sal_Size /* size */, void *)
+{
+    char *inner = static_cast<char*>(addr) + 4;
+    rtl_uString *str = reinterpret_cast<rtl_uString *>(inner);
+    str->refCount |= SAL_STRING_STATIC_FLAG;
+}
+} // extern "C"
+
+void SAL_CALL rtl_alloc_preInit (sal_Bool start) SAL_THROW_EXTERN_C()
+{
+    if (getenv("SAL_DISABLE_PREINIT"))
+        return;
+
+    if (start)
+    {
+        rtl_allocateString = pre_allocateStringFn;
+        rtl_freeString = pre_freeStringFn;
+        pre_arena = rtl_arena_create("pre-init strings", 4, 0,
+                                     nullptr, rtl_arena_alloc,
+                                     rtl_arena_free, 0);
+    }
+    else // back to normal
+    {
+        rtl_arena_foreach(pre_arena, mark_static, nullptr);
+        rtl_allocateString = rtl_allocateMemory;
+        rtl_freeString = rtl_freeMemory;
+
+        // TODO: also re-intialize main allocator as well.
+    }
+}
+
+
 /* vim:set shiftwidth=4 softtabstop=4 expandtab: */
diff --git a/sal/rtl/strimp.hxx b/sal/rtl/strimp.hxx
index c6325dedaa75..f08b233bb1c7 100644
--- a/sal/rtl/strimp.hxx
+++ b/sal/rtl/strimp.hxx
@@ -49,6 +49,12 @@ sal_Int16 rtl_ImplGetDigit( sal_Unicode ch, sal_Int16 nRadix );
 
 bool rtl_ImplIsWhitespace( sal_Unicode c );
 
+typedef void *(*rtl_allocateStringFn)(sal_Size size);
+typedef void  (*rtl_freeStringFn)(void *);
+
+extern rtl_allocateStringFn rtl_allocateString;
+extern rtl_freeStringFn rtl_freeString;
+
 // string lifetime instrumentation / diagnostics
 #if USE_SDT_PROBES
 #  define PROBE_SNAME(n,b) n ## _ ## b
diff --git a/sal/rtl/string.cxx b/sal/rtl/string.cxx
index 8dbcee763a95..d6f184a85b6c 100644
--- a/sal/rtl/string.cxx
+++ b/sal/rtl/string.cxx
@@ -274,7 +274,7 @@ bool SAL_CALL rtl_impl_convertUStringToString(rtl_String ** pTarget,
                                                    &nInfo, &nSrcChars );
             if (bCheckErrors && (nInfo & RTL_UNICODETOTEXT_INFO_ERROR) != 0)
             {
-                rtl_freeMemory(pTemp);
+                rtl_freeString(pTemp);
                 rtl_destroyUnicodeToTextConverter(hConverter);
                 return false;
             }
@@ -283,7 +283,7 @@ bool SAL_CALL rtl_impl_convertUStringToString(rtl_String ** pTarget,
                 break;
 
             /* Buffer not big enough, try again with enough space */
-            rtl_freeMemory( pTemp );
+            rtl_freeString( pTemp );
 
             /* Try with the max. count of characters with
                additional overhead for replacing functionality */
@@ -298,7 +298,7 @@ bool SAL_CALL rtl_impl_convertUStringToString(rtl_String ** pTarget,
             rtl_String* pTemp2 = rtl_string_ImplAlloc( nDestBytes );
             OSL_ASSERT(pTemp2 != nullptr);
             rtl_str_ImplCopy( pTemp2->buffer, pTemp->buffer, nDestBytes );
-            rtl_freeMemory( pTemp );
+            rtl_freeString( pTemp );
             pTemp = pTemp2;
         }
         else
diff --git a/sal/rtl/strtmpl.cxx b/sal/rtl/strtmpl.cxx
index 62bb76ddfdd1..5fdab82f8b43 100644
--- a/sal/rtl/strtmpl.cxx
+++ b/sal/rtl/strtmpl.cxx
@@ -1149,7 +1149,7 @@ static IMPL_RTL_STRINGDATA* IMPL_RTL_STRINGNAME( ImplAlloc )( sal_Int32 nLen )
         = (sal::static_int_cast< sal_uInt32 >(nLen)
            <= ((SAL_MAX_UINT32 - sizeof (IMPL_RTL_STRINGDATA))
                / sizeof (IMPL_RTL_STRCODE)))
-        ? static_cast<IMPL_RTL_STRINGDATA *>(rtl_allocateMemory(
+        ? static_cast<IMPL_RTL_STRINGDATA *>(rtl_allocateString(
             sizeof (IMPL_RTL_STRINGDATA) + nLen * sizeof (IMPL_RTL_STRCODE)))
         : nullptr;
     if (pData != nullptr) {
@@ -1230,7 +1230,7 @@ void SAL_CALL IMPL_RTL_STRINGNAME( release )( IMPL_RTL_STRINGDATA* pThis )
     if ( !osl_atomic_decrement( &(pThis->refCount) ) )
     {
         RTL_LOG_STRING_DELETE( pThis );
-        rtl_freeMemory( pThis );
+        rtl_freeString( pThis );
     }
 }
 
diff --git a/sal/rtl/ustring.cxx b/sal/rtl/ustring.cxx
index 167cdf2eab6d..31e0cf10d363 100644
--- a/sal/rtl/ustring.cxx
+++ b/sal/rtl/ustring.cxx
@@ -830,7 +830,7 @@ retry:
                code here. Could be the case for apple encodings */
             while ( nInfo & RTL_TEXTTOUNICODE_INFO_DESTBUFFERTOOSMALL )
             {
-                rtl_freeMemory( pTemp );
+                rtl_freeString( pTemp );
                 nNewLen += 8;
                 pTemp = rtl_uString_ImplAlloc( nNewLen );
                 if (pTemp == nullptr) {
@@ -859,7 +859,7 @@ retry:
             if (pTemp2 != nullptr)
             {
                 rtl_str_ImplCopy(pTemp2->buffer, pTemp->buffer, nDestChars);
-                rtl_freeMemory(pTemp);
+                rtl_freeString(pTemp);
                 pTemp = pTemp2;
             }
             else
@@ -926,7 +926,7 @@ static void rtl_ustring_intern_internal( rtl_uString ** newStr,
 
     if( can_return && *newStr != str )
     { /* we dupped, then found a match */
-        rtl_freeMemory( str );
+        rtl_freeString( str );
     }
 }
 
@@ -1077,7 +1077,7 @@ internRelease (rtl_uString *pThis)
         osl_releaseMutex( pPoolMutex );
     }
     if (pFree)
-        rtl_freeMemory (pFree);
+        rtl_freeString (pFree);
 }
 
 sal_uInt32 SAL_CALL rtl_uString_iterateCodePoints(
diff --git a/sal/util/sal.map b/sal/util/sal.map
index 579119065121..1dffaf8bc787 100644
--- a/sal/util/sal.map
+++ b/sal/util/sal.map
@@ -740,6 +740,11 @@ PRIVATE_1.4 { # LibreOffice 6.0
         _ZN3sal19backtrace_to_stringEPNS_14BacktraceStateE;
 } PRIVATE_1.3;
 
+PRIVATE_1.3 { # LibreOffice 6.1
+    global:
+        rtl_alloc_preInit;
+} PRIVATE_1.2;
+
 PRIVATE_textenc.1 { # LibreOffice 3.6
     global:
         _ZN3sal6detail7textenc20convertCharToUnicode*;
-- 
2.13.6

-- 
michael.meeks at collabora.com <><, Pseudo Engineer, itinerant idiot


More information about the LibreOffice mailing list