[Libreoffice] Assertions and Logging

Stephan Bergmann sbergman at redhat.com
Fri Nov 18 06:25:16 PST 2011


Hi all,

The current situation with assertions and logging of "interesting 
events" in the LO code base is not very satisfying.

For one, there are two sets of functionality that cater for the same 
needs.  One is osl/diagnose.h, the other is tools/debug.hxx.

For another, the distinction between true assertions (that assert 
invariants of the program code, and that indicate programming errors 
when not met) and the warning about unusual program states (that still 
need to be handled properly by the code, like the occurrence of 
malformed input) is not clear-cut with the existing macros.  Some code 
is careful to use OSL_ASSERT only for true assertions and OSL_TRACE for 
warnings about unusual program states, while other cdoee uses OSL_ASSERT 
for both cases.

The downside of that mixture is that the useful debugging technique of 
aborting upon detection of a violated invariant is not available.---If 
you make OSL_ASSERT (and OSL_ENSURE, OSL_FAIL, etc.) abort, it will 
abort far too often for mundane warnings for it to be useful.

And for a third, the combinatorial space of --enable-debug, 
--enable-dbgutil, OSL_DEBUG_LEVEL, DBG_UTIL, etc. is rather confusing. 
While --enable-debug (prepare the build for easy debugging, by letting 
the compiler emit extra debug information and disable optimizations, 
etc.) and --enable-dbgutil (enable additional checks in the code base) 
ought to be rather orthogonal, in some sense they ultimately converge on 
the same OSL_DEBUG_LEVEL macro, trying to control disparate needs with a 
mere three values, 0, 1, 2.  (--enable-debug and --enable-dbgutil both 
set OSL_DEBUG_LEVEL to 1; setting it to 2 requires passing a dbglevel=2 
switch to make; additionally, --enable-dbgutil defines the DBG_UTIL 
macro.)  Also, confusion keeps arising as to whether enabling various 
debugging mechanisms may render the code for which they have been 
enabled incompatible (i.e., whether they have to be enabled for the 
complete build, or can be enabled selectively).  The general idea is 
that --enable-debug does not cause incompatibility, while 
--enable-dbgutil potentially does (and even uses another $INPATH, 
removing the ".pro" extension).


That being the status quo, an analysis of what would actually be useful 
to have brings us to the following three points:

1  There is demand for "true" assertions (that identify errors in 
program logic and abort).  I suggest to use standard <assert.h> for 
them.  It is available everywhere (in every module, in C as well as 
C++), and no home-brown solution is needed.  It might be considered a 
disadvantage that assert does not allow to supply an additional error 
message.  However, as a failed assert promptly aborts program execution 
with an appropriate message (supplied by the compiler, containing file 
and line information), the missing message is not much of a 
nuisance---you will need to look at the specified program location right 
away anyway, to debug the problem.  (On Unix-like OSs, assert generally 
outputs to stderr; for Windows GUI applications, it outputs to a message 
box and allows attaching a debugger, see the MSDN documentation for 
"assert" at 
<http://msdn.microsoft.com/en-us/library/9sb57dw4%28v=vs.71%29.aspx>.)

Whether assertions are enabled (and abort when not met) is ontrolled via 
the standard NDEBUG macro.  It is already taken care of in our build 
environment, being left undefined (i.e., assertions are enabled) for 
--enable-dbgutil and for --enable-debug.  (In the old build system, it 
is only left undefined for --enable-dbgutil, and always defined---thus 
disabling assertions---otherwise.)  As enabling assertions should not 
affect compatibility, enabling them for booth --enable-dbgutil and 
--enable-debug appears appropriate.  (But we can leave the behaviour of 
the old build system as it is, given that it is going away, anyway.)

2  There is also demand for logging of "interesting events" during code 
execution.  For this, I propose new functionality below.  Whether or not 
logging is enabled should not affect compatibility.

The rationale for replacing the current mechanisms for both (1) and (2) 
with something else each, is to allow for a smooth transition.  Existing 
occurrences of OSL_ASSERT etc. can be inspected individually, replacing 
them either with assert (see above) or SAL_WARN (see below).  If we 
would instead keep OSL_ASSERT for one of those two cases, it would be 
unclear for an occurrences of OSL_ASSERT in the code whether it has 
already been visited and reclassified or not.

3  Furthermore, there is sometimes demand for additional, debug-only 
code.  In general, that would be safety-check additions that are 
considered too expensive to be included in production builds (e.g., code 
that iterates over a data structure to check invariants, or additional, 
redundant fields within data structures).  Enabling such additional code 
potentially affects compatibility.

Such additional code is currently controlled via OSL_DEBUG_LEVEL et al. 
  OSL_DEBUG_LEVEL==1 is generally used for additions that do not affect 
compatibility (as it is enabled by both --enable-debug and 
--enable-dbgutil).  OSL_DEBUG_LEVEL==2, DBG_UTIL (defined upon 
--enable-dbguitl) and privately invented defines in certain parts of the 
code (to be set manually by knowledgeable developers) are used for 
additions that affect compatibility or that are considered too specific 
for general inclusion.  Either because they are too expensive even for 
every --enable-dbgutil build, or because they produce excessive log 
information (for which case the below new log functionality offers a 
better solution).

This can probably be reduced to three cases:

#if OSL_DEBUG_LEVEL != 0  for additional code that does not cause 
incompatibilities (if there is still demand for such code; the new log 
functionality will remove the need for such code in many cases).  This 
effectively reduces OSL_DEBUG_LEVEL to a binary switch (so the make 
dbglevel=x feature can be removed; and OSL_DEBUG_LEVEL could potentially 
be renamed---but that would probably not be worth it).

#if defined DBG_UTIL  for additional code that causes incompatibilities.

#if defined MY_SPECIAL_DEBUG  (replaced with actual defines, varying 
across the different code modules) for those special cases where always 
enabling the additional code is deemed to expensive in general. 
(However, for those special cases where the additional code produces 
excess log information, see below.)


Which brings us to the new log functionality.

Attached is a patch against current (towards LO 3.5) master (actually 
two patches, an extra one for binfilter).  It introduces sal/log.h and 
makes osl/diagnose.h and tools/debug.h divert to it internally.  It also 
replaces a few example occurrences of the old osl/diagnoes.h and 
tools/debug.h macros with their new counterparts (and I encourage you to 
take a look at those replacements, to get a feel for what is possible 
with the new functionality).

For easy reference, I copy the relevant documentation from sal/log.h here:

>     SAL_INFO(char const * area, char const * format, ...),
>     SAL_INFO_IF(bool condition, char const * area, char const * format, ...),
>     SAL_WARN(char const * area, char const * format, ...), and
>     SAL_WARN_IF(bool condition, char const * area, char const * format, ...)
>     produce an info resp. warning log entry with a printf-style message.  The
>     given format argument and any following arguments must be so that that
>     sequence of arguments would be appropriate for a call to printf.
>
>     SAL_INFO_S(char const * area, expr),
>     SAL_INFO_IF_S(bool condition, char const * area, expr),
>     SAL_WARN_S(char const * area, expr), and
>     SAL_WARN_IF_S(bool condition, char const * area, expr) produce an info resp.
>     warning log entry with a message produced by piping items into a C++
>     std::ostringstream (and are only available in C++).  The given expr must be
>     so that the full expression "stream << expr" is valid, where stream is a
>     variable of type std::ostringstream.
>
>       SAL_INFO_S("foo", "string " << s << " of length " << n)
>
>     would be an example of such a call; if the given s is of type rtl::OUString,
>
>       #include "rtl/oustringostreaminserter.hxx"
>
>     would make sure that an appropriate operator << is available.
>
>     For the _IF variants, log output is only generated if the given condition is
>     true (in addition to the other conditions that have to be met).
>
>     For all these macros, the given area argument must be non-null and must
>     match the regular expression
>
>       <area> ::= <subarea>("."<subarea>)*
>
>     with
>
>       <subarea> ::= [0-9a-z]+
>
>     Whether these macros generate any log output is controlled in a two-stage
>     process.
>
>     First, at compile time the macro SAL_LOG_LEVEL controls whether these macros
>     expand to actual code, or to no-ops.  SAL_LOG_LEVEL must expand to an
>     integral value 0, 1, or 2.
>
>     If SAL_LOG_LEVEL is 0, neither the INFO nor the WARN macros produce code.
>     If SAL_LOG_LEVEL is 1, only the WARN macros produce code.  If SAL_LOG_LEVEL
>     is 2, both the INFO and the WARN macros produce code.
>
>     Second, at runtime the environment variable SAL_LOG further limits which
>     macro calls actually generate log output.  The environment varialbe SAL_LOG
>     must either be unset or must match the regular expression
>
>       <env> ::= <switch>*
>
>     with
>
>       <switch> ::= <sense><level><area>?
>       <sense> ::= "+"|"-"
>       <level> ::= "INFO"|"WARN"
>
>     If the environment variable is unset, "+WARN" is used instead (which results
>     in all warnings being output but no infos).  If the given value does not
>     match the regular expression, "+INFO+WARN" is used instead (which in turn
>     results in everything being output).
>
>     A given macro call's level (INFO or WARN) and area is matched against the
>     given switches as follows:  Only those switches for which the level matches
>     the given level and for which the area is a prefix (including both empty and
>     full prefixes) of the given area are considered.  Log output is generated if
>     and only if among the longest such switches (if any), there is at least one
>     that has a sense of "+".  (That is, if both +WARN.foo and -WARN.foo are
>     present, +WARN.foo wins.)
>
>     For example, if SAL_LOG is "+INFO-INFO.foo+INFO.foo.bar", then calls like
>     SAL_INFO("foo.bar", ...), SAL_INFO("foo.bar.baz", ...), or
>     SAL_INFO("other", ...) generate output, while calls like
>     SAL_INFO("foo", ...) or SAL_INFO("foo.barzzz", ...) do not.
>
>     The generated log output consists of the given level ("info" or "warn"), the
>     given area, the process ID, the thread ID, the source file, and the source
>     line number, each followed by a colon, followed by a space, the given
>     message, and a newline.  The given message should contain no vertical
>     formatting characters and no null characters.  The precise format of the log
>     output is subject to change.  The log output is printed to stderr.

Some further points:

- Whether or not both SAL_INFO and SAL_WARN produce actual code is 
currently effectively controlled via OSL_DEBUG_LEVEL:  With either 
--enable-dbgutil or --enable-debug, the full log functionality is 
enabled.  Whether or not a more fine-grained control (producing code for 
only SAL_WARN) is necessary needs to be seen.  (If not, SAL_LOG_LEVEL 
can be folded into OSL_DEBUG_LEVEL.)  Having SAL_INFO compiled into each 
--enable-dbgutil build makes it possible to selectively enable 
interesting log information via SAL_LOG=+INFO.foo without having to 
recompile code.

- Assigning area codes to the macro invocations, I have mostly stuck to 
single-segment area codes that match the names of the code modules 
("sal", "binfilter", "sfx2").  For the replacements within 
osl/diangose.h and tools/debug.hxx (that will in turn be called from 
various places, so it would not make much sense to attribute them to 
"sal" or "tools", respectively) I used "legacy.osl" and "legacy.tools", 
respectively.  For a few cases where old code was OSL_DEBUG_LEVEL>=2 
conditional, I used two-segment area codes "canvas.level2", "jfw.level2" 
(and "jfw.level1"), "sw.level2".

- Replacing all existing calls to deprecated osl/diagnose.h and 
tools/debug.hxx macros will need to happen over time (an easy hack).

- There is further functionality in tools/debug.hxx (DBG_MEMTEST, 
DBG_CTOR, etc.) that has not yet been addressed.

- The recently introduced OSL_FORMAT is superseded by SAL_STREAM (which 
uses C++-stream-style composition instead of C-printf-format-style). 
OSL_FORMAT and all its uses have been removed.

- The implementation is somewhat careful to produce as little code as 
possible at the call-site of the new macros, and to keep the code path 
for suppressed logs as short as possible.  However, the C++-stream-style 
macros will potentially always be more expensive than the 
C-printf-format-style ones.  But the former are more useful, so I would 
encourage you to nevertheless use them where appropriate.

- unotest/oustringostreaminserter.hxx is moved to 
rtl/oustringostreaminserter.hxx (and duplications of it below unotest 
are removed); it is often needed in combination with sal/log.h.

- In a few places, the patches replace occurrences of tools String with 
rtl::OUString, so that streaming works.  Otherwise, it is necessary to 
wrap the tools String in an OUString ctor.

- sal/log.h contains a TODO to enable a GCC __attribute__((format)) on 
the C funtion underneath the C-printf-format-style macros.  Enabling it 
would produce tons of -Werror=format from old OSL_TRACE uses.  These 
need to be cleaned up first (an easy hack).

One open question is which set of macros (the C-printf-format-style ones 
or the C++-stream-style ones) to give the "natural" names without 
additional suffix.  For now, I reserve the natural names for the C-style 
ones, as they are more universal (not only available in C++) and produce 
smaller call-side code, especially for the simple, common case where the 
message consists of just a string literal, as in SAL_INFO("foo", 
"message").  However, reserving the natural names for the C++-style 
macros (and giving the other ones an _F suffix, say) would be justified 
by their better usability (and thus more frequent usage) and the fact 
that they avoid the pitfall of having to escape any literal "%" within 
the format string of a C-style macro usage---although enabling GCC's 
__attribute__((format)) check would also help detect misuses like 
SAL_INFO("foo", "100%").

If there is no objection, I will push the patches to master next week. 
Due to removing OSL_FORMAT and osl_detail_formatString again, I would 
prefer to get this still into LO 3.5.

Stephan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log.patch
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20111118/2f0902d0/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log-binfilter.patch
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20111118/2f0902d0/attachment-0001.asc>


More information about the LibreOffice mailing list