[systemd-devel] PATCH: Integration with Python logging framework

Mon Oct 15 13:30:09 PDT 2012

On Sun, Oct 14, 2012 at 2:22 PM, David Strauss <david at davidstrauss.net> wrote:
> This support has been on my task list for a while. Thanks for taking a
> first stab at it. A cursory look at the patch looks promising, and
> I'll probably commit it after some basic testing in the next few days.

I've been pondering how developers would actually want to use custom
systemd fields and concluded that the approach I've taken with the
patch is too simplistic. I'd like to arrive at an API that makes it
easy to use journal, but is generic enough to be useful for other use
cases too. These ideas are still fairly raw, but I'm stumped for now.

----
1. Event data
One clear use case that's fairly easy to support is supplying custom
message-specific data, such as MESSAGE_ID. Or details of the event
that caused the log message; for example, a HTTP request log has
fields for client's IP address, request URL, User-Agent string, etc.
The extra= way of providing these works well enough, but there needs
to be a way to mark which fields are intended to go to systemd and
which not.

Another point is that many projects already use extra= to provide
detailed information with log messages. We can't expect them to rename
these fields from Pythonic names to systemd-ish all-caps fields. It's
also inconsistent looking at the other LogRecord attributes:
http://docs.python.org/library/logging#logrecord-attributes

So one approach is to provide a mapping with each logger. Since the
logger is usually a module-global object, it's well-placed to retain
this information. However, is adding attributes to logger objects too
hacky? Any alternatives?

# module context
import logging
log = logging.getLogger('module.name')
log.systemd_fields = {'ip': 'REQUEST_ADDR', 'userAgent': 'HTTP_USER_AGENT'}

# function context
def foo(request):
    log.info("Request from %s", request.ip,
             extra={'ip': request.ip, 'userAgent': request.get('User-Agent')})

And how to treat MESSAGE_ID, would it be implicitly allowed in any
logger? Do we name it "MESSAGE_ID" or "messageId" to be consistent
with other LogRecord fields?

----
2. Context data
It's important to note that most of the interesting fields come from
the caller context, not the place where the log messages are issued.
This is an area where systemd has a lot to offer: we can ask queries
like "what else happened during the processing of this request?" "What
did this user do prior to experiencing the crash?" Some context
information is already saved by systemd -- such as PID, systemd unit,
etc. Other context info can only be provided by the app itself.

Python offers LoggerAdapter, but sadly the context they provide is
only attached to one logger. Since they add fields to the LogRecord,
the above API already suffices for them.

However, we could make this more powerful and convenient by offering
context managers and storing the data in a thread-local context. For
example, web frameworks/servers could add something like this to their
request dispatcher (obviously simplified):

def dispatch_request(request):
    context = {
        'REQUEST_URI': request.uri,
        'REQUEST_ADDR': request.ip
    }
    with journal.journal_context(context):
        handler = find_handler(request.uri)
        handler(request)

Even better if this were integrated with the logging framework, though
I'm not certain how it would be implemented. Setting "systemd_fields"
on the root logger would override any changes made by other modules.
But this mapping needs transcend beyond just one logger to be
useful... Something like:

import logging
logging.root.systemd_fields = {'ip': 'REQUEST_ADDR', 'uri': 'REQUEST_URI'}

def dispatch_request(request):
    context = {
        'uri': request.uri,
        'ip': request.ip
    }
    with journal.log_context(context):
        handler = find_handler(request.uri)
        handler(request)

----
> I've also considered abusing
> the translation substitution support in logging to put the variable
> values in into separate fields (in addition to using the composed
> string as MESSAGE=), since (1) those values are identified by a key
> for the purpose of substitution and (2) those values are generally the
> same values you'd want to use for filtering.

I'm not sure this is a good idea, since it changes even on minor
changes to the message. In any case, I think it should be evaluated
independently of Python -- if it seems like a good idea to systemd
developers, it should be documented and perhaps adopted by other
packages too, not just Python.

Regards,
Marti