[systemd-devel] Client logging to journald without libsystemd-journal.so

Fri Nov 9 16:41:17 PST 2012

On Thu, 08.11.12 16:59, Daniel P. Berrange (berrange at redhat.com) wrote:

> I recently introduced support for libvirt logging to journald. Initially I
> had intended to use libsystemd-journal.so for the logging, however, in the
> end I made libvirt directly communicate with sendmsg().
> 
> First, I wanted to confirm two interface stability issues.
> 
>  - Is the client app -> journald logging protocol considered to be
>    ABI stable ?

No. But I am fine with saying that it now is. The serialization format is
actually just the export format:

http://www.freedesktop.org/wiki/Software/systemd/export

And that is already well-documented and stable. So turning this format
into a protocol is little more than just saying: "Well, and if you now
stick such a record into an AF_UNIX datagram and send it to
/run/systemd/journal/socket then you are speaking the local protocol."

The export format should be flexible enough to extend things if we need
to later, by using double-underscore fields, so just saying the protocol
at this point is stable is totally OK.
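
A minimal sketch of what speaking the protocol directly might look like
in C. It assumes the framing described on the export format page: a
field whose value contains no newline is written as "KEY=value\n", and
any other value as "KEY\n", a 64-bit little-endian length, the raw data
and a final "\n". The names journal_put_field() and journal_send_raw()
are hypothetical helpers for illustration and not part of
libsystemd-journal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Append one field to buf at offset *off.
 * Returns 0 on success, -1 if the field does not fit.
 * Assumes val is non-NULL. */
int journal_put_field(char *buf, size_t cap, size_t *off,
                      const char *key, const void *val, size_t len) {
    size_t klen = strlen(key);
    int binary = len > 0 && memchr(val, '\n', len) != NULL;
    size_t need = klen + 1 + (binary ? 8 : 0) + len + 1;
    char *p;

    assert(*off <= cap);
    if (need > cap - *off)
        return -1;

    p = buf + *off;
    memcpy(p, key, klen);
    p += klen;
    if (binary) {
        uint64_t n = len;
        int i;

        *p++ = '\n';
        for (i = 0; i < 8; i++)          /* length, little-endian */
            *p++ = (char)((n >> (8 * i)) & 0xff);
    } else {
        *p++ = '=';
    }
    memcpy(p, val, len);
    p += len;
    *p++ = '\n';
    *off = (size_t)(p - buf);
    return 0;
}

/* Send one serialized record as a single datagram.
 * Returns 0 on success, -1 on error (for example if journald is not running). */
int journal_send_raw(const void *record, size_t len) {
    struct sockaddr_un sa;
    ssize_t n;
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strncpy(sa.sun_path, "/run/systemd/journal/socket", sizeof(sa.sun_path) - 1);

    fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    n = sendto(fd, record, len, MSG_NOSIGNAL, (struct sockaddr *)&sa, sizeof(sa));
    close(fd);
    return n < 0 ? -1 : 0;
}
```

A caller would fill a buffer with a few journal_put_field() calls
(MESSAGE, PRIORITY, ...) and then hand the buffer and its final offset
to journal_send_raw().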

>  - Is the /run/systemd/journal/socket path considered to be stable ?

Same story.

Short: from now on they are stable. I will add them to the interface
stability chart.

> Second, I wanted to mention why we couldn't use libsystemd-journal.so
> ourselves.
> 
> The first problem is that there is no sd_journal_open/close API call
> to setup the file descriptor. The library uses a one time atomic
> global initialize to open its file descriptor which is then cached
> until exit() or execve() (it has SOCK_CLOEXEC set).

Sounds like a good addition. Added to the TODO list.

> The problem is that when libvirt does fork() to create client processes,
> one of the things it does is to iterate from 0 -> sysconf(_SC_OPEN_MAX),
> closing every file descriptor, except those in its whitelist.

My recommendation would be to iterate through /proc/self/fd/ instead,
and to use RLIMIT_NOFILE as a fallback if that doesn't work. It's
actually measurably faster. (Especially if you end up running your code
in valgrind from time to time...)

We use this call for this:

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/util.c#n1888
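
For illustration, the /proc/self/fd/ approach with the RLIMIT_NOFILE
fallback could be sketched roughly like this. This is not the systemd
function linked above; close_all_fds_sketch() and fd_in_list() are
hypothetical names, and the 65536 cap for an unlimited RLIMIT_NOFILE is
an arbitrary assumption. Note that opendir() may allocate memory, so
this variant is only suitable where that is acceptable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

static int fd_in_list(int fd, const int *except, size_t n_except) {
    size_t i;

    for (i = 0; i < n_except; i++)
        if (except[i] == fd)
            return 1;
    return 0;
}

/* Close every fd >= 3 that is not listed in except[].
 * Returns 0 on success, -1 if neither method was usable. */
int close_all_fds_sketch(const int *except, size_t n_except) {
    struct rlimit rl;
    struct dirent *de;
    rlim_t max, i;
    DIR *d;

    assert(n_except == 0 || except != NULL);

    d = opendir("/proc/self/fd");
    if (d) {
        while ((de = readdir(d)) != NULL) {
            char *end;
            long fd;

            errno = 0;
            fd = strtol(de->d_name, &end, 10);
            if (errno != 0 || end == de->d_name || *end != '\0')
                continue;                /* ".", ".." or non-numeric */
            if (fd < 3 || fd == dirfd(d) ||
                fd_in_list((int)fd, except, n_except))
                continue;                /* stdio, our own DIR, whitelist */
            close((int)fd);
        }
        closedir(d);
        return 0;
    }

    /* Fallback: walk the whole range up to the soft limit. */
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    max = rl.rlim_cur == RLIM_INFINITY ? 65536 : rl.rlim_cur;  /* assumed cap */
    for (i = 3; i < max; i++)
        if (!fd_in_list((int)i, except, n_except))
            close((int)i);
    return 0;
}
```

The /proc path only touches descriptors that are actually open, which
is where the speed difference comes from when the limit is large.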

> Now I know there is the school of thought that says this is a bad
> idea,

I don't belong to that school. I think closing the fds in a loop is the
right thing to do.

> There are two things libsystemd-journal could do to help apps in this
> scenario. Either provide a way for apps to query the cached journal
> logging file descriptor, allowing them to explicitly leave it open.
> Alternatively provide explicit API to call to re-open the FD, which
> they could call after fork(). Possibly other solutions too, like
> requiring an explicit close/open like syslog though that has its own
> set of problems.

I'll probably add an API for closing and reopening the fd for cases like
this. That sounds the best to me. I am a bit afraid of handing out the
journal fd, but then again I often wished the glibc syslog() would allow
me to get the fd, since I wanted to use SO_SNDTIMEO on it. So I might add
that too.
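
As an illustration of the SO_SNDTIMEO use case: if an application could
get hold of the logging fd, bounding how long a send may block would
look roughly like the sketch below. set_send_timeout() is a hypothetical
helper, and nothing here exists in libsystemd-journal or glibc's
syslog() today.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Limit how long send()/sendmsg() on fd may block.
 * Returns 0 on success, -1 with errno set on failure. */
int set_send_timeout(int fd, long timeout_ms) {
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

Once the timeout is set, a blocking send on a full socket fails with
EAGAIN/EWOULDBLOCK after the timeout instead of waiting indefinitely.
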
> 
> The second blocker problem was figuring out a way to send log messages
> using only APIs declared async-signal safe. Again this is so that we
> can safely send log messages inbetween fork() and execve() which only
> permits async signal safe APIs. The sd_journal_send() API can't be
> used since it relies on vasprintf() which can allocate using malloc.

Hmm, and that one is hard to fix. I tried hard to come up with a way
that would avoid malloc while still being nice to use.

> The sd_journal_sendv() API is pretty close to what we'd want, but
> the way you have to format the iovec doesn't quite work. IIUC, it
> requires that each iovec contains a single formatted log item
> string "KEY=VALUE". Populating data in such a way is inconvenient
> for libvirt. For libvirt it was easier for us to use two iovec
> elements for each log item, "KEY=" and "VALUE", so that we can
> avoid doing the data copy implied by filling a single string with
> "KEY=VALUE".

Hmm, this is an issue indeed. Maybe this might be a good reason to allow
people to get the low-level fd so that they can call sendmsg() directly
on it.
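
To make the "KEY=" / "VALUE" split concrete, here is a rough sketch of
sending such pairs with sendmsg() and no copying or malloc. It assumes
the fd is a datagram socket already connected to the journal socket and
that no value contains a newline (those would need the length-prefixed
form of the export format). struct kv, MAX_FIELDS and send_pairs() are
hypothetical names, not libvirt or libsystemd-journal API. sendmsg() is
on the POSIX async-signal-safe list; strlen() and memset() were only
added to that list in later POSIX editions but do not allocate.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_FIELDS 16                    /* arbitrary limit for the sketch */

struct kv {
    const char *key_eq;                  /* key including '=', e.g. "MESSAGE=" */
    const char *value;                   /* value without newlines */
};

/* Send all fields as one datagram, using three iovec entries per field:
 * "KEY=", "VALUE" and "\n". Returns the number of bytes sent, or -1. */
ssize_t send_pairs(int fd, const struct kv *fields, size_t n) {
    struct iovec iov[MAX_FIELDS * 3];
    struct msghdr mh;
    size_t i, k = 0;

    assert(fields != NULL);
    if (n > MAX_FIELDS) {
        errno = E2BIG;
        return -1;
    }

    for (i = 0; i < n; i++) {
        iov[k].iov_base = (void *)fields[i].key_eq;
        iov[k++].iov_len = strlen(fields[i].key_eq);
        iov[k].iov_base = (void *)fields[i].value;
        iov[k++].iov_len = strlen(fields[i].value);
        iov[k].iov_base = (void *)"\n";
        iov[k++].iov_len = 1;
    }

    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = iov;
    mh.msg_iovlen = k;
    return sendmsg(fd, &mh, MSG_NOSIGNAL);
}
```

Everything lives on the stack, so the same function could be called
between fork() and execve().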

> As long as the wire format and UNIX socket path are considered ABI
> stable by systemd devs, I'm fairly happy with the libvirt code as
> it. I just mention these issues in case you think it is desirable
> to add further libsystemd-journal.so APIs to make life easier for
> other applications doing logging in the future.

There are two things I want to add to libsystemd-journal that might make
it more interesting to use again. One is an async mode where we do not
block if the journal socket is full but do count how many messages we
drop, so that we can log that later on... But that isn't implemented
yet. The other is transparent fallback to classic SysV logging, so that
code can link directly to libsystemd-journal as its only log library
without breaking compat for non-systemd systems.
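
Since the async mode is not implemented, the following is only a guess
at the general idea: send with MSG_DONTWAIT and count what gets dropped
when the socket is full. struct lossy_sender and lossy_send() are
hypothetical names, and the fd is assumed to be a datagram socket
already connected to the journal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

struct lossy_sender {
    int fd;                              /* connected datagram socket */
    unsigned long sent;                  /* records handed to the kernel */
    unsigned long dropped;               /* records lost because the socket was full */
};

/* Try to send one record without blocking.
 * Returns 1 if sent, 0 if dropped because the socket was full,
 * -1 on any other error (errno is left as set by send()). */
int lossy_send(struct lossy_sender *s, const void *record, size_t len) {
    assert(s != NULL && s->fd >= 0);

    if (send(s->fd, record, len, MSG_DONTWAIT | MSG_NOSIGNAL) >= 0) {
        s->sent++;
        return 1;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS) {
        s->dropped++;
        return 0;
    }
    return -1;
}
```

A caller could check the dropped counter after the next successful send
and emit a single "N messages lost" record before resetting it.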

So humm, maybe adding sd_journal_send2(), which takes iovec pairs, plus
a sane way to close/reopen the journal fd is the way to go and might
make it interesting for you to use?

Anyway, added all of this to the TODO list. Thanks for the
suggestions. Much appreciated!

Lennart

-- 
Lennart Poettering - Red Hat, Inc.