[systemd-devel] [RFC/PATCH] journal over the network

Tue Nov 20 09:56:41 PST 2012

On Tue, Nov 20, 2012 at 06:31:36PM +0100, Lennart Poettering wrote:
> On Tue, 20.11.12 03:35, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:
> 
> > > My intention was to speak only HTTP for all of this, so that we can
> > > nicely work through firewalls.
> > Yeah, probably that's more useful than raw stream for normal purposes,
> > since it allows for authentication and whatnot.
> 
> Yeah, and not just that. I also want to beef up the server side so that
> it optionally can run as CGI and as fastCGI, so that people can
> integrate that into their existing web servers, if they wish.
> 
> But yeah, using HTTP solves many many issues, such as auth, encryption,
> firewall/proxy support, and so on. On top of this the semantics of log
> syncing fit really nicely into the GET/POST model of HTTP.
> 
> > > I think it would make sense to drop things into
> > > /var/log/journal/<hostname>/*.journal by default. The hostname would
> > > have to be determined from the URL the user specified on the command
> > > line. Ideally we'd use the machine ID here, but since the machine ID is
> > > hardly something the user should specify on the command line (and we
> > > cannot just take the machine ID supplied from the other side, because we
> > > probably should not trust that and hence allow it to tell us to
> > > overwrite another host's data), the hostname is the next best
> > > thing. Currently libsystemd-journald will ignore directories that are
> > > not machine IDs when browsing, but we could easily drop that limitation.
> > So it seems that this mapping (url/source/whatever -> .journal path)
> > will require some thought.
> > 
> > I'd imagine that people will want to use this most often as a syslogd
> > replacement, i.e. launch systemd-journal-remote on a central host, and
> > then let all other hosts stream messages live. In this case we know
> > only two things: _MACHINE_ID specified remotely, and the remote
> > IP:PORT and thus hostname. Actually, I thought that since all those
> > things are "unreliable" (IP only to some extent, but still), they
> > wouldn't be used to determine the output file, and all output would go
> > into one .journal.
> 
> So, my thinking here is that hostnames generally suck for identifying
> machines since they are not unique, can change and sometimes are not set
> at all. However, that is only true in the general case. In the specific
> case where admins want to set up an infrastructure for centralizing logs
> they first set up a network, and as part of that I am pretty sure they
> have come up with a sane naming/addressing scheme that makes the name
> unique in their local setup, makes the names fixed and ensures the name
> is always there. Or to put this in other words: to be able to sync logs
> from another host you first need to think about how you can contact
> that other host, and hence must already have a naming scheme in place, and
> we should be able to just build on that.
Exactly. I was thinking about --trust-hostname=no|cert|always
as described in the other mail.
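A minimal C sketch of the default discussed above: take the host part of the URL given on the command line and use it as the directory name under /var/log/journal/. The function names are hypothetical (this is not how systemd-journal-remote is implemented), IPv6 literals are not handled, and the character whitelist is one possible way to make sure a name can never point outside its own directory.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy the host part of a URL such as "http://user@host1:19532/entries"
 * into host (capacity n). IPv6 literals like "[::1]" are not handled. */
static bool url_host(const char *url, char *host, size_t n) {
    const char *p = strstr(url, "://");
    if (!p)
        return false;
    p += 3;

    /* The authority component ends at the first '/', '?' or '#'. */
    size_t alen = strcspn(p, "/?#");

    /* Skip an optional "user[:password]@" prefix. */
    const char *at = memchr(p, '@', alen);
    if (at) {
        alen -= (size_t)(at + 1 - p);
        p = at + 1;
    }

    /* Drop an optional ":port" suffix. */
    const char *colon = memchr(p, ':', alen);
    size_t hlen = colon ? (size_t)(colon - p) : alen;
    if (hlen == 0 || hlen >= n)
        return false;

    memcpy(host, p, hlen);
    host[hlen] = '\0';
    return true;
}

/* Only allow DNS-name characters and no leading '.', so the name is a single
 * safe path component ("", "..", and anything containing '/' are rejected). */
static bool hostname_is_safe(const char *h) {
    if (h[0] == '\0' || h[0] == '.')
        return false;
    for (const char *c = h; *c; c++)
        if (!isalnum((unsigned char)*c) && *c != '-' && *c != '.')
            return false;
    return true;
}

/* Build the default output directory /var/log/journal/<hostname>/ for a URL. */
static bool journal_dir_for_url(const char *url, char *out, size_t n) {
    char host[256];
    if (!url_host(url, host, sizeof host) || !hostname_is_safe(host))
        return false;
    int r = snprintf(out, n, "/var/log/journal/%s/", host);
    return r > 0 && (size_t)r < n;
}
```

The whitelist matters most if names supplied by the sending side are ever trusted (the --trust-hostname idea above), since that is exactly where a peer could try to write into another host's directory.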

> > I remember that samba does (did?) something like what you suggest, and
> > kept separate logs based on the information under control of the
> > connecting host. On a host connected to the internet this would lead
> > to hundreds of log files.
> > 
> > In addition, .journal files have a fairly big overhead: ~180kB for an
> > "empty" file. This overhead might be unwanted if there are many
> > sources.
> > 
> > Maybe there's no one answer, and choices will have to be provided.
> 
> I think it definitely makes sense to allow admins to name the local
> destination dir as they want. I am mostly just interested in finding a
> good default, and I'd vote for extracting the "basename" of the URL used to
> access the remote journal for that.
> 
> > > > Push mode is not implemented... (but it would be a separate program
> > > > anyway).
> > > 
> > > My intention was actually to keep this in the same tool. So that we'd
> > > have for input and output:
> > > 
> > > A) HTTP GET
> > > B) HTTP POST
> > > C) SSH PULL (would invoke "journalctl -o export" via ssh)
> > > D) SSH PUSH (would invoke systemd-journald-remote via ssh)
> > > E) A directory for direct read access (which would allow us to merge multiple files into one with this tool)
> > > F) A directory for direct write access (which is of course the
> > > default)
> >
> > Also useful:
> > B1) socket listen() without HTTP
> 
> Where would I want to use that instead of B? 
It's much easier to write a non-HTTP client. And it's a natural
extension of accepting the same stream locally, through a pipe.
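To illustrate why a plain-socket client is easy to write: the sender only needs to produce Journal Export Format (the stream that `journalctl -o export` emits) and write() it to a connected socket. A minimal C sketch of the serializer; `struct field` and `export_entry()` are hypothetical names, not a systemd API. Values containing a newline or other control byte use the export format's binary-safe form (name, newline, 64-bit little-endian length, raw data, newline), and an empty line ends each entry.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* One journal field; value need not be NUL-terminated. */
struct field {
    const char *name;
    const char *value;
    size_t len;
};

/* Newlines and control bytes (except tab) force the binary-safe encoding. */
static int needs_binary(const char *v, size_t len) {
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)v[i];
        if (c == '\n' || (c < 0x20 && c != '\t'))
            return 1;
    }
    return 0;
}

/* Append len bytes at buf + *pos if they fit; returns 0, or -1 if full. */
static int put(char *buf, size_t cap, size_t *pos, const void *p, size_t len) {
    if (len > cap - *pos)
        return -1;
    memcpy(buf + *pos, p, len);
    *pos += len;
    return 0;
}

/* Serialize one entry; returns its size, or -1 if buf is too small. */
static ssize_t export_entry(char *buf, size_t cap, const struct field *f, size_t nf) {
    size_t pos = 0;
    for (size_t i = 0; i < nf; i++) {
        if (put(buf, cap, &pos, f[i].name, strlen(f[i].name)) < 0)
            return -1;
        if (needs_binary(f[i].value, f[i].len)) {
            /* NAME '\n' <uint64 little-endian length> <data> '\n' */
            unsigned char le[8];
            uint64_t n = f[i].len;
            for (int b = 0; b < 8; b++)
                le[b] = (unsigned char)(n >> (8 * b));
            if (put(buf, cap, &pos, "\n", 1) < 0 || put(buf, cap, &pos, le, 8) < 0)
                return -1;
        } else if (put(buf, cap, &pos, "=", 1) < 0) {
            /* NAME=value '\n' */
            return -1;
        }
        if (put(buf, cap, &pos, f[i].value, f[i].len) < 0 ||
            put(buf, cap, &pos, "\n", 1) < 0)
            return -1;
    }
    /* An empty line terminates the entry. */
    if (put(buf, cap, &pos, "\n", 1) < 0)
        return -1;
    return (ssize_t)pos;
}
```

A client would then connect() to the B1 listener and write() the returned bytes; no HTTP framing is involved on either side.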

> > B2) HTTPS POST (I'm assuming that POST means to listen)
> 
> HTTPS for me is just a special case of HTTP. When I meant HTTP above I
> meant HTTP with and without TLS, and with and without authentication.
Yeah, but usually one listens for one or the other. Upgrades
from HTTP to HTTPS don't work well.

> > E1) a specific file for read access
> > F1) a specific file for write access
> 
> That's something we have to think about anyway: i.e. whether we should
> allow accessing a separate journal file via libsystemd-journal?
> Currently we only allow accessing dirs. The reason for that is more or
> less that accessing files probably doesn't do what people assume it
> would do, since files are subject to rotation and referencing a file
> hence quickly becomes a dangling reference...
Reading: for debugging and other special purposes.
Writing: for example, when I want to transfer a journal file to somebody,
it is much easier with one file than with multiple files.

> > B1, F, F1 are implemented; A is implemented but ugly (curl).
> > E and E1 would require pulling in journalctl functionality.
> > 
> > > We should always require that either E or F is used, but in any
> > > combination with any of the others.
> > I think it is useful to allow the output directory to be implicit
> > (e.g. /var/log/journal/<hostname>/remote.journal can be used).
> 
> Yes, definitely.
> 
> > 
> > > > Examples:
> > > >   journalctl -o export | systemd-journal-remoted --stdin -o /tmp/dir/
> > > 
> > > Sounds pretty cool. Pretty close to what I'd have in mind.
> > > 
> > > To make this even shorter I'd suggest though that we take two normal
> > > args for source and dest, and that "-" is used as stdin/stdout
> > > respectively, and the dest can be omitted:
> > 
> > It started this way during development, but I'm not so sure if it'll
> > be always clear what is meant:
> > B, B1, and B2 can also come from socket activation, thus not appearing on
> 
> Well, but socket activation can easily be detected, and be treated
> specially? I.e. if sd_listen_fds() returns > 0 we could always go into
> activation mode?
Yes. For example, say I want to start systemd-journal-remote with two
sockets from socket activation and write to /data/mylog. In my scheme
I'd write:

ExecStart=.../systemd-journal-remote -o /data/mylog
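Lennart's "just check sd_listen_fds()" rests on the socket-activation protocol: systemd passes listening sockets starting at fd 3 (SD_LISTEN_FDS_START) and sets LISTEN_FDS to their count and LISTEN_PID to the PID they are meant for. A simplified C sketch of that check; `listen_fds_count()` is a hypothetical name, and the real sd_listen_fds() from sd-daemon additionally marks the fds close-on-exec and reports malformed variables as negative errno values, so real code should simply call it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* First fd passed by systemd; sd-daemon calls this SD_LISTEN_FDS_START. */
#define LISTEN_FDS_START 3

/* Returns how many sockets were passed (fds 3, 4, ...), or 0 when the
 * process was not socket-activated or the variables are malformed. */
static int listen_fds_count(void) {
    const char *pid_s = getenv("LISTEN_PID");
    const char *fds_s = getenv("LISTEN_FDS");
    char *end;

    if (!pid_s || !fds_s)
        return 0;

    /* LISTEN_PID guards against using variables meant for another process. */
    errno = 0;
    long pid = strtol(pid_s, &end, 10);
    if (errno != 0 || end == pid_s || *end != '\0' || pid != (long)getpid())
        return 0;

    errno = 0;
    long n = strtol(fds_s, &end, 10);
    if (errno != 0 || end == fds_s || *end != '\0' || n <= 0 ||
        n > INT_MAX - LISTEN_FDS_START)
        return 0;

    return (int)n;
}
```

With the ExecStart= line above and a socket unit listening on two ports, this would return 2: the tool would listen on fds 3 and 4, and -o alone would determine where the data goes.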

> > the command line, but output might still be specified.
> > OTOH, there might be multiple sources, and the implicit output dir.
> 
> Multiple sources? What do you mean?
For example, pulling from three hosts:

systemd-journal-remote -o /data/mylog http://host1 http://host2 http://host3

Currently, this is more or less working, and I think that it is
worth supporting.
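The three-host example means positional arguments can name several sources of different kinds. A minimal C sketch of how such a command line might be classified, following the conventions floated earlier in the thread ("-" for stdin, http:// or https:// URLs for remote journals, anything else a local file or directory). `classify_source()`, `plan_sources()` and `struct source_plan` are hypothetical, not the tool's actual option parser, and SSH sources are left out.

```c
#include <string.h>

enum source_kind { SOURCE_STDIN, SOURCE_HTTP, SOURCE_PATH };

/* Classify one positional argument: "-" is stdin, an http:// or https://
 * URL is a remote journal to pull from, anything else is a local path. */
static enum source_kind classify_source(const char *arg) {
    if (strcmp(arg, "-") == 0)
        return SOURCE_STDIN;
    if (strncmp(arg, "http://", 7) == 0 || strncmp(arg, "https://", 8) == 0)
        return SOURCE_HTTP;
    return SOURCE_PATH;
}

/* What the tool would have to open for a list of sources. */
struct source_plan {
    int n_http;    /* remote journals to pull from */
    int n_paths;   /* local files or directories to read */
    int use_stdin; /* 1 if "-" was given */
};

/* Returns 0 on success, -1 if the list is unusable (stdin given twice). */
static int plan_sources(char *const *args, int n, struct source_plan *plan) {
    plan->n_http = plan->n_paths = plan->use_stdin = 0;
    for (int i = 0; i < n; i++) {
        switch (classify_source(args[i])) {
        case SOURCE_STDIN:
            if (plan->use_stdin)
                return -1; /* stdin can only be consumed once */
            plan->use_stdin = 1;
            break;
        case SOURCE_HTTP:
            plan->n_http++;
            break;
        case SOURCE_PATH:
            plan->n_paths++;
            break;
        }
    }
    return 0;
}
```

For `-o /data/mylog http://host1 http://host2 http://host3` the positional list yields three HTTP sources, which the tool would then have to pull and merge into the single output.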

Zbyszek