[systemd-devel] [RFC/PATCH] journal over the network

Tue Nov 20 09:31:36 PST 2012

On Tue, 20.11.12 03:35, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:

> > My intention was to speak only HTTP for all of this, so that we can
> > nicely work through firewalls.
> Yeah, probably that's more useful than raw stream for normal purposes,
> since it allows for authentication and whatnot.

Yeah, and not just that. I also want to beef up the server side so that
it can optionally run as CGI or FastCGI, so that people can
integrate it into their existing web servers, if they wish.

But yeah, using HTTP solves many many issues, such as auth, encryption,
firewall/proxy support, and so on. On top of this the semantics of log
syncing fit really nicely into the GET/POST model of HTTP.

> > I think it would make sense to drop things into
> > /var/log/journal/<hostname>/*.journal by default. The hostname would
> > have to be determined from the URL the user specified on the command
> > line. Ideally we'd use the machine ID here, but since the machine ID is
> > hardly something the user should specify on the command line (and we
> > cannot just take the machine ID supplied from the other side, because we
> > probably should not trust that and hence allow it to tell us to
> > overwrite another host's data), the hostname is the next best
> > thing. Currently libsystemd-journal will ignore directories that are
> > not machine IDs when browsing, but we could easily drop that limitation.
> So it seems that this mapping (url/source/whatever -> .journal path)
> will require some thought.
> 
> I'd imagine that people will want to use this most often as a syslogd
> replacement, i.e. launch systemd-journal-remote on a central host, and
> then let all other hosts stream messages live. In this case we know
> only two things: _MACHINE_ID specified remotely, and the remote
> IP:PORT and thus hostname. Actually, I thought that since all those
> things are "unreliable" (IP only to some extent, but still), they
> wouldn't be used to determine the output file, and all output would go
> into one .journal.

So, my thinking here is that hostnames generally suck for identifying
machines since they are not unique, can change and sometimes are not set
at all. However, that is only true in the general case. In the specific
case where admins want to set up an infrastructure for centralizing logs
they first have to set up a network, and as part of that I am pretty
sure they came up with a sane naming/addressing scheme that makes the
names unique in their local setup, keeps them fixed and ensures they are
always set. Or to put this in other words: to be able to sync logs
from another host you first need to think about how you can contact
that other host, and hence have to introduce a naming scheme first, and
we should be able to just build on that.

> I remember that samba does (did?) something like what you suggest, and
> kept separate logs based on the information under control of the
> connecting host. On a host connected to the internet this would lead
> to hundreds of log files.
> 
> In addition, .journal files have a fairly big overhead: ~180kB for
> an "empty" file. This overhead might be unwanted if there are many
> sources.
> 
> Maybe there's no one answer, and choices will have to be provided.

I think it definitely makes sense to allow admins to name the local
destination dir as they want. I am mostly just interested in finding a
good default, and I'd vote for extracting the "basename" of the URL used
to access the remote journal for that.
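
A minimal sketch of that default, assuming "basename" means the host part
of the URL (as suggested earlier in the thread) and that the result lands
below /var/log/journal/. The helper name default_dest_dir and the example
URL are made up for illustration; nothing like this exists in the patch.

```python
import os
from urllib.parse import urlsplit

JOURNAL_ROOT = "/var/log/journal"  # default root discussed in this thread


def default_dest_dir(url, root=JOURNAL_ROOT):
    """Hypothetical helper: map a source URL to <root>/<hostname>."""
    # .hostname is lower-cased and excludes port and user info.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no hostname in URL: {url!r}")
    # The name ends up in a path, so refuse anything that would let the
    # destination escape the journal root.
    if host in (".", ".."):
        raise ValueError(f"unsafe hostname: {host!r}")
    return os.path.join(root, host)


print(default_dest_dir("https://logs.example.com:19531/entries"))
```

An explicit destination given by the admin would simply bypass this
function, so it only ever supplies the default.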

> > > Push mode is not implemented... (but it would be a separate program
> > > anyway).
> > 
> > My intention was actually to keep this in the same tool. So that we'd
> > have for input and output:
> > 
> > A) HTTP GET
> > B) HTTP POST
> > C) SSH PULL (would invoke "journalctl -o export" via ssh)
> > D) SSH PUSH (would invoke systemd-journald-remote via ssh)
> > E) A directory for direct read access (which would allow us to merge
> > multiple files into one with this tool)
> > F) A directory for direct write access (which is of course the
> > default)
>
> Also useful:
> B1) socket listen() without HTTP

Where would I want to use that instead of B? 

> B2) HTTPS POST (I'm assuming that POST means to listen)

HTTPS for me is just a special case of HTTP. When I meant HTTP above I
meant HTTP with and without TLS, and with and without authentication.
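
To illustrate the point, here is a rough sketch of how a source or
destination argument could be mapped onto the transports listed above,
with TLS as a flag on the HTTP transport rather than a transport of its
own. The function name, the returned tuples and the ssh:// URL form are
assumptions for the example, not part of any existing tool.

```python
from urllib.parse import urlsplit


def classify_endpoint(arg):
    """Hypothetical classifier returning (transport, uses_tls)."""
    if arg == "-":
        # "-" stands for stdin/stdout, as proposed in this thread.
        return ("stdio", False)
    scheme = urlsplit(arg).scheme.lower()
    if scheme in ("http", "https"):
        # HTTPS is the same transport as HTTP, just with TLS switched on.
        return ("http", scheme == "https")
    if scheme == "ssh":
        # SSH encrypts too, but not via TLS, so the flag stays off.
        return ("ssh", False)
    # Anything without a known scheme is treated as a local directory.
    return ("directory", False)


for arg in ("http://host/entries", "https://host/entries", "-", "/var/log/journal/host"):
    print(arg, classify_endpoint(arg))
```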

> E1) a specific file for read access
> F1) a specific file for write access

That's something we have to think about anyway, i.e. whether we should
allow accessing an individual journal file via libsystemd-journal.
Currently we only allow accessing dirs. The reason for that is more or
less that accessing files probably doesn't do what people assume it
would do, since files are subject to rotation and referencing a file
hence quickly becomes a dangling reference...

> B1, F, F1 are implemented; A is implemented but ugly (curl).
> E and E1 would require pulling in journalctl functionality.
> 
> > We should always require that either E or F is used, but in any
> > combination with any of the others.
> I think it is useful to allow the output directory to be implicit
> (e.g. /var/log/journal/<hostname>/remote.journal can be used).

Yes, definitely.

> 
> > > Examples:
> > >   journalctl -o export | systemd-journal-remoted --stdin -o /tmp/dir/
> > 
> > Sounds pretty cool. Pretty close to what I'd have in mind.
> > 
> > To make this even shorter I'd suggest though that we take two normal
> > args for source and dest, and that "-" is used as stdin/stdout
> > respectively, and the dest can be omitted:
> 
> It started this way during development, but I'm not so sure if it'll
> always be clear what is meant:
> B, B1, and B2 can also come from socket activation, thus not appearing
> on

Well, but socket activation can easily be detected, and be treated
specially? I.e. if sd_listen_fds() returns > 0 we could always go into
activation mode?
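
Roughly what that check could look like, written here as a Python sketch
of the LISTEN_PID/LISTEN_FDS environment protocol that sd_listen_fds()
implements. It is simplified: the real function also sets FD_CLOEXEC on
the descriptors and can unset the variables. choose_mode() and its return
values are hypothetical names for the example.

```python
import os

SD_LISTEN_FDS_START = 3  # first descriptor passed in by systemd


def listen_fds(environ=None, pid=None):
    """Return the descriptors received via socket activation, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to us if LISTEN_PID names this process.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Missing or malformed variables: not socket activated.
        return []
    if count <= 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def choose_mode(argv_sources, environ=None, pid=None):
    """Hypothetical policy: activation wins over command-line sources."""
    fds = listen_fds(environ, pid)
    if fds:
        return ("activation", fds)
    return ("command-line", list(argv_sources))


print(choose_mode(["http://host/entries"]))
```

With this policy the command line only has to name the output, since the
inputs arrive as already-open sockets.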

> the command line, but output might still be specified.
> OTOH, there might be multiple sources, and the implicit output dir.

Multiple sources? What do you mean?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.