[systemd-devel] RFC: filter and search journalctl

Goffredo Baroncelli kreijack at libero.it
Sat Aug 8 12:13:42 PDT 2015


On 2015-08-07 11:53, Sebastian Schindler wrote:
> Hi all.
> 
> The journal format offers powerful filter capabilities. Unfortunately this power
> is lost if you have to use grep to find certain information.
> For example (unscientific benchmark), count the number of entries for a
> (known) executable:
> 
> 
>     $> journalctl --disk-usage
>     Archived and active journals take up 344.1M on disk.
> 
>     $> time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
>          grep -F _EXE=/usr/sbin/dhclient | wc -l)
>     1233
> 
>     real    0m0.111s
>     user    0m0.007s
>     sys     0m0.091s
> 
> 
>     $> time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc -l)
>     1233
> 
>     real    0m7.515s
>     user    0m5.088s
>     sys     0m6.896s
> 
> 
> This shows that piping through grep is orders of magnitude slower than letting
> journalctl do the filtering.

This is because the journal file is structured like a database: all fields are fully indexed, so journalctl can answer an exact KEY=VALUE query without reading every entry.

For other kinds of search (by regexp, or by value only), journalctl cannot use the indexes, so it is much slower: it has to process the entire journal.
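
To make the difference concrete, here is a minimal sketch in Python. It is a toy model only: the entries, the `index` dictionary and the two helper functions are made up for illustration and say nothing about the real on-disk journal format. It only shows why an exact KEY=VALUE match can be answered by a lookup, while a regexp has to look at every entry.

```python
import re
from collections import defaultdict

# Toy model only: the real journal is a binary file format, not Python dicts.
entries = [
    {"_EXE": "/usr/sbin/dhclient", "MESSAGE": "DHCPREQUEST on eth0"},
    {"_EXE": "/usr/sbin/sshd", "MESSAGE": "host alpha has closed connection 17"},
    {"_EXE": "/usr/sbin/dhclient", "MESSAGE": "DHCPACK from 192.0.2.1"},
    {"_EXE": "/usr/sbin/sshd", "MESSAGE": "host beta has closed connection 42"},
]

# Build the index once: (field, value) -> positions of the matching entries.
index = defaultdict(list)
for pos, entry in enumerate(entries):
    for field, value in entry.items():
        index[(field, value)].append(pos)


def match_exact(field, value):
    """KEY=VALUE query: a single dictionary lookup, no scan of the entries."""
    return [entries[pos] for pos in index.get((field, value), [])]


def match_regex(field, pattern):
    """Regexp query: the index cannot help, so every entry is inspected."""
    rx = re.compile(pattern)
    return [e for e in entries if field in e and rx.search(e[field])]


print(len(match_exact("_EXE", "/usr/sbin/dhclient")))
print(len(match_regex("MESSAGE", r"has closed connection \d+")))
```

The cost of `match_exact` depends on the number of hits, the cost of `match_regex` on the total number of entries, which is the same asymmetry as in the timings quoted above.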

I am curious: if you run

$ time ( grep -F _EXE=/usr/sbin/dhclient /var/log/journal/*/* | wc -l )

what is the resulting time?

> 
> Grep-ing seems to be the only solution to find log entries if you don't fully
> know what you're looking for. For example: You want to see all entries
> containing a certain MESSAGE that gets enriched with additional information
> during the logging process:
> 
> MESSAGE=host <HOST> has closed connection <CONNECTION_ID>
> 
> At the moment you have no option to look for this kind of information unless
> someone has set something like a MESSAGE_ID you can filter for. There are
> several use cases that follow this pattern:
> 
> * there's no option to show all set FIELD keys in the current journal, although
>   this information is encoded in the header of each journal file
> * there's no support for negated filtering, you can't easily hide output of a
>   certain unit which is creating too much noise
> * there's no support for regular expressions (except for the --unit option),
>   this is especially problematic when you're looking for certain MESSAGEs
> * there's no option to show all entries containing a certain field
> * logical expressions are somewhat hard to read/write because parentheses can't
>   be used to group terms
> 
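
Until something like this exists, some of the points above (regexp on MESSAGE, negated unit filter, "has this field") can be approximated outside journalctl by post-processing `journalctl -o json`, which prints one JSON object per line. The following is only a sketch: `filter_entries` and its parameters are hypothetical names, the sample lines are invented, and since every entry is still read it has the same cost as the grep pipeline rather than the speed of an indexed match.

```python
import json
import re


def filter_entries(lines, message_regex=None, exclude_units=(), require_field=None):
    """Yield journal entries (dicts) that pass all of the given filters.

    `lines` is any iterable of JSON lines, e.g. sys.stdin fed by
    `journalctl -o json`. All parameter names are illustrative only.
    """
    rx = re.compile(message_regex) if message_regex else None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Negated filter: drop entries coming from the listed units.
        unit = entry.get("_SYSTEMD_UNIT")
        if isinstance(unit, str) and unit in exclude_units:
            continue
        # Keep only entries that carry a given field at all.
        if require_field is not None and require_field not in entry:
            continue
        # Regexp on MESSAGE; non-string values (e.g. binary data) never match.
        message = entry.get("MESSAGE")
        if rx is not None and not (isinstance(message, str) and rx.search(message)):
            continue
        yield entry


# Invented sample input standing in for real `journalctl -o json` output.
sample = [
    '{"_SYSTEMD_UNIT": "noisy.service", "MESSAGE": "host a has closed connection 1"}',
    '{"_SYSTEMD_UNIT": "sshd.service", "MESSAGE": "host b has closed connection 2"}',
    '{"_SYSTEMD_UNIT": "sshd.service", "MESSAGE": "session opened", "MESSAGE_ID": "abc"}',
]
pattern = r"host \S+ has closed connection \d+"
for entry in filter_entries(sample, pattern, {"noisy.service"}):
    print(entry["MESSAGE"])
```

An exact match can still be given to journalctl first, so that the index narrows the input before the slower filtering runs.
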
> What do you think about a query language for journalctl that allows more
> powerful search options? This could be introduced without ignoring the
> capabilities the journal file format has to offer. Are there maybe already plans
> to introduce something like this into journalctl? Do some people here have
> experience with query languages for such a use case? Things like PCAP filter,
> SPARQL, Lucene or the SPHINX Query Language come to mind.


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

