[Telepathy] thinking about a new log format for telepathy-logger
Matt Rogers
mattr at kde.org
Sun Mar 14 18:22:42 PDT 2010
On Sunday 14 March 2010 08:10:11 pm Danielle Madeley wrote:
> So Telepathy Logger currently logs things in something not unlike
> Empathy's XML log format. Although XML has some advantages (like being
> able to generate logs using XSL), it seems fairly sub-optimal for the
> efficient storage of logs.
>
How did you come to this conclusion? I understand that XML isn't very
compact in terms of bytes used. Is that where the problem lies?
> I've been thinking (and playing) with some ideas for how to replace
> this, and am looking for feedback/ideas.
>
> Solution #1: serialising binary structs
>
> This is where we serialise a fixed struct directly into the
> file. Something like:
>
> [danni@adelie log-backend]$ xxd test
> 0000000: 0000 0060 0000 0000 4b9d 7725 0000 0003  ...`....K.w%....
> 0000010: 0000 0020 6461 6e69 656c 6c65 2e6d 6164  ... danielle.mad
> 0000020: 656c 6579 4063 6f6c 6c61 626f 7261 2e63  eley@collabora.c
>
> This message is formatted (guint32 entry-length)(gint64
> timestamp)(guint32 flags)((guint32 string-length)(string id)).
>
> Not really a fan of doing this, because it makes it quite
> difficult to extend later on (especially to non-text messages).
>
Urgh. Please no. It's so 1980s. I would hate to be the one who has to write a
conversion routine every time the struct that gets written changes.
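To make Solution #1 concrete, here is a minimal Python sketch of the fixed
layout described above. It assumes big-endian fields, which is what the xxd
dump shows (0x4b9d7725 is the timestamp 1268610853, and 0x20 is the 32-byte id
length). pack_entry and unpack_entry are hypothetical helper names. The dump's
entry-length of 0x60 (96) implies more data follows the id than is shown, so
this sketch only writes the listed fields and assumes entry-length counts the
whole entry. It also shows the objection above: any change to HEADER shifts
every offset after it, so existing files would need a conversion pass.

```python
import struct

# Layout from the post, big-endian as in the xxd dump:
#   (guint32 entry-length)(gint64 timestamp)(guint32 flags)
#   ((guint32 string-length)(string id))
HEADER = struct.Struct(">IqI")  # 4 + 8 + 4 = 16 bytes; ">" means no padding
STRLEN = struct.Struct(">I")


def pack_entry(timestamp, flags, ident):
    """Serialise one entry. Assumption: entry-length counts the whole entry."""
    id_bytes = ident.encode("utf-8")
    entry_len = HEADER.size + STRLEN.size + len(id_bytes)
    return (HEADER.pack(entry_len, timestamp, flags)
            + STRLEN.pack(len(id_bytes))
            + id_bytes)


def unpack_entry(buf):
    """Return (entry_length, timestamp, flags, id) from the start of buf."""
    entry_len, timestamp, flags = HEADER.unpack_from(buf, 0)
    (id_len,) = STRLEN.unpack_from(buf, HEADER.size)
    start = HEADER.size + STRLEN.size
    ident = buf[start:start + id_len].decode("utf-8")
    return entry_len, timestamp, flags, ident


record = pack_entry(1268610853, 0x3, "danielle.madeley@collabora.co.uk")
print(record[:16].hex())     # 00000034000000004b9d772500000003
print(unpack_entry(record))  # (52, 1268610853, 3, 'danielle.madeley@collabora.co.uk')
```

Apart from the entry-length, the bytes this produces match the dump above.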
> Solution #2: binary tag-based thing (similar to Apple DMAP)
>
> This is also a binary format, but a tag-based one: there is
> a table of tags, and each tag has a length and a data type
> (that type can be Container). Unpacked, it might look something
> like this:
>
> (Format TAG length ..data..)
>
> LOGM 139
> NAME 16 "Danielle Madeley"
> IDXX 32 "danielle.madeley@collabora.co.uk"
> TIME 8 1268610853
> FLAG 4 0x3
> MESG 39 "This is a message that is 39 bytes long"
>
> Types:
>
> LOGM container "Log Message"
> NAME string "Name"
> IDXX string "Id"
> TIME gint64 "Timestamp"
> FLAG guint32 "Flags"
> MESG string "Message"
>
> Packs something like this (exciting intermix of ASCII and hex):
>
> LO GM 008B NA ME 0010 Da ni el le _M ad el ey ID XX 0020 da ni
> el le .m ad el ey @c ol la bo ra .c o. uk TI ME 0008 0000 0000
> 4B9D 7725
>
> I kind of like this format because it's compact and extensible.
> It's quick to jump from container to container. However it's
> still very custom.
>
Does custom matter if it's the best solution for the job? This is a pretty
nice format.
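Here is a minimal Python sketch of Solution #2. It assumes each entry is a
4-byte ASCII tag, then a 32-bit big-endian length, then the data. With 32-bit
lengths, the five children above add up to exactly the 139 bytes given for
LOGM. The packed dump shows shorter length fields, so the width is a guess.
encode and decode are hypothetical names, and the type table is the one from
the post. Because every entry carries its length, a reader can skip a whole
container, or a tag it doesn't recognise, without parsing it. That is where
the extensibility comes from.

```python
import struct

# Tag table from the post; container data is a sequence of tagged entries.
TYPES = {
    b"LOGM": "container",
    b"NAME": "string",
    b"IDXX": "string",
    b"TIME": "gint64",
    b"FLAG": "guint32",
    b"MESG": "string",
}


def encode(tag, value):
    """Encode one entry as tag + 32-bit big-endian length + data (assumed layout)."""
    kind = TYPES[tag]
    if kind == "container":
        data = b"".join(encode(t, v) for t, v in value)
    elif kind == "string":
        data = value.encode("utf-8")
    elif kind == "gint64":
        data = struct.pack(">q", value)
    elif kind == "guint32":
        data = struct.pack(">I", value)
    else:
        raise ValueError("unknown type for tag %r" % tag)
    return tag + struct.pack(">I", len(data)) + data


def decode(buf, offset=0, end=None):
    """Return a list of (tag, value) pairs for the entries in buf[offset:end]."""
    end = len(buf) if end is None else end
    items = []
    while offset < end:
        tag = buf[offset:offset + 4]
        (length,) = struct.unpack_from(">I", buf, offset + 4)
        start = offset + 8
        data = buf[start:start + length]
        kind = TYPES.get(tag)
        if kind == "container":
            value = decode(buf, start, start + length)
        elif kind == "string":
            value = data.decode("utf-8")
        elif kind == "gint64":
            (value,) = struct.unpack(">q", data)
        elif kind == "guint32":
            (value,) = struct.unpack(">I", data)
        else:
            value = data  # unknown tag: keep the raw bytes, skip by length
        items.append((tag, value))
        offset = start + length  # jump straight to the next entry
    return items


message = [
    (b"NAME", "Danielle Madeley"),
    (b"IDXX", "danielle.madeley@collabora.co.uk"),
    (b"TIME", 1268610853),
    (b"FLAG", 0x3),
    (b"MESG", "This is a message that is 39 bytes long"),
]
packed = encode(b"LOGM", message)
print(len(packed))  # 147: the 8-byte LOGM header plus 139 bytes of children
```

A newer writer could add a tag this reader doesn't know. The reader would
return that tag's raw bytes and carry on with the next entry.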
> Solution #3: EXI or similar
>
> Basically use some binary XML format. This is more or less a
> formalisation of the system proposed above. The question is:
> which format? They all seem incredibly overengineered for our
> purposes.
>
> Solution #4: sqlite
>
> Store each message as an SQLite row. Great for searching,
> probably won't scale?
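Here is a rough sketch of Solution #4 using Python's built-in sqlite3 module.
The schema, the account column, the "example-account" value and the helper
names are all hypothetical. The index covers the common "show this
conversation in time order" lookup. The substring search, however, still scans
every message for the account, and that is where scaling would bite. SQLite's
full-text search extensions (FTS) are the usual remedy, if the SQLite build in
use includes them. Whether this holds up for years of logs would need
measuring.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a log database with a hypothetical one-row-per-message schema."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id          INTEGER PRIMARY KEY,
            account     TEXT    NOT NULL,
            sender_id   TEXT    NOT NULL,
            sender_name TEXT,
            timestamp   INTEGER NOT NULL,
            flags       INTEGER NOT NULL DEFAULT 0,
            body        TEXT    NOT NULL
        )""")
    # Index for fetching one account's history in time order.
    conn.execute("CREATE INDEX IF NOT EXISTS messages_by_time "
                 "ON messages (account, timestamp)")
    return conn


def log_message(conn, account, sender_id, sender_name, timestamp, flags, body):
    """Insert one message; 'with conn' commits on success, rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO messages (account, sender_id, sender_name,"
            " timestamp, flags, body) VALUES (?, ?, ?, ?, ?, ?)",
            (account, sender_id, sender_name, timestamp, flags, body))


def search(conn, account, needle):
    """Substring search over one account's messages, escaping LIKE wildcards."""
    escaped = (needle.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    return conn.execute(
        "SELECT timestamp, sender_id, body FROM messages"
        " WHERE account = ? AND body LIKE ? ESCAPE '\\'"
        " ORDER BY timestamp",
        (account, "%" + escaped + "%")).fetchall()


conn = open_log()
log_message(conn, "example-account", "danielle.madeley@collabora.co.uk",
            "Danielle Madeley", 1268610853, 0x3,
            "This is a message that is 39 bytes long")
print(search(conn, "example-account", "39 bytes"))
```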
>
> Thoughts?
>
> --danielle
Is XML really a problem if we could pair it with something like Tracker or
Nepomuk and let those semantic desktop services index and search the logs?
Granted, you still have the inefficient storage problem (in terms of bytes
used), but I'm not sure that's really an issue.
--
Matt