[Bug 49687] New event based LogStore

Thu May 10 12:16:09 CEST 2012

https://bugs.freedesktop.org/show_bug.cgi?id=49687

--- Comment #7 from Cosimo Alfarano <cosimo.alfarano at collabora.co.uk> 2012-05-10 03:16:09 PDT ---
(In reply to comment #6)
> One of the main reasons I thought of splitting the DB and wrapping the TBI
> (message DB) is to be able to full-text index the messages with an FTS library
> such as Xapian or Lucene.

> Having the Events and the Messages in one DB just makes it not so clean in
> terms of separation of concerns. What if we decide that Xapian is not such a
> good FTS indexer? This would mean wrapping the whole DB all over again.

OK, can you expand this thought in a rationale on the wiki?
What are the requirements for having the FTS indexer working?
Actually, an FTS section mentioning Xapian or any other FTS indexer is missing.

What's the difference between having what you proposed (event_id+body table)
and a table with more columns? Would a JOIN on two tables work?
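
To make the JOIN question concrete, here is a minimal sketch using Python's
sqlite3 module. The table and column names (events, bodies, sender) are
assumptions for illustration, not the TPL schema, and a plain LIKE stands in
for whatever FTS indexer ends up being used.

```python
import sqlite3

# Hypothetical layout: one table for the event metadata, one for the body.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE events (
    event_id  INTEGER PRIMARY KEY,
    timestamp INTEGER NOT NULL,
    sender    TEXT NOT NULL
);
CREATE TABLE bodies (
    event_id INTEGER PRIMARY KEY REFERENCES events(event_id),
    body     TEXT NOT NULL
);
""")

with db:  # commit both inserts for each message together
    db.execute("INSERT INTO events VALUES (1, 100, 'alice')")
    db.execute("INSERT INTO bodies VALUES (1, 'lunch at noon?')")
    db.execute("INSERT INTO events VALUES (2, 200, 'bob')")
    db.execute("INSERT INTO bodies VALUES (2, 'see you tomorrow')")


def search(db, needle):
    """Return (event_id, timestamp, sender, body) for bodies containing needle."""
    return db.execute(
        "SELECT e.event_id, e.timestamp, e.sender, b.body"
        " FROM events AS e JOIN bodies AS b ON b.event_id = e.event_id"
        " WHERE b.body LIKE ? ORDER BY e.event_id",
        (f"%{needle}%",),
    ).fetchall()
```

With this layout a single query still returns the body together with its
timestamp and sender, so the split is invisible to the caller.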

> Anyhow we could create a dedicated integrated event DB, which does its own
> sync read/write for the Log. We could do that with our own integration or
> with the upcoming libzeitgeist2, which creates a DB for you without the daemon.

I thought you already put a section about those possibilities on the wiki.
Please, add a section about it as well, so we have a complete scenario.

> In both cases, if your worry is doing stuff async, then switching to sync
> writing should not be an issue.

I don't think the async writing is what worries people so much as what makes
them (including me) critical of the proposal.
As long as there is a weak point in the proposal, it's not really viable.

> > The bigger issue is that the second part (body) is the one most likely to be
> > lost and it's also the most important (or at least equally important with some
> > other event's info).
> 
> When can it get lost?

Lost = the daemon shuts down before the callback is fired (including ZG never
called us back).
How can it happen? Normal dbus service life cycle, desktop lifecycle, etc.
This part we can work on.

Last but not least, TPL crashes: the more steps it takes to store the whole
info (steps rather than time, I know ZG is fast), the higher the chance of data
loss on a crash. We cannot do much about this one, but we need to make the TPL
architecture less susceptible to inconsistencies in such situations as well.

> > I don't care if I don't remember the avatar used with the message or the
> > geolocation of the event.
> > I care if I don't have the body or I cannot associate it with timestamp or
> > from/to.
> > 
> > Is there any way to invert the process?
> > 
> > 1- write the body with the minimum set of needed info into the Body Index (even
> > if duplicated in the Log later), assigning a primary key X
> > 2- write the Log, telling ZG that this event is related to X (or giving it our
> > own event_id).
> 
> Sadly you can't give Zeitgeist an event_id to an event.
> 
> Well a good solution is to have a temp_table which stores all the info as it is
> (strings) as soon as they arrive. When an interaction happens we will first
> dump it in the temp_table. Then we insert into the log then into the TBI. Once
> both insertions took place we remove from the temp_table. This way if TPL quits
> or crashes, or the Log is not reachable in the middle of a process, the next
> time TPL starts it will find the temp_table non-empty and try to empty it.

This is a similar approach to what we use for pending messages. I think Nicolas
was thinking of a similar thing in comment #4.

My idea is to not consider the temp table temporary at all, but part of the
log.
You already have the data, why remove it?

This would also let TPL queries (the log_manager_get_FOO() ones) ask just one
place instead of two.
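
A minimal sketch of that idea, assuming Python's sqlite3 module: the row
written on arrival is never deleted, and a flag records whether the event has
also been pushed to ZG. The names (log, zg_synced, push) are hypothetical;
push stands in for whatever call hands the event to Zeitgeist and is not a
real Zeitgeist API.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the log DB; the 'pending' table is simply the log itself."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        " tpl_id    INTEGER PRIMARY KEY,"
        " timestamp INTEGER NOT NULL,"
        " contact   TEXT NOT NULL,"
        " body      TEXT NOT NULL,"
        " zg_synced INTEGER NOT NULL DEFAULT 0)"  # 0 until ZG has the event
    )
    return db


def record(db, timestamp, contact, body):
    """Step 1: store everything we care about locally, in one transaction."""
    with db:
        cur = db.execute(
            "INSERT INTO log (timestamp, contact, body) VALUES (?, ?, ?)",
            (timestamp, contact, body),
        )
    return cur.lastrowid


def sync_pending(db, push):
    """Step 2: push unsynced events; safe to re-run after a crash or restart.

    push(tpl_id, timestamp, contact) is a hypothetical callable that returns
    True once ZG has accepted the event.
    """
    rows = db.execute(
        "SELECT tpl_id, timestamp, contact FROM log"
        " WHERE zg_synced = 0 ORDER BY tpl_id"
    ).fetchall()
    synced = 0
    for tpl_id, timestamp, contact in rows:
        if push(tpl_id, timestamp, contact):
            with db:
                db.execute(
                    "UPDATE log SET zg_synced = 1 WHERE tpl_id = ?", (tpl_id,)
                )
            synced += 1
    return synced
```

If ZG is unreachable, sync_pending() changes nothing and the bodies stay
queryable from the one table; calling it again at the next start picks up
whatever was left.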

> You might
> ask why not keep the temp_table as our main storage. Well:
> 1) it is hard to do a FTS index around it.

I look forward to seeing it on the Wiki. Would it help to have two tables?
One for the body and one for the rest (timestamp, id).

> 2) it will have duplicate string entries for example the target string. Which
> can be costly and should be rather stored as an int.

I don't understand what you mean. Is it a write() problem?
We already write the data fully in the temp table.

If this is an issue, it can be avoided by refactoring into multiple tables:

| tpl_id | event_id | body | (table 1)
| contact_id | contact_id_number (yeah, silly name) | (table 2)
| tpl_id | timestamp | contact_id_number | (table 3)

Table 2 is written when a new contact enters the log (this write() happens once
in the table's lifetime for each contact who eventually contacts us).
An in-memory cache (hashtable) can be used in the LogStore for the recently
contacted people, so as to avoid continuous queries (reads) to table 2 as well.

This way we have a body and then only integers to deal with, in the average
case.
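
A rough sketch of the three tables plus the cache, assuming Python's sqlite3
module. The table names (bodies, contacts, events) and the LogStore class
here are illustrative only; event_id is left NULL on the assumption that it
gets filled in once ZG replies.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS bodies (      -- table 1
    tpl_id   INTEGER PRIMARY KEY,
    event_id INTEGER,                    -- assumed NULL until ZG answers
    body     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS contacts (    -- table 2
    contact_id        TEXT UNIQUE NOT NULL,
    contact_id_number INTEGER PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS events (      -- table 3
    tpl_id            INTEGER PRIMARY KEY REFERENCES bodies(tpl_id),
    timestamp         INTEGER NOT NULL,
    contact_id_number INTEGER NOT NULL REFERENCES contacts(contact_id_number)
);
"""


class LogStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)
        self._cache = {}  # contact_id string -> contact_id_number

    def _contact_number(self, contact_id):
        """Map a contact string to its integer, touching table 2 only on a miss."""
        number = self._cache.get(contact_id)
        if number is None:
            with self.db:  # written once per contact, committed on its own
                self.db.execute(
                    "INSERT OR IGNORE INTO contacts (contact_id) VALUES (?)",
                    (contact_id,),
                )
            number = self.db.execute(
                "SELECT contact_id_number FROM contacts WHERE contact_id = ?",
                (contact_id,),
            ).fetchone()[0]
            self._cache[contact_id] = number
        return number

    def add(self, timestamp, contact_id, body):
        """Store one message: the body plus a row made only of integers."""
        number = self._contact_number(contact_id)
        with self.db:  # body and event row are committed together
            cur = self.db.execute(
                "INSERT INTO bodies (body) VALUES (?)", (body,)
            )
            tpl_id = cur.lastrowid
            self.db.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (tpl_id, timestamp, number),
            )
        return tpl_id
```

After the first message from a contact, every later message from the same
contact costs one body write and one all-integer row, with no read of
table 2.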

The real problem is how to deal with data loss :)

> > This is a scenario in which we have a private DB for what we need and delegate
> > all the extra data to ZG, rather than having the private DB keep what ZG
> > cannot store or what is better not stored in ZG.
> 
> I can't follow. Can you elaborate?

It's just a consideration on how ZG and the private DB are used.

WRT my former idea (a) of
1- writing the whole needed info into SQLite
2- pushing the event info to ZG

and your idea (b) of
1- pushing the event info to ZG
2- writing the info that ZG does not store into SQLite

a) is a way to have a local SQLite DB with the majority of the info we need,
and use ZG (from TPL) to get the rest of the info. If for any reason ZG is not
running, we can still work.
ZG has partially duplicated info, but it wouldn't be a big issue in my opinion.

In b) ZG has the main role, and the private SQLite DB is there only because ZG
cannot do FTS for the moment.
Delegating the whole log to ZG would be OK; the fact that it cannot handle the
body index for the moment is what is actually creating the problem (see
callback), and fixing that is probably the ideal solution.
However, we would completely rely on ZG, which means that if ZG is down, we
cannot work. Can it be an issue?
