[Telepathy] new jingle engine for Gabble - update

Tue Oct 21 17:06:24 PDT 2008

Hi,

today I've pushed some new patches for the new Jingle engine
for gabble which should make the new engine work in
most situations (there are still some known regressions but
it should be quite usable now), and updated it to gabble
HEAD.

The branch is at:
http://git.collabora.co.uk/?p=user/ptlo/gabble-senko.git;a=shortlog;h=refs/heads/master

Monkeyized:
   http://monkey.collabora.co.uk/gabble-senko.git_master/

What's working:
   * tests (yay! :-)
   * calls to and from current gabble
   * calls to and from gtalk pc client (v3 and v4)
   * calls to and from clients supporting recent
     Jingle draft (v0.26 atm since the namespaces changed
     again, but I plan to update that to the newest
     ns' before the code is merged and released)
   * content addition/removal

What's not working completely yet:
   * stream directions
   * stream holding

Now we need to test it as much as possible. If you're feeling
adventurous, please try the new code and see how it works
for you, and if you encounter any problems, please send
reports/logs.

Next step is finding and fixing regressions, and writing
more twisted tests (see below), and of course, adding new
features that were the point of all this in the first place:
   - ice and rawudp transport support
   - jingle used for purposes beyond just calls (ft, tubes, ...)

Will Thompson made a partial review and did some code cleanup.
His comments are below:

 > Design:
 > The design generally seems sound to me. We cannot enforce a 1–1
 > relationship between Jingle sessions and Telepathy channels. while it
 > seems sane for clients only to use a given Jingle session for one kind
 > of media, you might want to send 3 files to 1 person, and there's no
 > obvious reason not to reuse the same Jingle session for all three
 > files being sent in parallel. So in this case the single Jingle
 > session will map to three Telepathy channels, one for each file
 > transfer.
 >
 > So, for a/v calls we have one JingleSession mapping to one channel;
 > for file transfer we have n JIngleContents.
 >
 > We need to decide which component is responsible for rejecting
 > contents of the wrong “kind” (a/v, file, tubes, etc.) from the
 > session. I suppose that JingleSession might be the place to do it,
 > but if it guesses wrong we might reject all the kinds we understand
 > in favour of a kind we don't.
 >
 > (Also, Rob mentioned that it'd be worth mailing the relevant xmpp
 > list proposing a xep defining which contents a well-behaved client
 > should consider to be the same “kind”.)

Thinking about this, actually there might be a sane way (and reason)
to reuse one jingle session accross multiple "kinds of content".
Use case: you have a/v call with someone, and decide to send them
a file and open a tube. You already have a session established;
if possible (ie. if that peer resource has the right capabilities),
it's perfectly sane from Jingle POV to just add new contents to
that session. In that approach, we wouldn't need to worry about
detecting the compatible content types - JingleFactory would invoke
(Media|File|Tube)Factory separately on each new content "kind".

OTOH it is unclear how we would go about representing and implementing
this in the Telepathy land. According to the current spec,
a session is comprised of synchronised streams ("An media session
handler is an object that handles a number of synchronised media 
streams."). From this, I'd say that there's 1:1 correspondence
between (Media|File|Tube|...)Session (the DBus object implementing
the right interfaces) and the corresponding Channel. For MediaChannel,
we want audio and video stream to be in one MediaSession; for
FileTransferChannel we would presumably want independend Session
objects.

That's why I believe it's ok to have MediaChannel currently implement
the SessionHandler interface, thus having 1:1 correspondence, without
mandating (or even mentioning) this in the spec.

OTOH, if we do want to have separate objects, there's no problem in
decoupling them again. I've removed separate MediaSession because
after all the jingle session handling was factored out, only the
SessionHandler logic remained, so I decided it was pointless to
have an extra object around just for that.

Regarding Jingle spec clearing about related contents, I think we
should first figure out if we want one big Jingle session carrying
everything between the two peers (the hypotetical situation I've
described above, call+ft+tube), or multiple sessions, each being
responsible for one kind of contents.

[clarification: by "kind of contents" i mean group of related
content types]

 > I slightly expected the implementation of each dialect to be in
 > its own class, but I could be convinced that that's not necessary.

The differences between dialects are very small. If you don't count
gtalk and jingle having different stanza and having different synonyms,
the differences are quite small, but happen all over the place. With
C/GObject being quite verbose, I feel that we'd have a lot more
code duplication if we had each dialect in its own class, than having
well-placed (and documented) checks in the common class.

 > I'm a bit uncomfortable with the existing tests being reordered
 > and patched in ways I don't quite get.

There is a difference between current gabble and new code in when
the session is created. Current gabble always creates the session as
soon as the channel is created. New code creates session immediately
if it's an incoming one, but for outgoing calls, the session is
created only after the first stream is requested (this is to do
with code organisation - peer can be added to the channel in
several ways (when requesting it, manual add, or just request
streams), but always RequestStreams must be called, so it's convenient
to have the call to session create on only one place).

However, all the tests expect SessionHandler signal to be emitted befor
the RequestStreams call.

While the code can be rearranged to mimick the previous behavior,
I think the new behaviour is correct (and a bit more logical, esp. if
you don't buy into 1:1 channel-session correspondence - session is
created when it's used). So I think the tests are in for some fixing
(unfortunately we're still a bit limited in expressing either-or
expectations in tests, to completely avoid this issue).

Regarding testing, Rob mentioned that we should improve test coverage
for jingle. This is especially true now with the new code that's
bound to have (hopefully as small as possible) regressions.

Currently we have basic tests for:
   - outgoing call
   - incoming call (+ rejected)
   - several different ways of making the call
   - basic content addition and removal
   - a/v hold (only the telepathy side)

We could additionally have:
    - test for each dialect we support, for incoming and outgoing calls
    - test for gtalk4 -> gtalk3 fallback ala gtalk pc client
    - more stream add/removal scenarios (early media, race conditions,
      reject/remove/modify, etc)
    - setting stream direction
    - session/content timeouts and other possible errors

 > gabble_jingle_session_parse() serves to validate and to handle
 > Jingle <iq>s, so gets called twice for each, which is a bit weird.
 > It's also a bit weird that it appears to be a method on a
 > JingleSession but in fact that parameter can be NULL to signal that
 > it's in validation mode.

My original idea was to parse and validate all XML and put it into
some intermediate form (a bunch of structs). We decided against it,
but wanted to avoid optimistic JingleSession object creation, just
to destroy it immediately if the session wasn't parsed. To avoid
duplicating dry-run/for-real code, I've decided to have a switch
for it, and sid being (!)NULL seemed appropriate. I also wanted
to keep all XML parsing out of JingleFactory (even sid extraction
and detecting if it's jingle stanza is dependent on the jingle
dialect).

Now when I look at that code, since we don't emit new-session
signal until all went well, we can safely create a new session,
try to parse it, and silently destroy it if we're not happy
with the result,

 > So, I think jingle_cb and this function should be reworked as
 > follows:

 > * Is it a "set"? does have a jingle stanza of some description? if
 >   not, ALLOW_MORE_HANDLERS
 > * Are its attributes valid? if not, raise an error.
 > * Does a GabbleJingleSession matching sid already exist?
 >    * If so, sanity-check that its peer matches the other party in
 >      this iq
 >    * If not, create it, and set all the attributes on it
 > * Badger the state machine appropriately, or raise an error if the
 > transition is illegal.
 >
 > The current code does the first two steps twice. :)

Checking whether it's jingle stanza, determining session
id, session action and dialect is all intertwined (there's
a couple of ifs that handle it). We can factor this into
a separate function (gabble_jingle_session_detect?), which
would return (sid, action, dialect) or error to jingle_cb.
The second phase would be gabble_jingle_session_parse
which would get these parameters fed, so it wouldn't have
to extract them again).

 > JingleMediaRtp:
 > If someone sends me a malformed stanza, they can crash Gabble?
(regarding g_assert_not_reached () in xmlns check)

No. JingleMediaRtp gets instantiated only for xmlns' it
was subscribed for, and is expected to deal with all of them
(see jingle_rtp_media_register). So if that branch is reached, that
means either it registered incorrectly, or there's a bug in
jingle factory or jingle media rtp code.

Regards,
Senko

-- 
Senko Rasic at Collabora <senko.rasic at collabora.co.uk>