[Telepathy] new jingle engine for Gabble - update
robert.mcqueen at collabora.co.uk
Mon Oct 27 18:03:33 PDT 2008
Peter Saint-Andre wrote:
> About one session vs. many, we've had many discussions about that at
> various XMPP meetings, but we've never reached any consensus. Maybe Rob
> has more ideas on this topic.
:) What I said to Will which he hinted at somewhere in the original
mists of this thread is that I don't think it's useful for clients to
spend their time writing code which can deal with single sessions which
are able to be a video call /and/ a file transfer /and/ a whiteboarding
session. What would the UI be for managing such a multi-headed beast?
Given that each <content> in a <jingle> session has its own transport,
there's not usually any performance benefit to putting unrelated content
types into the same <jingle> session. It just makes something meaningful
(the files in this session should be presented at once, or the
audio/video streams in this session should be syncronised) into
something meaningless, and harder to implement.
My gut feeling is therefore we should RECOMMEND against it, and allow
clients to reject sessions which combine more than one content type, or
reject attempts to add content of a different type to an existing
session, unless in situations where we've defined it as explicitly
useful. Now that audio and video are the same RTP content type, that's
not a problem.
The one exception I can think of are my GTalk-inspired file-transfer
ideas. In that case we'd like to have one pseudo-TCP <description> over
an ICE <transport>, and multiple HTTP file <description>s over the
pseudo-TCP <transport> within the same session. But we can burn that
bridge when (if) we get to it.
The analogy in my mind with existing XMPP stuff is the message stanza
profile XEP-0226. Even though it's theoretically correct to put
arbitrarily many extended child attributes in a <message> stanza, it's
not actually useful (or desirable) for clients to be expected to deal
with this. A <message> stanza which is simultaneously an invite, a
feature negotiation, typing notification, HTML and plain text, geoloc
etc is just crazy. Some XEPs are always allowed (AMP, if a server ever
supports it :P), and others related to IMs are grouped, but most other
exciting things should have their own <message>.
> Senko Rasic wrote:
>> today I've pushed some new patches for the new Jingle engine
>> for gabble which should make the new engine work in
>> most situations (there are still some known regressions but
>> it should be quite usable now), and updated it to gabble
>> The branch is at:
>> What's working:
>> * tests (yay! :-)
>> * calls to and from current gabble
>> * calls to and from gtalk pc client (v3 and v4)
>> * calls to and from clients supporting recent
>> Jingle draft (v0.26 atm since the namespaces changed
>> again, but I plan to update that to the newest
>> ns' before the code is merged and released)
>> * content addition/removal
>> What's not working completely yet:
>> * stream directions
>> * stream holding
>> Now we need to test it as much as possible. If you're feeling
>> adventurous, please try the new code and see how it works
>> for you, and if you encounter any problems, please send
>> Next step is finding and fixing regressions, and writing
>> more twisted tests (see below), and of course, adding new
>> features that were the point of all this in the first place:
>> - ice and rawudp transport support
>> - jingle used for purposes beyond just calls (ft, tubes, ...)
>> Will Thompson made a partial review and did some code cleanup.
>> His comments are below:
>> > Design:
>> > The design generally seems sound to me. We cannot enforce a 1–1
>> > relationship between Jingle sessions and Telepathy channels. while it
>> > seems sane for clients only to use a given Jingle session for one kind
>> > of media, you might want to send 3 files to 1 person, and there's no
>> > obvious reason not to reuse the same Jingle session for all three
>> > files being sent in parallel. So in this case the single Jingle
>> > session will map to three Telepathy channels, one for each file
>> > transfer.
>> > So, for a/v calls we have one JingleSession mapping to one channel;
>> > for file transfer we have n JIngleContents.
>> > We need to decide which component is responsible for rejecting
>> > contents of the wrong “kind” (a/v, file, tubes, etc.) from the
>> > session. I suppose that JingleSession might be the place to do it,
>> > but if it guesses wrong we might reject all the kinds we understand
>> > in favour of a kind we don't.
>> > (Also, Rob mentioned that it'd be worth mailing the relevant xmpp
>> > list proposing a xep defining which contents a well-behaved client
>> > should consider to be the same “kind”.)
>> Thinking about this, actually there might be a sane way (and reason)
>> to reuse one jingle session accross multiple "kinds of content".
>> Use case: you have a/v call with someone, and decide to send them
>> a file and open a tube. You already have a session established;
>> if possible (ie. if that peer resource has the right capabilities),
>> it's perfectly sane from Jingle POV to just add new contents to
>> that session. In that approach, we wouldn't need to worry about
>> detecting the compatible content types - JingleFactory would invoke
>> (Media|File|Tube)Factory separately on each new content "kind".
>> OTOH it is unclear how we would go about representing and implementing
>> this in the Telepathy land. According to the current spec,
>> a session is comprised of synchronised streams ("An media session
>> handler is an object that handles a number of synchronised media
>> streams."). From this, I'd say that there's 1:1 correspondence
>> between (Media|File|Tube|...)Session (the DBus object implementing
>> the right interfaces) and the corresponding Channel. For MediaChannel,
>> we want audio and video stream to be in one MediaSession; for
>> FileTransferChannel we would presumably want independend Session
>> That's why I believe it's ok to have MediaChannel currently implement
>> the SessionHandler interface, thus having 1:1 correspondence, without
>> mandating (or even mentioning) this in the spec.
>> OTOH, if we do want to have separate objects, there's no problem in
>> decoupling them again. I've removed separate MediaSession because
>> after all the jingle session handling was factored out, only the
>> SessionHandler logic remained, so I decided it was pointless to
>> have an extra object around just for that.
>> Regarding Jingle spec clearing about related contents, I think we
>> should first figure out if we want one big Jingle session carrying
>> everything between the two peers (the hypotetical situation I've
>> described above, call+ft+tube), or multiple sessions, each being
>> responsible for one kind of contents.
>> [clarification: by "kind of contents" i mean group of related
>> content types]
>> > I slightly expected the implementation of each dialect to be in
>> > its own class, but I could be convinced that that's not necessary.
>> The differences between dialects are very small. If you don't count
>> gtalk and jingle having different stanza and having different synonyms,
>> the differences are quite small, but happen all over the place. With
>> C/GObject being quite verbose, I feel that we'd have a lot more
>> code duplication if we had each dialect in its own class, than having
>> well-placed (and documented) checks in the common class.
>> > I'm a bit uncomfortable with the existing tests being reordered
>> > and patched in ways I don't quite get.
>> There is a difference between current gabble and new code in when
>> the session is created. Current gabble always creates the session as
>> soon as the channel is created. New code creates session immediately
>> if it's an incoming one, but for outgoing calls, the session is
>> created only after the first stream is requested (this is to do
>> with code organisation - peer can be added to the channel in
>> several ways (when requesting it, manual add, or just request
>> streams), but always RequestStreams must be called, so it's convenient
>> to have the call to session create on only one place).
>> However, all the tests expect SessionHandler signal to be emitted befor
>> the RequestStreams call.
>> While the code can be rearranged to mimick the previous behavior,
>> I think the new behaviour is correct (and a bit more logical, esp. if
>> you don't buy into 1:1 channel-session correspondence - session is
>> created when it's used). So I think the tests are in for some fixing
>> (unfortunately we're still a bit limited in expressing either-or
>> expectations in tests, to completely avoid this issue).
>> Regarding testing, Rob mentioned that we should improve test coverage
>> for jingle. This is especially true now with the new code that's
>> bound to have (hopefully as small as possible) regressions.
>> Currently we have basic tests for:
>> - outgoing call
>> - incoming call (+ rejected)
>> - several different ways of making the call
>> - basic content addition and removal
>> - a/v hold (only the telepathy side)
>> We could additionally have:
>> - test for each dialect we support, for incoming and outgoing calls
>> - test for gtalk4 -> gtalk3 fallback ala gtalk pc client
>> - more stream add/removal scenarios (early media, race conditions,
>> reject/remove/modify, etc)
>> - setting stream direction
>> - session/content timeouts and other possible errors
>> > gabble_jingle_session_parse() serves to validate and to handle
>> > Jingle <iq>s, so gets called twice for each, which is a bit weird.
>> > It's also a bit weird that it appears to be a method on a
>> > JingleSession but in fact that parameter can be NULL to signal that
>> > it's in validation mode.
>> My original idea was to parse and validate all XML and put it into
>> some intermediate form (a bunch of structs). We decided against it,
>> but wanted to avoid optimistic JingleSession object creation, just
>> to destroy it immediately if the session wasn't parsed. To avoid
>> duplicating dry-run/for-real code, I've decided to have a switch
>> for it, and sid being (!)NULL seemed appropriate. I also wanted
>> to keep all XML parsing out of JingleFactory (even sid extraction
>> and detecting if it's jingle stanza is dependent on the jingle
>> Now when I look at that code, since we don't emit new-session
>> signal until all went well, we can safely create a new session,
>> try to parse it, and silently destroy it if we're not happy
>> with the result,
>> > So, I think jingle_cb and this function should be reworked as
>> > follows:
>> > * Is it a "set"? does have a jingle stanza of some description? if
>> > not, ALLOW_MORE_HANDLERS
>> > * Are its attributes valid? if not, raise an error.
>> > * Does a GabbleJingleSession matching sid already exist?
>> > * If so, sanity-check that its peer matches the other party in
>> > this iq
>> > * If not, create it, and set all the attributes on it
>> > * Badger the state machine appropriately, or raise an error if the
>> > transition is illegal.
>> > The current code does the first two steps twice. :)
>> Checking whether it's jingle stanza, determining session
>> id, session action and dialect is all intertwined (there's
>> a couple of ifs that handle it). We can factor this into
>> a separate function (gabble_jingle_session_detect?), which
>> would return (sid, action, dialect) or error to jingle_cb.
>> The second phase would be gabble_jingle_session_parse
>> which would get these parameters fed, so it wouldn't have
>> to extract them again).
>> > JingleMediaRtp:
>> > If someone sends me a malformed stanza, they can crash Gabble?
>> (regarding g_assert_not_reached () in xmlns check)
>> No. JingleMediaRtp gets instantiated only for xmlns' it
>> was subscribed for, and is expected to deal with all of them
>> (see jingle_rtp_media_register). So if that branch is reached, that
>> means either it registered incorrectly, or there's a bug in
>> jingle factory or jingle media rtp code.
> Telepathy mailing list
> Telepathy at lists.freedesktop.org
Robert McQueen +44 7876 562 564
Director, Collabora Ltd. http://www.collabora.co.uk
More information about the Telepathy