[Telepathy] Designing Telepathy/XMPP end-to-end security

Wed Jun 13 05:38:41 PDT 2012

On 12/06/12 14:42, Simon McVittie wrote:
> There are two broad use-cases: Text, and the rest. In this mail I'm only
> going to consider the non-Text case, because Text is more complicated.

OK, so how does this change for Text? I'm mainly interested in securing
non-Text at the moment, so what I'm trying to do here is to check that I
haven't accidentally made e2e security for Text, with OTR and/or XTLS,
impossible.

Odd things about Text
=====================

One thing that's different about Text (in XMPP, IRC, SIP and probably
many more) is that the Telepathy Channel is, in some sense, artificial.
Each message is an individual protocol transaction, and we group them
into a Channel for convenience.

Another is that there are (at least) two viable ways to secure Text in
XMPP, namely XTLS (as applied to an end-to-end XML stream, as suggested
in its Internet-Draft) and OTR. XTLS uses the extension mechanisms provided
by XMPP, whereas OTR operates in a different layer.

If each of XTLS and OTR has advantages over the other, we get into a
problematic situation where choosing which one to use is difficult,
because we have to know in advance which security properties the user
prefers.

Luckily, I don't think we have to do that: XTLS seems to have all the
desirable properties of OTR, except possibly "strong deniability" (and
as I said in an earlier email, I don't see how its version of strong
deniability is actually any better than what you get from XTLS).

If that weren't the case, I think the solution would be to enhance one
of the protocols (in the case of XMPP my choice would be XTLS, because
it preserves extensibility) until it has whatever properties are
missing, so that we can unambiguously declare it to be "better" if both
are supported. We should still support OTR for interop with other
clients, though.

Start of session
================

When starting a session, we have the same choices as for non-Text,
except that we also get to decide whether to use XTLS or OTR, or just
send plain-text messages. If the user has configured "no opportunistic
security, please" (i.e. they want server-side logging) then we use
plain-text messages and it degrades to the current situation.

If we can do standard XMPP capability discovery (roughly: the peer is on
our contact list and this isn't a limited, non-XMPP server like Facebook
or MSN) we can know whether they support XTLS. If we can't do capability
discovery, or they don't support XTLS, perhaps we can know (via OTR's
ingenious whitespace hack) that they support OTR.

I don't think interactively asking the user "should I use XTLS, OTR or
neither?" whenever they initiate a chat is appropriate, so let's not do
that.
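
As a rough sketch of the non-interactive choice described above (the
function name, the enum and the boolean inputs are all made up for
illustration; none of this is existing Telepathy API), the decision
could look something like this:

```python
from enum import Enum


class Protection(Enum):
    PLAIN = "plain"
    OTR = "otr"
    XTLS = "xtls"


def choose_protection(opportunistic: bool,
                      caps_known: bool,
                      peer_has_xtls: bool,
                      peer_hints_otr: bool) -> Protection:
    """Pick a protection mechanism without asking the user.

    All four inputs are hypothetical: 'opportunistic' is the user's
    configuration, 'caps_known' says whether capability discovery
    worked, and 'peer_hints_otr' stands for having seen OTR's
    whitespace tag in a message from the peer.
    """
    if not opportunistic:
        # The user wants plain, server-side-loggable messages.
        return Protection.PLAIN
    if caps_known and peer_has_xtls:
        return Protection.XTLS
    if peer_hints_otr:
        return Protection.OTR
    return Protection.PLAIN
```

XTLS is tried first here only because of the preference argued for
earlier in this mail; OTR is the fallback when capabilities are
unknown or XTLS is missing.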

Authentication
==============

If we've initiated a secure session (either XTLS or OTR) we can behave a
lot like the non-Text sessions I described in a previous mail - start
off secure against passive attackers, with the option to become secure
against active attackers. All good.

The next point of difference is that instead of certificates or SRP,
OTR offers raw keying material or the "Socialist Millionaires'
Protocol" (SMP). One
way to deal with OTR keys would be to consider the OTR public key and
fingerprint formats (as described in "Public keys, signatures and
fingerprints" in OTR Protocol v2-3.1.0) to be a third "certificate"
format, alongside our existing "x509" and "pgp" formats, and do
certificate exchange in the same way we'd exchange self-signed X509
certificates.

SMP requires each participant to enter a secret, and then checks that
they entered the same secret, whereas SRP requires one participant to
enter a username and password, then has the other verify it - so they're
structurally different, and should be different interfaces (or possibly
different channel types).

Upgrading to protected
======================

The next point of difference is if we want to upgrade from an
unprotected channel to a protected one. I see two ways we could do this:

Upgrading
---------

Short version: "the unprotected Channel becomes protected".

More specifically, have the Securable interface on every payload Text
channel, whether it's protected or not. In an unprotected channel, every
security property will be FALSE (although I suppose if there's a
property for "at least as secure as in-band" - which would be FALSE in
current RTP calls, for instance - then that one will be TRUE, because
basic Text messages *are* in-band). When a protected Jingle or OTR
session is negotiated, attach it to the previously-unprotected Channel,
and flip some of the "security property" bits to TRUE. All subsequent
messages sent in that Channel go through the protected session.

To upgrade the unprotected Channel to protected, you would have to call
a method on that Channel (probably on the Securable interface) listing
security properties you wanted to insist on.

If an unprotected message is received while negotiating a protected
session, there's an undesirable ambiguity: was that message protected?
It comes down to timing, which is subtle. It gets particularly subtle if
you can still receive unprotected messages - perhaps from another
resource - after the protection has been negotiated. See "Downgrading to
unprotected", below, for correct handling.

This does have the advantage of avoiding having two parallel Text
channels with TargetHandle = Bob, in common situations. However, that
can't be avoided in corner cases (see "Downgrading to unprotected") so I
don't think it's worth putting effort into reducing it.

A naive (non-Securable-aware) UI would provide continuity here, by
continuing to use the same window/tab for all of the messages. I'm not
sure whether that's an advantage or not.
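
To make the "upgrading" model a bit more concrete, here is a toy
sketch. The class, the "InBand" property name and the methods are
invented for illustration and are not proposed spec; "Encrypted" and
"Verified" are just the property names used elsewhere in this mail. It
shows one Channel accumulating both unprotected and protected messages,
which is exactly where the timing ambiguity comes from:

```python
from dataclasses import dataclass, field


@dataclass
class UpgradableTextChannel:
    target: str
    # Every security property starts FALSE, except the hypothetical
    # "at least as secure as in-band" one, which is TRUE for Text.
    properties: dict = field(default_factory=lambda: {
        "Encrypted": False,
        "Verified": False,
        "InBand": True,
    })
    # (text, was the channel encrypted when the message arrived?)
    log: list = field(default_factory=list)

    def attach_protected_session(self) -> None:
        """Flip the property bit once a protected session exists."""
        self.properties["Encrypted"] = True

    def receive(self, text: str) -> None:
        """Record a message with the channel state at arrival time."""
        self.log.append((text, self.properties["Encrypted"]))
```

A UI looking only at the Channel's current properties can't tell the
first kind of message from the second; it would need the per-message
snapshot.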

Replacing
---------

Short version: "have the protected session be a separate Channel".

When a protected Jingle or OTR session is negotiated, create a new
Channel, with some of its Securable "security property" bits TRUE
(probably everything except Verified, in practice). It could have a
reference to the old unprotected Channel (much like upgrading 1-1 Text
to a Conference), or that could be implied by being 1-1 with the same
target handle.

Optionally still have the Securable interface on unprotected Text
channels anyway, but the "security bits" will all be FALSE and can't
become TRUE. (I think we might want this so that Encrypted=FALSE can be
requestable, for people who want to insist that a particular channel is
server-side-loggable?)

A naive (non-Securable-aware) UI would either mix the unprotected and
protected channels into the same window or tab (if it ignores Channel
boundaries and distributes messages by target handle), or put the
protected session in a new window or tab (which has some advantages, to
be honest - it makes it more obvious where you're typing!). A
Securable-aware UI would have to respect Channel boundaries anyway,
since the Securable properties are per-Channel.

This has the disadvantage that you get more than one Text channel with
the same Contact TargetHandle - but in the situations below I don't
think that's avoidable, so we'll have to live with it.

It has the advantage that it maps 1:1 to what will happen with Call,
StreamTube, FileTransfer channels, where each channel is either
unprotected or protected, and they never change state (except for
Verified which is special).

The reason I think it's OK for verification to be special is that it can
be somewhat retroactive. If Alice and Bob establish an encrypted channel
and send some inconsequential messages, then perform a verification
handshake before saying anything actually secret, then those
inconsequential messages were, in fact, just as secure as the secret
messages they send later - they just didn't know it yet!
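
A minimal sketch of the "replacing" model, under the assumption that a
Channel's security properties are fixed at creation and only Verified
may change afterwards (the class and helper names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextChannel:
    target: str
    encrypted: bool
    verified: bool = False
    # Optional reference to the unprotected Channel this one replaces.
    replaces: Optional["TextChannel"] = None

    def __setattr__(self, name, value):
        # After construction, only 'verified' is allowed to change.
        if name != "verified" and name in self.__dict__:
            raise AttributeError(name + " is fixed for the channel's lifetime")
        super().__setattr__(name, value)


def protect(old: TextChannel) -> TextChannel:
    """Create a new protected Channel instead of mutating the old one."""
    return TextChannel(target=old.target, encrypted=True, replaces=old)
```

Setting `verified = True` later is allowed, matching the "somewhat
retroactive" nature of verification; trying to flip `encrypted` on an
existing channel raises an error.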

Downgrading to unprotected
==========================

Given a protected Channel, it might seem a stupid idea to downgrade, but
in practice it'll happen, and we need to make sure it's obvious and
non-damaging.

In XTLS, nothing stops you from holding two entire conversations, one in
the protected XTLS channel and one in-band and unprotected. If we don't
forbid this entirely, they should be parallel Channels - given any
Channel, it should be instantly obvious whether messages are protected
or not. I don't think you should have to look at each individual
message.

(It might be useful for loggers if we *also* copy the current security
properties into each message's header, though.)

In OTR, the OTR conversation is in-band (it's basically just Base64
wedged into text messages). I assume that once OTR has got started,
messages that aren't valid OTR are considered to be an error, but on
balance it seems better to show them to the user (with a big fat warning
that the text is no longer authenticated) than not.

XMPP has the additional quirk that you can have multiple resources. If
Alice has an OTR or XTLS conversation with Bob from her laptop, which is
idle but being "kept alive", it seems entirely valid for Alice to
initiate a separate unprotected conversation with Bob, perhaps from a
mobile phone which can't do OTR or XTLS. (I wonder whether the Pidgin
reference implementation of OTR knows about resources?)

It would be Bad™ for a Channel's security to drop (race condition
between Bob reducing the Channel's security, and Alice pressing Send on
a sensitive message), so whenever a less-secure conversation starts, I
think that should be a separate Channel.

I think this is an argument in favour of the "replacing" model for
upgrades, to make it symmetrical - every time there's a change in
security level (except for verification which is special), there's a new
channel.

We could consider making it impossible to send unprotected messages
while a protected conversation with the same peer exists (by making
SendMessage suppress the <message> and report a temporary delivery
failure). I think this would have to be per-resource, for the
"non-OTR-capable phone" use-case above - Bob's IMs to Alice's phone
aren't going to be e2e secured, but perhaps he doesn't care.
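
A sketch of that per-resource check, assuming the connection manager
keeps a set of full JIDs with which a protected session currently
exists (the function name and the status strings are placeholders, not
real delivery-report values):

```python
from typing import Set


def send_unprotected(recipient: str, protected_resources: Set[str]) -> str:
    """Return a hypothetical delivery status for an unprotected message.

    'recipient' is a full JID (bare JID plus resource). The message is
    suppressed only if a protected session exists with that exact
    resource, so other resources of the same contact stay reachable.
    """
    if recipient in protected_resources:
        return "temporary-failure"
    return "sent"
```

With a protected session to alice@example.com/laptop only, an
unprotected message to that resource is refused, while one to
alice@example.com/phone still goes out.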

If we *receive* an unprotected message - even at an unexpected time in
the protocol, like halfway through an OTR conversation - I don't think
there's any justification for not displaying it ("destroying information
is bad"), but we do need to make it absolutely clear that it's
unprotected and could be the result of tampering. I think it's enough to
put it in a separate Channel lacking the "this is protected"
indicator, and write this into the spec:

    When a UI handling a Channel is asked to handle a
    less secure Channel with the same peer, it MUST separate
    it from the old Channel as much as it would for a
    channel whose peer was different (e.g. putting it in a separate
    window or tab). If it is asked to handle an equally secure
    or more secure channel with the same peer, it MAY separate
    that from the old Channel in the same way, or handle it
    in the same context (e.g. the same window or tab).

    When a UI establishes a secure Channel, it MUST Close any
    less secure Channels that it was handling in the same context
    (e.g. window or tab), so that if more messages from the peer
    arrive, they are placed in a separate context.

    | In a typical UI this results in obvious visual separation
    | between the protected messages and any further unprotected
    | messages (which could either be authentic, or from an
    | attacker). No messages will be lost, because if there
    | are unacknowledged messages when the unprotected Channel
    | emits Closed, the connection manager will "reopen" that
    | Channel, resulting in a new UI context to display them.
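
To check that the rules above hang together, here is a toy model of a
UI following them. Everything in it is an assumption made for the
sketch: security is reduced to a single integer `level` (higher is more
secure), a "context" is just a list standing in for a window or tab,
and Close is a boolean flag rather than a D-Bus call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    peer: str
    level: int            # hypothetical ordering: higher is more secure
    closed: bool = False


class ChatUI:
    def __init__(self) -> None:
        # Each inner list is one context (window or tab).
        self.contexts: List[List[Channel]] = []

    def handle(self, channel: Channel) -> int:
        """Place a channel in a context and return that context's index."""
        for idx, ctx in enumerate(self.contexts):
            same_peer = [c for c in ctx
                         if c.peer == channel.peer and not c.closed]
            if not same_peer:
                continue
            if channel.level < max(c.level for c in same_peer):
                # Less secure than what this context shows: MUST separate.
                continue
            # Equally or more secure: MAY share the context, and MUST
            # close any less secure channels handled there.
            for c in same_peer:
                if c.level < channel.level:
                    c.closed = True
            ctx.append(channel)
            return idx
        self.contexts.append([channel])
        return len(self.contexts) - 1
```

In this model an unprotected channel followed by a protected one ends
up in the same context with the unprotected one closed, and a
"reopened" unprotected channel then lands in a fresh context.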

I think this has addressed all the differences between Text and other
channels, unless anyone can point out more?

    S