'Machine ID' underspecified?

Wed Dec 9 13:44:45 UTC 2020

On Wed, 09 Dec 2020 at 12:36:42 +0000, Thomas Kluyver wrote:
> I've made a pull request:
> https://gitlab.freedesktop.org/dbus/dbus/-/merge_requests/198
> 
> I noticed that the systemd man page describing /etc/machine-id says:
> 
> > This ID uniquely identifies the host. It should be considered
> > "confidential", and must not be exposed in untrusted environments, in
> > particular on the network. If a stable unique identifier that is tied to
> > the machine is needed for some application, the machine ID or any part of
> > it must not be used directly. Instead the machine ID should be hashed...

That wording is a reasonable policy, but is much newer than the design
of the D-Bus machine ID, which was done something like 17 years ago.
Considering the machine ID to be "confidential" seems a bit too strong
to me, but having no consideration of anonymity at all goes too far
the other way, and if having this wording gets people thinking about
fingerprinting-resistance then it's probably worth it.
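
To make the man page's "should be hashed" advice a bit more concrete,
here is a minimal Python sketch of deriving a per-application identifier
with a keyed hash. The function name and the application IDs are made up
for illustration. It is in the spirit of what systemd offers as
sd_id128_get_machine_app_specific(), but it is not a reimplementation,
and its output should not be expected to match systemd's.

```python
import hashlib
import hmac


def app_specific_id(machine_id_hex: str, app_id: bytes) -> str:
    """Derive a stable per-application ID without revealing the machine ID.

    Hypothetical helper: machine_id_hex is the 32-hex-digit machine ID,
    app_id is any byte string that is fixed for one application.
    """
    key = bytes.fromhex(machine_id_hex.strip())
    if len(key) != 16:
        raise ValueError("machine ID must be 128 bits (32 hex digits)")
    # HMAC-SHA256 keyed by the machine ID: two applications get unrelated
    # identifiers, and neither can be turned back into the machine ID.
    digest = hmac.new(key, app_id, hashlib.sha256).digest()
    # Truncate to 128 bits so the result has the same shape as a machine ID.
    return digest[:16].hex()
```

The same machine and application always yield the same value, so it can
still serve as a stable identifier, but two applications cannot compare
notes to recognise the same machine.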

The way I like to think about this is that the D-Bus machine ID is like
gethostname(2), except that it avoids several reasons why the hostname
was not considered useful by the original designers of D-Bus:

* it is UUID-like, which means we can rely on different machines having
  different machine IDs (assuming machines set up by disk-cloning
  correctly reset the machine ID on a per-machine basis), even if their
  hostnames might be something generic like "localhost" or "debian";

* it is not human-readable or human-meaningful, which means sysadmins are
  not going to be tempted to change it for cosmetic reasons, causing a
  system that is fundamentally still "the same machine" to be accidentally
  considered to be different when its owner decides to stop naming their
  machines after Scottish villages and renames them after Tolkien characters
  instead, or some analogous change;

* it is obviously not a network-addressable identity, which means sysadmins
  and OS integrators will not be tempted to populate it from DHCP or
  similar dynamic protocols, causing a system that is fundamentally still
  "the same machine" to be considered to be different when it connects to
  different LANs.

As a result, if you would be comfortable with sharing a high-entropy
hostname with a local network, sandboxed app or other semi-trusted
environment, it's plausible that you'd make the same decision about the
machine ID.

"localhost" and "debian" are not high-entropy hostnames that could easily
be used to identify a particular machine, but names like "eng-cbg-t520-5",
"Thomas-Macbook" or "horizon.collabora.com" are probably unique enough
to be useful to an attacker.

> I guess that normally dbus is a trusted environment, and all of the
> processes talking to it would typically be able to read /etc/machine-id
> anyway.

Yes. The well-known system bus is only meant to be accessible to processes
that could just read /etc/machine-id themselves, and the well-known session
bus cannot be made directly accessible to untrusted processes, because
access to the session bus gives you arbitrary code execution in the
session's TCB (for example, the ability to execute code with the
privileges of things like gnome-session).

In frameworks that provide filtered or mediated access to the well-known
buses, like Flatpak's xdg-dbus-proxy or Snap's AppArmor policies, it's up
to the framework author to decide what their policy is for this. Flatpak
doesn't hide the machine ID, but that's consistent with the way Flatpak
doesn't hide the hostname or IP addresses, which are (at least sometimes)
similarly fine-grained.

> But is it a potential cause for concern that this is exposed
> on every object?

Arguably yes, and if I were designing D-Bus today, I wouldn't include
GetMachineId. I probably wouldn't include any cross-machine support at
all (in the interests of doing one thing well, in preference to multiple
things badly), but if there was any cross-machine identification support
at all, it would probably take the form of a method on the "bus driver",
like this straw-man design:

    org.freedesktop.DBus.GetIsSameMachine(s: bus_name) -> b: is_same
        Return TRUE if bus_name is on the same machine as the caller,
        FALSE if it is not, or raise an exception if it does not exist
        or the caller does not have permission to know more about it.
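
To illustrate the semantics of that straw-man (and only the semantics:
nothing like this exists in any bus driver), here is a toy Python model.
The class, method and exception names are all hypothetical. The point is
that the caller only ever learns a boolean, never an identifier.

```python
class NameHasNoOwner(Exception):
    """The requested bus name does not exist."""


class AccessDenied(Exception):
    """The caller is not allowed to learn anything about that name."""


class ToyBusDriver:
    """Toy model of the straw-man GetIsSameMachine; not real D-Bus code."""

    def __init__(self):
        self._machine_of = {}  # bus name -> opaque machine token (private)
        self._hidden = set()   # (caller, target) pairs that must be refused

    def register(self, bus_name, machine_token):
        self._machine_of[bus_name] = machine_token

    def deny(self, caller, target):
        self._hidden.add((caller, target))

    def get_is_same_machine(self, caller, bus_name):
        if caller not in self._machine_of or bus_name not in self._machine_of:
            raise NameHasNoOwner(bus_name)
        if (caller, bus_name) in self._hidden:
            raise AccessDenied(bus_name)
        # Only the comparison result leaves the driver, never the token.
        return self._machine_of[caller] == self._machine_of[bus_name]
```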

However, I didn't design D-Bus, and removing o.fd.DBus.Peer.GetMachineId
would be a backwards-incompatible change in D-Bus implementations that
provide it (in particular, libdbus). We try hard to avoid breaking
compatibility, because the most important feature of both D-Bus the
protocol and dbus the implementation is that they're compatible with past
versions of themselves.

> E.g. if you exposed a D-Bus proxy to the network
> which only accepted messages to certain bus/object names, should you
> also handle GetMachineId specially in the proxy to avoid exposing the
> 'confidential' ID?

Perhaps yes, but it depends on your threat model and your confidentiality
requirements. If your design assumes that the network is completely trusted
(like D-Bus-over-TCP, traditional remote X11, and traditional NFS) then it
would be pointless to filter GetMachineId.
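
If your threat model does call for it, the filtering decision in such a
proxy could look roughly like the Python sketch below. The MethodCall
type, the should_forward() function and its parameters are hypothetical
and not taken from any existing proxy. One detail worth getting right:
the interface header field is optional on method calls, so a call to
GetMachineId with no interface specified has to be treated as a
potential call to org.freedesktop.DBus.Peer as well.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

PEER_IFACE = "org.freedesktop.DBus.Peer"


@dataclass(frozen=True)
class MethodCall:
    """Just the header fields of a method call that this policy looks at."""
    destination: str
    path: str
    member: str
    interface: Optional[str] = None  # may be omitted on the wire


def should_forward(call: MethodCall,
                   allowed_destinations: FrozenSet[str],
                   hide_machine_id: bool = True) -> bool:
    """Return True if the proxy should pass this call to the real bus."""
    if call.destination not in allowed_destinations:
        return False
    # A call with no interface could still be dispatched to Peer, so
    # treat a missing interface the same as an explicit Peer interface.
    if (hide_machine_id
            and call.member == "GetMachineId"
            and call.interface in (None, PEER_IFACE)):
        return False
    return True
```

With hide_machine_id=False this reduces to a plain destination
allow-list, which is all you need if the network is assumed to be
trusted anyway.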

    smcv