Re-unifying udisks and storaged

Mon Jan 16 11:56:13 UTC 2017

On Mon, 05 Dec 2016 at 11:39:10 +0200, Marius Vollmer wrote:
> I feel that the only sane thing is to have UTF-8 filenames, and we
> should optimize for that.  If we have to deal with multiple encodings,
> the API should ideally protect the client from that by converting to and
> from UTF-8 on D-Bus.

If a filename is not UTF-8, it can have any or no encoding.
It might be Latin-1, it might be KOI8-R, or it might just be a pile of
bytes that makes no sense to any human reader.

You can't know what the intended encoding was, which means you can't
convert it to Unicode reliably.

You certainly can't go back from Unicode to the on-disk encoding
without knowing the on-disk encoding.
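
As a small illustration of the point above (a sketch only: the byte string
is a made-up example and try_decode is a hypothetical helper), the same
three bytes are invalid as UTF-8 but decode "successfully" to different
text under Latin-1 and KOI8-R, and the reverse conversion only works if
you pick the encoding that was used originally:

```python
# Hypothetical on-disk filename bytes; not taken from any real system.
raw = b"\xc1\xc2\xd7"


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")      # invalid UTF-8, so this is None
as_latin1 = try_decode(raw, "latin-1")  # decodes, as accented Latin letters
as_koi8r = try_decode(raw, "koi8_r")    # also decodes, as Cyrillic letters

# Going back to bytes only works with the encoding that was actually used.
back_right = as_koi8r.encode("koi8_r")
try:
    back_wrong = as_koi8r.encode("latin-1")
except UnicodeEncodeError:
    back_wrong = None
```

Nothing in the bytes themselves says which of the two decodings, if
either, was intended.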

> If we just refuse to support non-Unicode filenames, could we get away
> with that?  I don't know...

Maybe. This is a domain-specific decision for udisks/storaged,
and is outside my area.

> Some horrible escapes to represent non-Unicode bytes?
> (Unicode doesn't have code points for "RAW BYTE FF", or does it?  I
> think it should... :-) But this actually makes it harder for a client to
> be correct, so, hmm...

This sounds a lot like <https://www.python.org/dev/peps/pep-0383/>,
which is fairly horrible but does sort of work... but, yes, it seems
highly error-prone.
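
For reference, a minimal sketch of what PEP 383 does in Python, using its
"surrogateescape" error handler (the filename is a made-up example). The
undecodable byte round-trips, but the intermediate string is not valid
Unicode, so any code path that forgets the special error handler fails:

```python
# Hypothetical filename with one Latin-1 byte (0xE9) that is invalid in UTF-8.
raw = b"caf\xe9.txt"

# PEP 383: undecodable bytes become lone surrogates (0xE9 -> U+DCE9).
name = raw.decode("utf-8", errors="surrogateescape")

# The round trip works as long as the same error handler is used again.
round_trip = name.encode("utf-8", errors="surrogateescape")

# A strict encode rejects the lone surrogate, so the escaped string cannot
# be emitted as well-formed UTF-8.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Since D-Bus strings must be valid UTF-8, a string escaped like this could
not be sent as-is in a string-typed D-Bus value either.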

> What can the client do with the filenames besides displaying them to the
> user, actually?  We could add a DisplayName property and most clients can
> then hopefully ignore the Device, Symlinks, and PreferredDevice byte
> vectors.

That would make sense to me: have a DisplayName that is the result of
g_utf8_make_valid() or a similar algorithm (replacing unparseable bytes
with U+FFFD REPLACEMENT CHARACTER), document it similarly to
G_FILE_ATTRIBUTE_STANDARD_DISPLAY_NAME, and encourage callers to use it.
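
A rough Python analogue of that idea, as a sketch rather than a
description of what g_utf8_make_valid() does byte for byte: display_name
is a hypothetical helper, and the exact number of replacement characters
produced for a given invalid sequence may differ between implementations.

```python
def display_name(raw: bytes) -> str:
    """Return a valid Unicode string that is for display only.

    Bytes that are not valid UTF-8 are replaced with U+FFFD, so the
    result cannot be converted back to the original filename.
    """
    return raw.decode("utf-8", errors="replace")


valid = display_name("r\u00e9sum\u00e9.txt".encode("utf-8"))  # unchanged text
lossy = display_name(b"caf\xe9.txt")                          # 0xE9 is replaced
```

The conversion is deliberately lossy, which is why the byte vectors would
still be needed by any client that wants to open the file.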

> (And interestingly, UDisks2 includes the terminating \0 in its
> filename byte vectors...)

That's necessary if you want to be able to read the \0-terminated
byte vector out of the DBusMessage or GVariant with a const char *,
rather than copying. The D-Bus message format guarantees to put an extra
\0 (which is not part of the length count) after strings, but makes no
such guarantee for byte-arrays; GVariant is similar.
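
On the client side, in a language with counted byte strings, the extra
\0 just has to be dropped again. A minimal sketch, assuming the client
has already received the byte-array as a Python bytes object
(filename_from_byte_vector is a hypothetical helper, not part of any
binding):

```python
def filename_from_byte_vector(ay: bytes) -> bytes:
    """Strip the single trailing NUL that the sender included, if any."""
    if ay.endswith(b"\0"):
        return ay[:-1]
    return ay


with_nul = filename_from_byte_vector(b"/dev/sda1\0")
without_nul = filename_from_byte_vector(b"/dev/sda1")
```

Only one trailing byte is removed, so this does not hide unexpected
extra NULs from the caller.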

    S