[Spice-devel] [spice-gtk v1] file-xfer: Fix bad filename encoding

Victor Toso victortoso at redhat.com
Wed Apr 12 13:55:04 UTC 2017


Hi,

On Wed, Apr 12, 2017 at 03:40:46PM +0200, Christophe Fergeau wrote:
> On Wed, Apr 12, 2017 at 07:19:54AM -0400, Frediano Ziglio wrote:
> >
> > >
> > > From: Victor Toso <me at victortoso.com>
> > >
> > > Manual for G_FILE_ATTRIBUTE_STANDARD_NAME states:
> > >  > The name is the on-disk filename which may not be in any known
> > >  > encoding, and can thus not be generally displayed as is.
> > >
> > > Considering a file named "ěščřžýáíé", if we use
> > > G_FILE_ATTRIBUTE_STANDARD_NAME to get the file name, we will have the
> > > following 72 char long string:
> > > "\xc4\x9b\xc5\xa1\xc4\x8d\xc5\x99\xc5\xbe\xc3\xbd\xc3\xa1\xc3\xad\xc3\xa9"
> > >
> >
> > this string is only 18 characters long, why 72 ?
> >
> > > We should use G_FILE_ATTRIBUTE_STANDARD_DISPLAY_NAME instead, which
> > > will give us the correct 18-byte long UTF-8 string: "ěščřžýáíé"
> > >
> >
> > I think this solves the encoding as we'll transmit with a given encoding
> > (utf8).
> > If the source filename is not correctly encoded this will give a
> > destination filename different from the source.
> > As the protocol does not include an encoding utf-8 is a good choice
>
> Do we really need an encoding for the filename in the protocol?

As part of the protocol, I don't see an issue with that to be honest.

> Filenames on disks are just byte arrays, even though these days they
> usually are UTF-8. I think I would not try to be too smart with respect
> to their encoding, and just send them as is (ie as a byte array without
> trying to stick an encoding on them).
>
> For linux<->linux dnd, this should be fine, for windows client -> linux
> guest, I guess this is fine too.
>
> There are probably going to be some corner cases on linux -> windows
> dnd, but things there seem a bit messy (different encoding on fat and
> ntfs FS)
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx
>
> However, in this scenario, we also need to deal with invalid characters
> in Windows filenames ( \ . ? * and so on), which is done on the agent
> side, so we could also handle encoding conversions on the agent-side
> too.
>
> Christophe

I think this is a different issue. The guest agent should deal with
guest-related problems such as maximum file size or special/prohibited
characters in filenames, as you mentioned. How this data is transferred
is a matter for the protocol.
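
To illustrate the split I have in mind, here is a rough sketch in Python
(not spice-gtk or agent code). The helper names to_wire and
sanitize_for_windows are hypothetical, and replacing undecodable bytes
with U+FFFD is only an assumption about what the sending side could do:

```python
# Hypothetical helpers for illustration only; not part of spice-gtk or the agent.

# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = '<>:"/\\|?*'


def to_wire(raw_name: bytes) -> bytes:
    """Protocol side: always put valid UTF-8 on the wire.

    Bytes that are not valid UTF-8 are replaced with U+FFFD, so the
    destination name may differ from the source name in that case.
    """
    text = raw_name.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def sanitize_for_windows(name: str, replacement: str = "_") -> str:
    """Agent side: replace characters the guest file system rejects."""
    return "".join(
        replacement if ch in WINDOWS_RESERVED or ord(ch) < 32 else ch
        for ch in name
    )
```

The client would only ever call something like to_wire, while anything
like sanitize_for_windows would live in the guest agent, which knows
what the guest file system accepts.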

I'm no specialist on the possible problems, but although ASCII is
compatible with UTF-8, not having a well-defined standard might become a
problem in the future, even though Marc-André said it should be UTF-8.
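
On the 72 vs. 18 question above, a quick Python check (just a sketch,
independent of GLib) suggests where both numbers come from: the name is
9 characters, 18 bytes in UTF-8, and 72 characters once each byte is
written out as a "\xNN" escape:

```python
# The example file name, written with explicit code points so the source
# file's own encoding does not matter: "ěščřžýáíé"
name = "\u011b\u0161\u010d\u0159\u017e\u00fd\u00e1\u00ed\u00e9"

raw = name.encode("utf-8")                      # bytes as stored on disk
escaped = "".join("\\x%02x" % b for b in raw)   # printable "\xNN" form

print(len(name))     # number of characters (code points)
print(len(raw))      # number of UTF-8 bytes
print(len(escaped))  # length of the escaped representation
print(escaped)
```

So the 72 is the length of the escaped text, not of the file name
itself.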

    toso