A question about com.sun.star.frame.XStorable's URL

Takeshi Abe tabe at fixedpoint.jp
Mon Jan 23 10:58:35 UTC 2017


Hi Stephan,

Thanks a lot for your reply.

On Mon, 23 Jan 2017 10:26:09 +0100, Stephan Bergmann <sbergman at redhat.com> wrote:
> On 01/20/2017 03:25 AM, Takeshi Abe wrote:
>> Preparing a patch for tdf#105382 [1], I come across a question about
>> character encoding for the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original (before percent-encoded) path of such a URL can
>> be in an encoding other than UTF-8 or even in a different charset due
>> to e.g. a code page of some legacy filesystems.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
> 
> A conforming URL itself, by definition, is written with a subset of ASCII-only
> characters.
> 
> For file URLs, there never was a definition how to interpret the octets encoded
> in the URL's path component, so OOo/LO came up with the convention of always
> interpreting those as UTF-8.  (So any code that converts between file URLs and
> native pathnames needs to do that mapping between UTF-8 and the relevant native
> pathname encoding, which LO assumes to be as reported by
> osl_getThreadTextEncoding.)
Got it. It is now clear what should be done for tdf#105382.
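
To make the convention concrete, here is a minimal sketch in Python (not
LibreOffice's actual code) of the mapping you describe. The function names and
the native_encoding parameter are hypothetical; native_encoding merely stands in
for whatever osl_getThreadTextEncoding would report, and "shift_jis" below is
only an example value.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit


def file_url_to_native_bytes(url: str, native_encoding: str) -> bytes:
    """Map a file URL's path to native pathname octets (hypothetical helper)."""
    # Percent-decode the path component into raw octets.
    path_octets = unquote_to_bytes(urlsplit(url).path)
    # Convention: the octets encoded in a file URL are always read as UTF-8.
    text = path_octets.decode("utf-8")
    # Re-encode in the native pathname encoding (an assumption of this sketch).
    return text.encode(native_encoding)


def native_bytes_to_file_url(path: bytes, native_encoding: str) -> str:
    """Inverse mapping: native pathname octets to a UTF-8 based file URL."""
    text = path.decode(native_encoding)
    # Percent-encode the UTF-8 octets, keeping "/" as the path separator.
    return "file://" + quote(text.encode("utf-8"), safe="/")


url = "file:///tmp/%E6%97%A5%E6%9C%AC.odt"
native = file_url_to_native_bytes(url, "shift_jis")
round_trip = native_bytes_to_file_url(native, "shift_jis")
```

The point is that the URL never carries the legacy octets themselves; the
translation to and from the native encoding happens only at the boundary.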

IIUC, the basic strategy for encoding a file URL for UNO is the same as the one
the current standard [1] describes in section 2.5, "Identifying Data":
> (...) A
> system that internally provides identifiers in the form of a
> different character encoding, such as EBCDIC, will generally perform
> character translation of textual identifiers to UTF-8 [STD63] (or
> some other superset of the US-ASCII character encoding) at an
> internal interface, thereby providing more meaningful identifiers
> than those resulting from simply percent-encoding the original
> octets.
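
As a small illustration of the difference the quoted passage points at, the
following Python sketch percent-encodes the same file name twice: once after
translating it to UTF-8, and once from its original legacy octets. The file
name, the choice of Shift_JIS as the legacy encoding, and the helper
readable_as_utf8 are assumptions made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes

name = "日本.odt"  # example file name (assumption)

# Translate to UTF-8 first, then percent-encode (the approach from section 2.5).
utf8_form = quote(name.encode("utf-8"))

# Percent-encode the legacy octets directly, without translation.
legacy_form = quote(name.encode("shift_jis"))


def readable_as_utf8(encoded: str) -> bool:
    """Return True if the percent-decoded octets are valid UTF-8."""
    try:
        unquote_to_bytes(encoded).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A consumer that assumes UTF-8 can recover the name from utf8_form, whereas
legacy_form only makes sense to someone who already knows the original encoding.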

[1] https://tools.ietf.org/html/rfc3986

Cheers,
-- Takeshi Abe

