text/xml vs application/xml

Christian Rose menthos at menthos.com
Thu Dec 1 22:53:17 EET 2005


On 12/1/05, Kevin Krammer <kevin.krammer at gmx.at> wrote:
> Isn't an XML file considered to be in ASCII unless a different encoding is
> specified by the processing instruction?

Not really. Unless other information is given (a Byte Order Mark, an
encoding declaration, or the transport protocol), AFAIK an XML file is
to be assumed to be in UTF-8.
Quote from http://www.w3.org/TR/REC-xml/#charencoding :

"In the absence of information provided by an external transport
protocol (e.g. HTTP or MIME), it is a fatal error for an entity
including an encoding declaration to be presented to the XML processor
in an encoding other than that named in the declaration, or for an
entity which begins with neither a Byte Order Mark nor an encoding
declaration to use an encoding other than UTF-8. Note that since ASCII
is a subset of UTF-8, ordinary ASCII entities do not strictly need an
encoding declaration."

As a consequence, a file containing only ASCII characters but no
encoding information would be valid XML. But *assuming* that any file
without encoding information will be valid ASCII is plain wrong. Valid
ASCII is always valid UTF-8, but not necessarily the other way around:
UTF-8 text that contains any non-ASCII character is not valid ASCII.
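
To make the distinction concrete, here is a minimal Python sketch. It
assumes the standard library's xml.etree.ElementTree (expat-based)
parser as the XML processor; the helper names and sample documents are
made up for illustration. None of the documents carries a Byte Order
Mark or an encoding declaration.

```python
import xml.etree.ElementTree as ET


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the bytes as an XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, all without any encoding information.
ascii_doc = b"<name>Rose</name>"
utf8_doc = "<name>Ros\u00e9</name>".encode("utf-8")
latin1_doc = "<name>Ros\u00e9</name>".encode("latin-1")

# Pure ASCII is valid both as ASCII and as UTF-8.
print(decodes_as(ascii_doc, "ascii"), decodes_as(ascii_doc, "utf-8"))

# UTF-8 with a non-ASCII character is not valid ASCII.
print(decodes_as(utf8_doc, "ascii"))

# The parser treats undeclared input as UTF-8, so the Latin-1 bytes
# are rejected while the ASCII and UTF-8 documents are accepted.
print(parses(ascii_doc), parses(utf8_doc), parses(latin1_doc))
```

A tool that decodes undeclared XML as ASCII would choke on utf8_doc,
even though that document is perfectly fine according to the spec.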


Christian


