[Libreoffice-commits] core.git: FastSaxSerializer: SAL_WARN() when writing invalid XML characters

Stephan Bergmann sbergman at redhat.com
Wed Mar 1 11:55:28 UTC 2017


On 03/01/2017 12:05 PM, Eike Rathke wrote:
> On Wednesday, 2017-03-01 10:34:04 +0100, Stephan Bergmann wrote:
>> (1)  If the input is assumed to be an arbitrary sequence of Unicode scalar
>> values (i.e., may contain noncharacters, even despite the caveat that those
>> should never be interchanged), the below invalidChar handling might want to
>> also watch out for U+FFFE and U+FFFF.
>
> Also if UTF-8 encoded? (as we write OString/chars there..)

Yes, the XML requirement is on the Unicode (or ISO/IEC 10646) 
characters, regardless of how they're encoded in a given file.  Though 
it's probably a bit difficult to cram that check into FastSaxSerializer's 
design.  And, again, it may not even be relevant if the input must not 
contain any noncharacters anyway.  (In configmgr, I didn't bother to 
ensure that at any higher abstraction level, and simply made sure that 
arbitrary sequences of Unicode scalar values are properly encoded for 
XML's requirements.)
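The check has to work on decoded characters rather than on bytes. U+FFFE and U+FFFF encode in UTF-8 as EF BF BE and EF BF BF. None of those bytes is a control byte, so a byte-level filter for invalid ASCII controls would let them through. What follows is a minimal C++17 sketch, not LibreOffice code. `isXmlChar` and `findInvalidXmlChar` are hypothetical names, and the decoder assumes well-formed UTF-8 input, meaning a sequence of Unicode scalar values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// U+FFFE and U+FFFF are excluded, as are C0 controls other than TAB, LF, CR.
constexpr bool isXmlChar(char32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

static_assert(!isXmlChar(0xFFFE) && !isXmlChar(0xFFFF));
static_assert(isXmlChar(0xFFFD) && isXmlChar(0x10000));

// Hypothetical helper: returns the byte offset of the first character in a
// UTF-8 string that is not an XML Char, or std::string_view::npos if every
// character is allowed. Assumes well-formed UTF-8; malformed or truncated
// sequences are not diagnosed here.
std::size_t findInvalidXmlChar(std::string_view utf8)
{
    std::size_t i = 0;
    while (i < utf8.size()) {
        const std::size_t start = i;
        const auto lead = static_cast<unsigned char>(utf8[i++]);
        char32_t c;
        int continuation;
        // Decode the lead byte: payload bits and number of trailing bytes.
        if (lead < 0x80)      { c = lead;        continuation = 0; }
        else if (lead < 0xE0) { c = lead & 0x1F; continuation = 1; }
        else if (lead < 0xF0) { c = lead & 0x0F; continuation = 2; }
        else                  { c = lead & 0x07; continuation = 3; }
        // Append 6 payload bits from each continuation byte.
        for (int k = 0; k < continuation && i < utf8.size(); ++k)
            c = (c << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        if (!isXmlChar(c))
            return start;
    }
    return std::string_view::npos;
}
```

For example, `findInvalidXmlChar("ab\xEF\xBF\xBE")` returns 2, the offset of the U+FFFE sequence, even though no individual byte there is below 0x20.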

