[Libreoffice-bugs] [Bug 138100] UTF 8 Text File in Windows seems to problem with Umlauts (in DE äöüÄÖÜ) when loaded in Writer (at least til version 7.0)

bugzilla-daemon at bugs.documentfoundation.org
Tue Nov 10 05:57:45 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=138100

--- Comment #6 from Mike Kaganski <mikekaganski at hotmail.com> ---
(In reply to hastrondl from comment #5)
> What Option should be selected for the Standard UTF-8 (Not-UNICODE)

There is *never* text encoded in one of the UTF encodings that is not Unicode.
The UTF (*Unicode* Transformation Format) family of encodings was created to
encode UCS (Universal Coded Character Set), the character set standardized in
ISO/IEC 10646, and that ISO standard is deliberately kept synchronized with
(identical to) The Unicode Standard, created and maintained by the Unicode
Consortium. Any UTF-encoded file is "some sequence of UCS code points, each
code point encoded using this specific UTF variant". So after decoding, you
get a sequence of UCS/Unicode code points, never something else.
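To illustrate, here is a minimal Python sketch (the helper name `code_points` and the sample string are just for illustration). The same umlauts encoded with three different UTF variants give different byte sequences, yet decoding any of them yields the identical sequence of Unicode code points:

```python
def code_points(data: bytes, encoding: str) -> list:
    """Decode bytes with the given UTF variant and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(encoding)]


text = "äöüÄÖÜ"
for encoding in ("utf-8", "utf-16-le", "utf-7"):
    data = text.encode(encoding)  # the bytes as they would be stored in a file
    # Bytes differ per encoding; the decoded code points are always the same.
    print(f"{encoding:<10} {data.hex(' '):<40} -> {' '.join(code_points(data, encoding))}")
```

Every line ends with U+00E4 U+00F6 U+00FC U+00C4 U+00D6 U+00DC; only the byte column changes with the encoding.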

Please check RFC 3629 (UTF-8), and also RFC 2781 (UTF-16) and RFC 2152 (UTF-7);
ISO/IEC 10646; and The Unicode Standard, whose current version [1] explicitly
says "This version of the Unicode Standard is also synchronized with ISO/IEC
10646:2020, sixth edition". Previous versions likewise stated their
synchronization with the then-current edition of the ISO standard.
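RFC 3629 defines UTF-8 as a bit-level packing of a single code point into one to four bytes, so a UTF-8 decoder has nothing to produce except code points. The following is a simplified Python sketch of that mapping, not a production decoder. The function name `decode_utf8` is hypothetical; in practice you would just call `bytes.decode("utf-8")`.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into integer code points (simplified RFC 3629 sketch)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: 1 byte, U+0000..U+007F
            length, cp, minimum = 1, lead, 0x0
        elif lead >> 5 == 0b110:     # 110xxxxx: 2 bytes
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif lead >> 4 == 0b1110:    # 1110xxxx: 3 bytes
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif lead >> 3 == 0b11110:   # 11110xxx: 4 bytes
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b >> 6 != 0b10:       # continuation bytes must be 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF.
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

For example, `decode_utf8(bytes([0xC3, 0xA4]))` returns `[0xE4]`, which is U+00E4 "ä", the same code point `bytes([0xC3, 0xA4]).decode("utf-8")` produces.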

So the idea of a "Standard UTF-8 (Not-UNICODE)" is absurd.

[1] http://www.unicode.org/versions/Unicode13.0.0/


