[Libreoffice-bugs] [Bug 48446] New: RTF Importer does not honor ansicpgN and cpgN control words -> fails to import some non-English documents properly
bugzilla-daemon at freedesktop.org
Mon Apr 9 03:15:18 CEST 2012
https://bugs.freedesktop.org/show_bug.cgi?id=48446
Bug #: 48446
Summary: RTF Importer does not honor ansicpgN and cpgN control
words -> fails to import some non-English documents
properly
Classification: Unclassified
Product: LibreOffice
Version: LibO 3.5.2 Release
Platform: Other
OS/Version: Windows (All)
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Writer
AssignedTo: libreoffice-bugs at lists.freedesktop.org
ReportedBy: mikekaganski at hotmail.com
Created attachment 59657
--> https://bugs.freedesktop.org/attachment.cgi?id=59657
Test file showing this behaviour
When an RTF document contains an \ansicpgN control word in the header, just
after the \ansi control word, a reader should use this code page for
ANSI-to-Unicode conversion wherever no other code page is specified for a text
run and Unicode RTF isn't used [1]. When a font definition contains an
\fcharsetN control word, it overrides the top-level setting, and when it
contains \cpgN, that overrides both the top-level setting and \fcharsetN [2].
Currently, when opening an RTF file that doesn't contain any code page/charset
data, LO defaults to Latin-1 (see Bug 48023). If a document does contain
\ansicpgN, or its fonts have \cpgN, LO ignores this information and still uses
Latin-1. Only \fcharsetN is taken into account.
The attachment is the test document from Bug 48023 with the missing language
information added manually: the header now contains \ansicpg1251, one font has
\fcharset204, and another has \cpg1251. As can be seen, only the text using the
first font is displayed properly.
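To illustrate what a reader honoring all three control words would do with a document like the attachment, here is a regex-based sketch over a hypothetical minimal RTF fragment modelled on the description above (it is not the attached file). The helpers `font_codepages` and `decode_runs` are hypothetical, the parsing handles only this simple shape of font table and `\'xx` hex escapes, and the cp1252 default is an assumption.

```python
import re

# Subset of the \fcharsetN -> Windows code page mapping.
FCHARSET_TO_CODEPAGE = {161: 1253, 162: 1254, 204: 1251, 238: 1250}

# Hypothetical fragment: \ansicpg1251 in the header, font 0 uses
# \fcharset204, font 1 uses \cpg1251; both runs hold the same bytes.
SAMPLE = (r"{\rtf1\ansi\ansicpg1251"
          r"{\fonttbl{\f0\fcharset204 Arial;}{\f1\cpg1251 Times New Roman;}}"
          r"\f0 \'cf\'f0\'e8\'e2\'e5\'f2 \f1 \'cf\'f0\'e8\'e2\'e5\'f2}")


def font_codepages(rtf: str, default: int = 1252) -> dict:
    """Map each font number to the code page its text runs should use."""
    header = re.search(r"\\ansicpg(\d+)", rtf)
    doc_cp = int(header.group(1)) if header else default
    table = {}
    # Each font table entry looks like {\fN<control words> Name;}
    for font in re.finditer(r"\{\\f(\d+)([^;{}]*);\}", rtf):
        num, body = int(font.group(1)), font.group(2)
        cpg = re.search(r"\\cpg(\d+)", body)
        charset = re.search(r"\\fcharset(\d+)", body)
        if cpg:  # \cpgN wins over everything
            table[num] = int(cpg.group(1))
        elif charset and int(charset.group(1)) in FCHARSET_TO_CODEPAGE:
            table[num] = FCHARSET_TO_CODEPAGE[int(charset.group(1))]
        else:  # fall back to the document-wide \ansicpgN
            table[num] = doc_cp
    return table


def decode_runs(rtf: str) -> list:
    """Decode each font-tagged run of hex escapes with its font's code page."""
    cps = font_codepages(rtf)
    out = []
    for run in re.finditer(r"\\f(\d+) ((?:\\'[0-9a-f]{2})+)", rtf):
        hexes = re.findall(r"\\'([0-9a-f]{2})", run.group(2))
        codepage = cps[int(run.group(1))]
        out.append(bytes.fromhex("".join(hexes)).decode(f"cp{codepage}"))
    return out
```

With all three control words honored, both runs decode to "Привет". Under the behaviour described in this report, only the first run (the `\fcharset204` font) would come out right; the second would be decoded as Latin-1.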
As for documents that don't contain language information at all (and a great
number of such documents, generated by various non-MS software, are out there),
I believe LO should use the user's language, and provide a means of specifying
another one on opening, such as a checkbox in the Open dialog saying "Specify
missing charset" that works similarly to the Text Encoded filter.
--
1. Word 2007: Rich Text Format (RTF) Specification, version 1.9.1
(http://www.microsoft.com/download/en/details.aspx?id=10725), page 12:
"Character Set".
2. Ibid., pages 17-20.