[Libreoffice-bugs] [Bug 48446] New: RTF Importer does not honor ansicpgN and cpgN control words -> fails to import some non-English documents properly

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Apr 9 03:15:18 CEST 2012


https://bugs.freedesktop.org/show_bug.cgi?id=48446

             Bug #: 48446
           Summary: RTF Importer does not honor ansicpgN and cpgN control
                    words -> fails to import some non-English documents
                    properly
    Classification: Unclassified
           Product: LibreOffice
           Version: LibO 3.5.2 Release
          Platform: Other
        OS/Version: Windows (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
        AssignedTo: libreoffice-bugs at lists.freedesktop.org
        ReportedBy: mikekaganski at hotmail.com


Created attachment 59657
  --> https://bugs.freedesktop.org/attachment.cgi?id=59657
Test file showing this behaviour

When an RTF document contains an /ansicpgN control word in the header just after
the /ansi control word, a reader should use this code page to perform
ANSI-to-Unicode conversion wherever another code page isn't specified for a text
run and Unicode RTF isn't used [1]. When a font definition contains an /fcharsetN
control word, it overrides the top-level setting, and when there is a /cpgN, it
overrides both the top-level setting and /fcharsetN [2].
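
The precedence described above could be sketched roughly as follows. This is
only an illustration in Python, not LO's importer code: resolve_codepage and
decode_run are hypothetical names, the charset table is a small subset of the
/fcharsetN values, and the 1252 fallback is an assumption.

```python
from typing import Optional

# Small subset of /fcharsetN values and their Windows code pages.
FCHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI
    161: 1253,  # Greek
    162: 1254,  # Turkish
    186: 1257,  # Baltic
    204: 1251,  # Russian
    238: 1250,  # Eastern European
}


def resolve_codepage(ansicpg: Optional[int] = None,
                     fcharset: Optional[int] = None,
                     cpg: Optional[int] = None,
                     fallback: int = 1252) -> int:
    """Pick the code page for a text run (hypothetical helper).

    Precedence: font /cpgN, then font /fcharsetN, then header /ansicpgN,
    then an assumed fallback.
    """
    if cpg is not None:
        return cpg
    if fcharset is not None and fcharset in FCHARSET_TO_CODEPAGE:
        return FCHARSET_TO_CODEPAGE[fcharset]
    if ansicpg is not None:
        return ansicpg
    return fallback


def decode_run(raw: bytes, **settings: Optional[int]) -> str:
    """Decode the raw bytes of a text run with the resolved code page."""
    return raw.decode(f"cp{resolve_codepage(**settings)}")
```

For example, decode_run(b"\xcf\xf0\xe8\xe2\xe5\xf2", ansicpg=1251) yields the
Russian word "Привет", and adding cpg=1253 for the font would override that.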

Now, when opening an RTF file which doesn't contain any code page/charset data,
LO defaults to Latin-1 (see Bug 48023). If a document contains /ansicpgN, or
its fonts have /cpgN, LO ignores this information and still uses Latin-1. Only
/fcharsetN is taken into account.
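
To illustrate the visible effect of that fallback, here is a small Python
sketch (the sample word is my own, not taken from the attachment) decoding the
same Windows-1251 bytes once as Latin-1 and once with the declared code page:

```python
# Bytes as an RTF writer using code page 1251 would store a Russian word.
raw = "Привет".encode("cp1251")

# Decoding with the wrong code page (Latin-1) silently produces mojibake.
as_latin1 = raw.decode("latin-1")

# Decoding with the declared code page restores the original text.
as_cp1251 = raw.decode("cp1251")

print(as_latin1)  # Ïðèâåò
print(as_cp1251)  # Привет
```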

The attachment is the test document from Bug 48023, with the missing language
information added manually. There is now /ansicpg1251 in the header, as well
as /fcharset204 in one font and /cpg1251 in another. Only the text using the
first of these fonts (the one with /fcharset204) is displayed properly.

As to documents that don't contain language information at all (and there is a
great number of such documents generated by various non-MS software out there),
I believe that LO should use the user's language (and provide a means of
specifying another on opening, like a checkbox in the Open dialog saying
"Specify missing charset" doing something similar to the Text Encoded filter).
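
A rough Python sketch of what such a fallback might look like. The function
name, the language table (a small subset) and the override parameter are all
my assumptions, standing in for whatever the Open dialog would pass on:

```python
from typing import Optional

# Subset of language codes and the Windows code pages usually used for them.
LANG_TO_CODEPAGE = {
    "cs": 1250, "pl": 1250, "hu": 1250,  # Central European
    "ru": 1251, "uk": 1251, "bg": 1251,  # Cyrillic
    "en": 1252, "de": 1252, "fr": 1252,  # Western European
    "el": 1253,                          # Greek
    "tr": 1254,                          # Turkish
    "he": 1255,                          # Hebrew
    "ar": 1256,                          # Arabic
    "lt": 1257, "lv": 1257, "et": 1257,  # Baltic
}


def default_codepage(lang_tag: str, override: Optional[int] = None) -> int:
    """Guess a code page for a document that declares none (hypothetical).

    An explicit user choice wins; otherwise the primary language subtag of
    the user's locale is looked up, falling back to 1252.
    """
    if override is not None:
        return override
    primary = lang_tag.replace("_", "-").split("-")[0].lower()
    return LANG_TO_CODEPAGE.get(primary, 1252)
```

With this, a user with a ru_RU locale would get 1251 by default, while a
value chosen at open time would take priority over the locale.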

--
1. Word 2007: Rich Text Format (RTF) Specification, version 1.9.1
(http://www.microsoft.com/download/en/details.aspx?id=10725), page 12:
Character Set
2. Ibid., pages 17-20.
