[Libreoffice-bugs] [Bug 125596] New: DOCX: Writer misidentify text language (and appropriate font) in MS Word file (MSO2019)

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Thu May 30 17:08:45 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=125596

            Bug ID: 125596
           Summary: DOCX: Writer misidentify text language (and
                    appropriate font) in MS Word file (MSO2019)
           Product: LibreOffice
           Version: 6.0.7.3 release
          Hardware: x86-64 (AMD64)
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: peat at peat-network.xyz

Created attachment 151787
  --> https://bugs.documentfoundation.org/attachment.cgi?id=151787&action=edit
The DOCX file which has the problem

Steps to reproduce:
1. Download the font "TH Sarabun New" from [1]. The font is licensed under the
GPL 2.0 with the font exception.
2. Open the attached DOCX document. The text is configured to use "TH Sarabun
New" as the complex-script (Thai) font and "Liberation Sans" as the Western
font, both at 16 pt.

Expected result: The Thai text (the word "ไทย") and most of the dots (".") are
displayed using "TH Sarabun New", while the English text (the word "English")
and the dots between the pipes ("|"), as well as the pipes themselves, are
displayed using "Liberation Sans". The whole text fits on one line. MS Word
2019 shows this expected behavior (see the screenshots).

Actual result: The Thai text is displayed using "TH Sarabun New", while the
English text, all of the dots, and the pipes are displayed using "Liberation
Sans". The whole text does not fit on one line.
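The difference between the expected and actual results above can be pictured
with a deliberately simplified toy model (not LibreOffice's or Word's real
code). One rule picks the font from each character alone, so neutral
characters such as "." and "|" always fall back to the Western font. The other
rule also honors a per-run "typed as complex script" flag of the kind the
reporter believes Word stores. The helper names and the sample runs are
hypothetical.

```python
# Toy model only: not LibreOffice's or Word's actual font-selection code.
THAI_FONT = "TH Sarabun New"      # complex-script font from the report
WESTERN_FONT = "Liberation Sans"  # Western font from the report


def is_thai(ch: str) -> bool:
    """True if ch lies in the Unicode Thai block (U+0E00..U+0E7F)."""
    return "\u0e00" <= ch <= "\u0e7f"


def font_by_character(ch: str) -> str:
    """Pick a font from the character alone: neutral '.' and '|' go Western."""
    return THAI_FONT if is_thai(ch) else WESTERN_FONT


def font_by_run(ch: str, run_is_complex: bool) -> str:
    """Pick a font from the character plus a per-run complex-script flag."""
    return THAI_FONT if is_thai(ch) or run_is_complex else WESTERN_FONT


# Hypothetical runs: (text, typed with a Thai keyboard layout?)
runs = [("ไทย", True), ("...", True), ("|.|", False), ("English", False)]

for text, flag in runs:
    by_char = sorted({font_by_character(c) for c in text})
    by_run = sorted({font_by_run(c, flag) for c in text})
    print(f"{text!r}: character rule {by_char}, run rule {by_run}")
```

Under the character-only rule every dot lands in "Liberation Sans", which
matches the actual result; the run-flag rule reproduces the expected result.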

The problem is reproducible on:
- LO 6.0.7-0ubuntu0.18.04.6 from Ubuntu 18.04.
- LO 6.2.4.2 on Ubuntu 18.04, Snap and AppImage.
- LO 6.2.4.2 on Windows 10 version 1903 (build 18326.86).

This matters because most Thai fonts use different font metrics than Western
fonts. For historical reasons [2], Thai fonts treat the point size as the line
height. Because Thai script places vowel and tone marks above and below the
base characters, Thai glyphs are usually about 30% smaller than Western glyphs
at the same point size. [3]
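As a rough worked example of what "about 30% smaller" means at the 16 pt used
in the attached document: the 0.30 factor is only the approximate figure from
this report and varies per font, and fonts using the newer metric from [3]
correspond to a factor of 0. The function names are illustrative.

```python
def apparent_size(point_size: float, shrink: float = 0.30) -> float:
    """Approximate Western-equivalent size of a Thai font set at point_size.

    shrink=0.30 is the rough "about 30% smaller" figure; fonts that use the
    newer metric (see [3]) would correspond to shrink=0.0.
    """
    return point_size * (1.0 - shrink)


def size_to_match(western_size: float, shrink: float = 0.30) -> float:
    """Point size a Thai font would need to look as large as western_size."""
    return western_size / (1.0 - shrink)


print(round(apparent_size(16), 1))   # prints 11.2 (Thai text set at 16 pt)
print(round(size_to_match(16), 1))   # prints 22.9 (Thai size to match 16 pt Western)
```

So the same 16 pt setting gives visibly different sizes depending on which of
the two fonts a character is assigned to.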

Adding to this problem, MS Word determines the language of the text from the
keyboard layout active when it is typed, not from the actual characters. For
example, a dot (".") typed with a Thai keyboard layout is marked as Thai, while
a dot typed with an English keyboard layout is marked as English. MS Word seems
to record this information in the file, but LO seems unable to read it. So,
when LO opens the file, it displays the text using the wrong font, with
different font metrics, and the document's layout changes.
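To check what Word actually records, the run properties in word/document.xml
can be dumped. This is a minimal sketch assuming the information lives in one
of the standard WordprocessingML run properties defined in ECMA-376: the w:cs
element (treat the run as complex script), the w:hint attribute on w:rFonts,
or the w:bidi (complex-script language) attribute on w:lang. Which of these
Word writes for keyboard-layout-typed dots is an assumption to verify against
the attachment; run_script_hints and demo_docx are hypothetical helper names,
and the synthetic markup is not taken from the attached file.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (ECMA-376 / Office Open XML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
OFF = {"0", "false", "off"}  # values that switch an on/off property such as w:cs off


def attr(elem, name):
    """Read a w:-namespaced attribute, tolerating a missing element."""
    return None if elem is None else elem.get(f"{{{W}}}{name}")


def run_script_hints(docx):
    """Yield (text, cs_flag, rfonts_hint, cs_lang) for each run (w:r).

    docx may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        rpr = run.find("w:rPr", NS)
        if rpr is None:
            yield text, False, None, None
            continue
        cs = rpr.find("w:cs", NS)
        cs_flag = cs is not None and (attr(cs, "val") or "true").lower() not in OFF
        yield (
            text,
            cs_flag,
            attr(rpr.find("w:rFonts", NS), "hint"),  # "default", "eastAsia" or "cs"
            attr(rpr.find("w:lang", NS), "bidi"),    # complex-script language tag
        )


def demo_docx():
    """Build a tiny synthetic DOCX in memory (hypothetical markup)."""
    body = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:rPr><w:rFonts w:hint="cs"/><w:cs/><w:lang w:bidi="th-TH"/></w:rPr>'
        '<w:t>.</w:t></w:r>'
        '<w:r><w:t>English</w:t></w:r>'
        '</w:p></w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Pass the path of a .docx file, or run without arguments for the demo.
    source = sys.argv[1] if len(sys.argv) > 1 else demo_docx()
    for row in run_script_hints(source):
        print(row)
```

Running it on the attached file would show whether the Thai-typed dots differ
from the dots between the pipes in any of these properties.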

[1] http://mdresearch.kku.ac.th/files/font/THSarabunNew.zip
[2] http://thep.blogspot.com/2016/02/thai-font-metrics.html (In Thai)
[3] However, some Thai fonts, mostly from the Thai Linux Working Group (TLWG),
now use a newer metric that treats the point size as the character size. This
makes those fonts the same size as Western fonts. See [2].
