[Libreoffice-bugs] [Bug 125298] New: FILESAVE DOCX Bookmark names and field references shortened in case they are 40 characters long and contain non ASCII characters

bugzilla-daemon at bugs.documentfoundation.org
Wed May 15 09:02:27 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=125298

            Bug ID: 125298
           Summary: FILESAVE DOCX Bookmark names and field references
                    shortened in case they are 40 characters long and
                    contain non ASCII characters
           Product: LibreOffice
           Version: 6.3.0.0.alpha0+ Master
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Writer
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: libreoffice at nisz.hu

Description:
The OOXML standard limits bookmark names and field references (the value of
<w:instrText> tags) to a maximum of 40 characters.
LibreOffice has an encode/decode mechanism for non-ASCII characters in bookmark
names and field references, and the encoded form uses several characters for
each non-ASCII character. For example, ő becomes %C5%91.
If the truncation happens before the decoding, each non-ASCII character is
counted as more than one character, so bookmark names or field references that
contain non-ASCII characters can be truncated even though they are not longer
than 40 characters.
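
The effect described above can be reproduced outside LibreOffice with a small
sketch. This is not LibreOffice's code: encode_name and lenient_decode are
hypothetical stand-ins that assume the encoder percent-encodes UTF-8 bytes and
replaces spaces with underscores, and that the decoder leaves incomplete
escapes untouched. Under those assumptions, cutting the encoded string at 40
characters gives the truncated names listed in this report, while cutting the
original name first leaves them intact.

```python
import re
from urllib.parse import quote

MAX_LEN = 40  # limit cited in this report


def encode_name(name):
    # Assumed stand-in for the encoder: spaces become underscores and each
    # non-ASCII character becomes %XX escapes of its UTF-8 bytes.
    return quote(name.replace(" ", "_"), safe="")


def lenient_decode(encoded):
    # Decode runs of complete %XX escapes that form valid UTF-8 and leave
    # anything else (such as a cut-off escape) as it is.
    def replace_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)

    return re.sub(r"(?:%[0-9A-Fa-f]{2})+", replace_run, encoded)


def truncate_after_encoding(name):
    # Behaviour described in the report: the limit is applied to the encoded form.
    return lenient_decode(encode_name(name)[:MAX_LEN])


def truncate_before_encoding(name):
    # Possible alternative: apply the limit to the original characters.
    return lenient_decode(encode_name(name[:MAX_LEN]))


if __name__ == "__main__":
    for name in ("1é2á3ű4ő5ú6ö7ü8ó9í", "árvíztűrő tükörfúrógép"):
        print(len(name), len(encode_name(name)))
        print(truncate_after_encoding(name))
        print(truncate_before_encoding(name))
```

The first name has 18 characters but 63 in the encoded form, which is why it
loses its tail although it is well under the limit.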

Steps to Reproduce:
    1. Create some text.
    2. Select a section of the text.
    3. Open the Insert menu and select Bookmark.
    4. Give the bookmark a name that is long enough and contains non-ASCII
characters (for example árvíztűrő tükörfúrógép, or 1é2á3ű4ő5ú6ö7ü8ó9í).
    5. Go somewhere else in the document, for example to the end of the
document, and create a new paragraph.
    6. Open the Insert menu and select Cross-reference.
    7. Select the value "Bookmark" in the "Type" listbox, then select the value
"Reference" in the "Insert reference to..." listbox.
    8. In the "Selection" listbox, double-click the previously named
bookmark.
    9. Save the file as DOCX and reload it.
    10. Rename the file from .docx to .zip, unzip it, and open
document.xml in the word folder.
    11. Look at these tags:
<w:bookmarkStart w:name="something" w:id="0"/>
<w:instrText> REF something \h </w:instrText>
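
Steps 10 and 11 can also be done without renaming and unzipping the file. The
following is a minimal sketch that reads word/document.xml directly from the
saved DOCX and lists the bookmark names and field instructions. The function
name bookmark_and_field_names and the file name in the usage comment are
made up for illustration, and the regular expressions assume the attribute and
tag layout shown above rather than parsing the XML properly.

```python
import re
import zipfile


def bookmark_and_field_names(docx_path):
    # A DOCX file is a ZIP archive; the main body is word/document.xml.
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")

    # w:name attribute of every <w:bookmarkStart .../> tag.
    bookmarks = re.findall(r'<w:bookmarkStart\b[^>]*\bw:name="([^"]*)"', xml)

    # Text content of every <w:instrText> tag, e.g. "REF something \h".
    fields = [
        text.strip()
        for text in re.findall(r"<w:instrText[^>]*>([^<]*)</w:instrText>", xml)
    ]
    return bookmarks, fields


# Usage (hypothetical file name):
# bookmarks, fields = bookmark_and_field_names("test.docx")
# print(bookmarks)
# print(fields)
```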

Actual Results:
Some bookmark names that are not longer than 40 characters are truncated if
they contain non-ASCII characters.
For example: 1é2á3ű4ő5ú6ö7ü8ó9í as a bookmark name will be truncated to
1é2á3ű4ő5ú6%C3%
and árvíztűrő tükörfúrógép as a bookmark name will be truncated to
árvíztűrő_tük%C 

The cross-references still work despite the truncation, so this is only a
cosmetic problem.

Expected Results:
In MS Word, a bookmark name that contains non-ASCII characters is not truncated
as long as it is shorter than 41 characters. We should emulate this behavior.


Reproducible: Always


User Profile Reset: No



Additional Info:
See also: 113483

-- 
You are receiving this mail because:
You are the assignee for the bug.