[Libreoffice-bugs] [Bug 124191] Text copied from a PDF exported using Linux Libertine G is missing characters.

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Wed Mar 20 19:41:58 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=124191

--- Comment #10 from V Stuart Foote <vstuart.foote at utsa.edu> ---
Poking at this, I extracted page 10 of the Last-days-events PDF with Acrobat Pro
DC v.2019.008.20080, and uncompressed its streams with qpdf v.8.4.0.

Although the page was extracted using Acrobat, I think it is correct to say its
original structure was generated by LibreOffice: the rdf/xmp metadata records
CreatorTool Writer and Producer LibreOffice 6.0.

The Linux Libertine G regular, along with other fonts, gets recorded as a
/BaseFont struct [1], and its /ToUnicode map is also created [2].

What is odd is that character <01> is mapped to the Unicode sequence
"005400680065", i.e. "The"; there is no <02> in the map, and U+0065 "e" is
never defined as a single glyph.  Character <26> is "ffe" (as in "suffering"),
<36> is "tte", and <52> is "Que".
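
The destination strings in a bfchar entry are UTF-16BE code units written as
hex, so they can be decoded directly. A minimal Python sketch, using values
copied from the map in [2] (the helper name decode_dst is made up for
illustration):

```python
def decode_dst(hex_string):
    """Decode a /ToUnicode destination (UTF-16BE hex digits) to text."""
    return bytes.fromhex(hex_string).decode("utf-16-be")

# Source code -> destination, copied from the bfchar block in [2].
entries = {
    "01": "005400680065",
    "26": "006600660065",
    "36": "007400740065",
    "52": "005100750065",
}

for code, dst in entries.items():
    print(f"<{code}> -> {decode_dst(dst)!r}")
```

Each of these four codes decodes to a three-letter run ending in "e" rather
than to a single character.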

Attaching the uncompressed stream, where the strings bracketed by BT and ET
with the /F6 font are in Linux Libertine G. The /F2 stanza is the opening
"Cover Picture" in Linux Biolinum G.

Although character <02> is used in the passages, it does not appear in the
/ToUnicode lookup table.
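
A rough way to check this mechanically is to parse the bfchar lines and list
the codes in the font's range that have no entry. The Python sketch below is
only an illustration: the function names are hypothetical, and it runs on a
short excerpt of [2] rather than on the full stream.

```python
import re

# Excerpt of the bfchar block from [2]. For a real check, paste only the
# lines between "beginbfchar" and "endbfchar" (not the codespacerange line).
CMAP_EXCERPT = """
<01> <005400680065>
<03> <0020>
<04> <0063>
<05> <006F>
<06> <0076>
<07> <0072>
<08> <0070>
<09> <0069>
<0A> <0074>
"""

BFCHAR = re.compile(r"<([0-9A-Fa-f]{2})>\s*<([0-9A-Fa-f]+)>")

def parse_bfchar(text):
    """Return {code: decoded text} for single-byte bfchar entries."""
    return {
        int(src, 16): bytes.fromhex(dst).decode("utf-16-be")
        for src, dst in BFCHAR.findall(text)
    }

def unmapped(mapping, first, last):
    """Codes in first..last (inclusive) that have no /ToUnicode entry."""
    return [c for c in range(first, last + 1) if c not in mapping]

mapping = parse_bfchar(CMAP_EXCERPT)
print([f"<{c:02X}>" for c in unmapped(mapping, 0x00, 0x0A)])
print("e" in mapping.values())  # is there a standalone "e" entry?
```

On this excerpt the gaps are <00> and <02>, and no code maps to a bare "e".
Run over the whole block with the /FirstChar 0 and /LastChar 89 range from
[1], the same check should report the same two codes.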

Since LibreOffice should have written out the /ToUnicode struct for subsetted
fonts, I believe the issue could be there. Perhaps in some of the original PDF
code in pdfwriter_impl?


=-ref-=
[1] 
<< /BaseFont /GAAAAA+LinuxLibertineG /FirstChar 0 /FontDescriptor 58 0 R
/LastChar 89 /Subtype /TrueType /ToUnicode 59 0 R /Type /Font /Widths [ 500
1047 446 250 427 503 496 371 518 270 315 530 746 456 389 511 541 537 337 530
500 263 789 309 464 464 464 492 505 514 219 219 559 423 476 489 484 587 581 235
645 548 586 596 694 296 698 267 694 838 729 953 271 828 613 525 595 701 574 502
700 464 464 464 375 375 464 464 660 464 704 321 539 741 464 434 235 297 297 748
651 287 1250 636 814 547 327 603 701 587 ] >>

[2]
59 0 obj
<< /Length 1454 >>
stream
/CIDInit/ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo<<
/Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName/Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
88 beginbfchar
<01> <005400680065>
<03> <0020>
<04> <0063>
<05> <006F>
<06> <0076>
<07> <0072>
<08> <0070>
<09> <0069>
<0A> <0074>
<0B> <0075>
<0C> <0077>
<0D> <0061>
<0E> <0073>
<0F> <006B>
<10> <006E>
<11> <0068>
<12> <002D>
<13> <0050>
<14> <0067>
<15> <006C>
<16> <006D>
<17> <0066>
<18> <0031>
<19> <0039>
<1A> <0030>
<1B> <0062>
<1C> <0064>
<1D> <0079>
<1E> <002E>
<1F> <002C>
<20> <006600690072>
<21> <007A>
<22> <0046>
<23> <0078>
<24> <0053>
<25> <0042>
<26> <006600660065>
<27> <003A>
<28> <0043>
<29> <0045>
<2A> <0052>
<2B> <0054>
<2C> <0041>
<2D> <0049>
<2E> <004E>
<2F> <2019>
<30> <0047>
<31> <004D>
<32> <0048>
<33> <0057>
<34> <006A>
<35> <0066006600690063>
<36> <007400740065>
<37> <004C>
<38> <006600740020>
<39> <004F>
<3A> <0059>
<3B> <0071>
<3C> <0044>
<3D> <0032>
<3E> <0037>
<3F> <0033>
<40> <201C>
<41> <201D>
<42> <0036>
<43> <0038>
<44> <0055>
<45> <0034>
<46> <0026>
<47> <004A>
<48> <0066006C0069>
<49> <2014>
<4A> <0035>
<4B> <003F>
<4C> <003B>
<4D> <0028>
<4E> <0029>
<4F> <2026>
<50> <0056>
<51> <0021>
<52> <005100750065>
<53> <004B>
<54> <00660066006C0069>
<55> <2013>
<56> <0066>
<57> <005A>
<58> <0051>
<59> <2020>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
