[Libreoffice-bugs] [Bug 124191] Text copied from a PDF exported using Linux Libertine G Graphite font is missing characters. (comment 24)

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Fri Mar 22 18:04:44 UTC 2019


https://bugs.documentfoundation.org/show_bug.cgi?id=124191

--- Comment #31 from V Stuart Foote <vstuart.foote at utsa.edu> ---
(In reply to Khaled Hosny (inactive) from comment #21)
> > Here is result from a 6.2.1 build--note addition of the /ActualText
> > structure, which helps with fidelity of pasted text. But that the
> > LibreOffice generated /ToUnicode does look to have problems.

> That is fine, it means there is no unique one to one, or one to many mapping
> between these glyphs (not characters) and the input text, so no /ToUnicode
> and /ActualText tagging is used for them.

Things are much improved with HarfBuzz and moving the font handling into
CommonSalLayout, but I'm still not sure this is correct, at least not in
handling digraphs for the Graphite fonts.

When LO exports to PDF the mapping of "The fire flying coffee left Quickly."
with Graphite font(s), the /ToUnicode struct is getting an additional glyph
added to the digraphs (both PUA and , and then is not mapping that glyph when
it probably should.

Use the /ToUnicode chart below, with annotations, and read out the Tf[.*]TJ text
runs (from LO 6.2.1) in comment 16:

<01> <005400680065> -- "The", but maybe should be just "Th"?
x <02> -- "e" not mapped
<03> <0020> -- <sp>
<04> <006600690072>  -- "fir", but maybe should be just "fi"?
x <05> -- "r" not mapped
<06> <0066006C0079> -- "fly", but maybe should be just "fl"?
x <07> -- "y" not mapped
<08> <0069> -- i
<09> <006E> -- n
<0A> <0067> -- g
<0B> <0063> -- c
<0C> <006F> -- o
<0D> <006600660065> -- "ffe", but maybe should be just "ff"?
<0E> <006C> -- l
<0F> <006600740020> -- "ft<sp>", but maybe should be just "ft"?
<10> <005100750069> -- "Qui", but maybe should be just "Qu"?
<11> <006B> -- k

This seems consistently incorrect. Is it a logic flaw in building the map(s)?
Would that be in our pdfwriter_impl, or now in the graphite2 hb shaper?
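
For anyone wanting to check the chart by hand, here is a minimal Python sketch
that decodes the /ToUnicode destination strings listed above (UTF-16BE hex) and
shows what a text extractor would recover. The names TO_UNICODE, decode_dst and
extract are made up for illustration, and the glyph run in the usage lines is an
assumption, not the actual TJ run from comment 16.

```python
# Destination strings copied from the /ToUnicode chart above, keyed by glyph code.
# Glyphs <02>, <05> and <07> have no entry in the chart, so they are absent here.
TO_UNICODE = {
    0x01: "005400680065",
    0x03: "0020",
    0x04: "006600690072",
    0x06: "0066006C0079",
    0x08: "0069",
    0x09: "006E",
    0x0A: "0067",
    0x0B: "0063",
    0x0C: "006F",
    0x0D: "006600660065",
    0x0E: "006C",
    0x0F: "006600740020",
    0x10: "005100750069",
    0x11: "006B",
}


def decode_dst(hex_str: str) -> str:
    """Decode a /ToUnicode destination string (UTF-16BE code units in hex)."""
    return bytes.fromhex(hex_str).decode("utf-16-be")


def extract(glyph_codes):
    """Concatenate the mapped text for a glyph run, skipping unmapped glyphs."""
    return "".join(
        decode_dst(TO_UNICODE[g]) for g in glyph_codes if g in TO_UNICODE
    )


if __name__ == "__main__":
    # Print the decoded chart, one glyph per line.
    for code, dst in sorted(TO_UNICODE.items()):
        print(f"<{code:02X}> -> {decode_dst(dst)!r}")
    # Assumed run for illustration only: ligature glyph, unmapped glyph, space.
    print(repr(extract([0x01, 0x02, 0x03])))
```

With that assumed run, the ligature glyph <01> alone supplies all of "The" and
the unmapped <02> contributes nothing, which matches the annotations in the
chart.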
