[Libreoffice-bugs] [Bug 62846] Incorrect glyph to Unicode mappings in PDFs (Graphite)

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Wed Jun 28 15:04:37 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=62846

--- Comment #48 from martin_hosken at sil.org ---
I lied. It's not producing good text, even if it is somewhat Arabic-like. For a
start, the text seems to be backwards.

Here's what is going on. Inside the PDF there is a 1:n mapping between glyphs
and characters. That alone is destined to fail, because if you break off your
nuqtas into separate glyphs, you are in for trouble. So, while libo does the
best it can, the results are going to be really bad regardless.

This has nothing to do with Graphite vs HarfBuzz, since by the time the PDF
writing is happening, everything has been shaped into the same structures. It's
just the nature of the problem that PDF cannot map n:1 glyphs:chars on output,
especially for the case [xy]:z and x:w (glyphs x and y together map to
character z, while x on its own maps to w). The only way to do this properly is
to output the Unicode text along with the glyphed text as part of the PDF page
stream.
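
To make the [xy]:z and x:w case concrete, here is a minimal Python sketch, not
libo code. The glyph names are hypothetical, and the workaround of putting a
cluster's whole text on its first glyph is an assumption made for illustration.
It tries to build a per-glyph (1:n) table and reports where that breaks down.

```python
# Hypothetical glyph names; a real font would use numeric glyph IDs.
DOTLESS_BEH = "beh_dotless"   # base shape with the nuqta broken off
DOT_BELOW = "dot_below"       # the nuqta as its own glyph
LAM_ALEF = "lam_alef"         # one ligature glyph for two characters


def build_per_glyph_table(clusters):
    """Try to express (glyphs, text) clusters as a per-glyph 1:n table.

    Assumed workaround: the first glyph of a cluster carries the whole
    text and the remaining glyphs carry nothing.
    Returns (table, conflicts).
    """
    table, conflicts = {}, []
    for glyphs, text in clusters:
        assignment = [text] + [""] * (len(glyphs) - 1)
        for glyph, chars in zip(glyphs, assignment):
            # A glyph can only have one entry; a second, different
            # entry is exactly the case that cannot be represented.
            if table.setdefault(glyph, chars) != chars:
                conflicts.append((glyph, table[glyph], chars))
    return table, conflicts


# 1:n is fine: one ligature glyph maps to lam + alef.
ligature_only = [((LAM_ALEF,), "\u0644\u0627")]

# n:1 is not: [x y] -> U+0628 (beh), but x alone -> U+066E (dotless beh).
split_nuqta = [
    ((DOTLESS_BEH, DOT_BELOW), "\u0628"),
    ((DOTLESS_BEH,), "\u066e"),
]

if __name__ == "__main__":
    print(build_per_glyph_table(ligature_only)[1])
    print(build_per_glyph_table(split_nuqta)[1])
```

The ligature case yields no conflicts, while the split nuqta case reports that
the dotless base glyph would need two different entries, which a single
per-glyph table cannot hold.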

One way might be in vcl/source/gdi/pdf_impl.cxx to have another MARK() function
that takes an OUString&, nIndex and nLen and outputs that as the /ActualText as
part of the structure element dictionary in the /Span. This would only get
output if structured marking was turned on. I'm not sure if there would need to
be any other limiting factors, such as whether the text contains CTL
codepoints.
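
As a rough illustration of what such output could look like, here is a small
Python sketch, not the proposed C++ change. The function name and the glyph_ops
argument are hypothetical. It also uses a marked-content property list on a
/Span sequence rather than a structure element dictionary, which is a simpler
variant than the one described above. Python slices by code point, whereas
OUString indexes UTF-16 code units; the two agree only for BMP text.

```python
def actual_text_span(text, n_index, n_len, glyph_ops):
    """Wrap glyph-showing operators in a /Span carrying /ActualText.

    text, n_index, n_len mirror the (OUString&, nIndex, nLen) idea;
    glyph_ops is the already-built content stream fragment for the run.
    """
    run = text[n_index:n_index + n_len]
    # /ActualText as a hex string: UTF-16BE with a leading byte order mark.
    payload = ("\ufeff" + run).encode("utf-16-be").hex().upper()
    return f"/Span <</ActualText <{payload}>>> BDC\n{glyph_ops}\nEMC\n"


if __name__ == "__main__":
    # One beh (U+0628) drawn by a made-up glyph code 0012.
    print(actual_text_span("\u0628\u0627\u0628", 0, 1, "<0012> Tj"), end="")
```

With this in place, a reader that honours /ActualText can take the Unicode run
directly instead of reconstructing it glyph by glyph.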

Suffice it to say that libo isn't up to handling CTL text for text export from
PDF. But let's not blame libo too much. This is really a bug in PDF, since the
PDF specification only allows 1:n glyph:char mapping. All very Latin-centric ;)
