[Libreoffice-ux-advise] [Bug 151552] PDF import into writer messes up line justification

Sun Oct 16 19:23:37 UTC 2022

https://bugs.documentfoundation.org/show_bug.cgi?id=151552

V Stuart Foote <vstuart.foote at utsa.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|filters and storage         |Writer
                 CC|                            |khaled at aliftype.com,
                   |                            |quikee at gmail.com,
                   |                            |thb at libreoffice.org,
                   |                            |xiscofauli at libreoffice.org

--- Comment #7 from V Stuart Foote <vstuart.foote at utsa.edu> ---
(In reply to Eyal Rozenberg from comment #6)
No, please understand how our poppler-based PDF import filtering functions.

PDF is not an editable format. We do not edit PDFs. A PDF viewer processor will
open and parse PDF stream content into fully described (in postscript) pages,
and then manage the display of those complete pages.

Even for a document being "round-tripped", LibreOffice's import filter, using
the external poppler and poppler-utils libraries, extracts the content streams
from the published presentation and converts each stream into a discrete draw
Shape object.

The text runs in the PDF are just one of the content streams. Those discrete
text runs carry no lexical details; they are strictly glyph-based snippets of
text with font and character metrics, which are then used to create the draw
Shape textboxes. Each run includes a starting position on the published page,
and that is used to coarsely position the draw textbox on the LO canvas. That
is why the text runs are not rendered to the LO canvas as "justified" and can
exceed the LO canvas margins.
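
To illustrate the positioning described above, here is a minimal sketch in
Python. It is not the actual filter code: TextRun, TextBox, run_to_textbox and
the avg_advance parameter are hypothetical names made up for this example, and
the width estimate is a deliberate simplification of real per-glyph metrics.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical stand-in for one text run taken from a content stream."""
    x: float          # starting x position on the page, in points
    y: float          # baseline y position, in points
    text: str         # glyphs already mapped to characters
    font_size: float  # in points


@dataclass
class TextBox:
    """Hypothetical stand-in for the draw textbox created from a run."""
    x: float
    y: float
    width: float
    text: str


def run_to_textbox(run: TextRun, avg_advance: float = 0.5) -> TextBox:
    """Place a box at the run's start; the width comes from metrics alone.

    avg_advance is an assumed average glyph advance, as a fraction of the
    font size. Nothing here knows about paragraphs, margins or justification.
    """
    width = len(run.text) * run.font_size * avg_advance
    return TextBox(run.x, run.y, width, run.text)


def exceeds_margin(box: TextBox, right_margin: float) -> bool:
    """True if the box's right edge lands past the given margin."""
    return box.x + box.width > right_margin


# Last run of a justified line, starting close to the right margin.
run = TextRun(x=500.0, y=700.0, text="justified", font_size=12.0)
fits = run_to_textbox(run, avg_advance=0.5)      # metrics of the source font
overflows = run_to_textbox(run, avg_advance=0.6)  # slightly wider metrics
```

Under these assumptions the same run fits inside a right margin at 559pt with
the first set of metrics and spills past it with the second, because only the
starting position is taken from the page.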

The mishandling of the RTL text was also a manifestation of the fact that the
content stream records text runs in the order they are drawn to the postscript
page. There are similar issues for complex text recorded to PDF with
/ActualText support.
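
A rough sketch of why drawing order matters for RTL text, in Python. It
assumes the producer emitted the glyphs of an RTL run left to right as they
sit on the page, which is not guaranteed for every PDF. visual_to_logical is
a hypothetical helper for this example only; it handles a run of a single
direction and is no substitute for the Unicode Bidirectional Algorithm.

```python
import unicodedata

STRONG_RTL = ("R", "AL")       # Hebrew-like and Arabic-like letters
STRONG_ANY = ("L", "R", "AL")  # all strongly directional classes


def visual_to_logical(run: str) -> str:
    """Reverse a run whose strongly directional characters are all RTL.

    Runs with any strong LTR character, or with no strong characters at
    all, are returned unchanged.
    """
    strong = [unicodedata.bidirectional(ch) for ch in run]
    strong = [cls for cls in strong if cls in STRONG_ANY]
    if strong and all(cls in STRONG_RTL for cls in strong):
        return run[::-1]
    return run


logical = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word in reading order
as_drawn = logical[::-1]              # the same glyphs, left to right on page
recovered = visual_to_logical(as_drawn)
```

Taking as_drawn at face value gives the word backwards; an importer has to
notice the direction and reorder the run to get readable text.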

PDF Viewers don't need to do more with the content streams--they simply parse
them and lay them out as described in the postscript pages.

And LibreOffice actually includes a PDF viewer processor--that is the
pdfium-based ipdf filter used to insert a PDF page as an image.

Improving the fidelity of filter-imported draw Shapes to the content on the
source PDF published page is out of scope for the project.

Put another way, it is not justified to expend dev, QA and design resources
working on the PDF import filters when we offer exceptional fidelity for PDF
content using the pdfium-based insert filters. Any "manipulation" of the
source PDF (e.g. page extraction, clipping, etc.) to prepare it for insertion
is best done external to LibreOffice.

And that is why I make the suggestion that perhaps it would be best just to
drop the functional poppler-based PDF import filter from core LO deliverables.
It could then be packaged more effectively as an extension (where it started
in the Oracle OOo era).

And again, LibreOffice is *not* a PDF editor.
