[Libreoffice-ux-advise] [Bug 152143] Provide a mechanism to export PDF to text
bugzilla-daemon at bugs.documentfoundation.org
Sun Nov 20 21:00:39 UTC 2022
https://bugs.documentfoundation.org/show_bug.cgi?id=152143
Hossein <hossein at libreoffice.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Writer |Draw
--- Comment #2 from Hossein <hossein at libreoffice.org> ---
I don't think this is a duplicate of tdf#32249. The title of that one is:
"When importing PDF with text in it , it will be better to have a easy
and fluent option to edit the imported Text".
So, that issue is about being able to edit the imported text. Here I am talking
about being able to export the PDF as a text file. These are clearly different
requests, even if their implementations have parts in common.
> So you can already select and consolidate entire pages of imported draw shape
> textboxes (by glyph index lookup in a ToUnicode CMap) into a single draw
> shape textbox--a sentence or paragraph of text. And then select that text,
> copy it and paste it as needed. Then correct as lexically necessary.
I disagree. That is not what this feature request is about. I specifically
requested a means of exporting the whole PDF document as a text file, both via
the UI and the command line. The consolidation feature above might help
internally when implementing such a feature, but it is not what I asked for.
> Also, because PDF provides no lexical sense to the runs in a document (it is a
> published presentation format)--the discrete imported draw shape text boxes
> *must be selected in sequence* for a manual merge. That would remain the case
> working with draw shape textboxes on the Writer canvas and is a limitation of
> the published rendering encoded into PDF.
I disagree again. We have text boxes in LibreOffice, MS Office and elsewhere,
and we can still export their contents to text files. I haven't asked for smart
software that can understand the meaning of the document. The goal is to export
the contents to a text file.
> Doing more efficient and high fidelity text extraction from PDF into ODF
> paragraphs is the end goal of bug 32249.
>
> Export of lexically correct word, sentence or paragraph to other document
> formats then becomes routine export filtering that is already present.
Even if we accept this implementation path, this feature request would depend
on tdf#32249; it would not be a duplicate of it.