[Libreoffice-bugs] [Bug 144050] FILEOPEN on an RTF document replaces multiple spaces with other characters, losing layout justification

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Sat Sep 11 14:12:03 UTC 2021


https://bugs.documentfoundation.org/show_bug.cgi?id=144050

--- Comment #7 from Bernard Moreton <bernard.moreton at gmail.com> ---
The uploaded example file is a pared-down extract from a much longer PDF report
file, with most of the actual text replaced character-for-character, for
obvious reasons of discretion.
The PDF was reduced to text using
pdftotext -layout "$src"      # $src being the PDF file

A standard RTF header block is then written, with the mandatory {\rtf1\ansi
followed by a brief FONTTBL, COLORTBL (probably redundant), and a single style
in the STYLESHEET.
I now follow that with the line
{\*\generator LibreOffice/7.1.5.2$Linux_X86_64 LibreOffice_project/10$Build-2}
to stop the unwanted behaviour of strange characters being appended within
multi-space strings.
Then the lines defining the papersize, margins, and orientation for the
document and the section (the latter again probably redundant),
and finally the "\pard\plain \s7" to start the body of the text.

The text is then copied from the text file, adding a "\line" at each line-end.

And finally the RTF ending is added, "}"
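
The steps above can be sketched as a small bash function. This is only an
illustration of the process described, not the actual script: the function name
text_to_rtf, the font, the style name, and the page dimensions (A4 landscape
with 1 cm margins) are all assumptions, and the COLORTBL is left out. Unlike a
plain copy, the sketch also escapes "\", "{" and "}" in the body, since RTF
treats them as control characters. It assumes the text is plain ASCII.

```shell
#!/usr/bin/env bash
# Sketch: wrap plain text (e.g. from `pdftotext -layout`) in a minimal RTF
# envelope. Reads text on stdin, writes RTF on stdout.
# Font, style name, and page size below are illustrative assumptions.

text_to_rtf() {
    # Header block: the mandatory {\rtf1\ansi, a font table, one style.
    printf '%s\n' '{\rtf1\ansi\deff0'
    printf '%s\n' '{\fonttbl{\f0\fmodern Courier New;}}'
    printf '%s\n' '{\stylesheet{\s7\f0\fs16 Preformatted;}}'
    # Generator line, as quoted in the comment above.
    printf '%s\n' '{\*\generator LibreOffice/7.1.5.2$Linux_X86_64 LibreOffice_project/10$Build-2}'
    # Paper size, orientation, and margins, in twips (567 twips is about 1 cm).
    printf '%s\n' '\paperw16838\paperh11906\landscape\margl567\margr567\margt567\margb567'
    # Start of the body text, using the single style defined above.
    printf '%s\n' '\pard\plain \s7\f0\fs16'
    # Body: escape \ { } so they are taken literally, then end each line
    # with \line. Runs of spaces pass through unchanged.
    sed -e 's/[\\{}]/\\&/g' -e 's/$/\\line/'
    # RTF ending.
    printf '%s\n' '}'
}
```

With pdftotext writing to standard output (a "-" as the output file name), this
could be used as:
pdftotext -layout "$src" - | text_to_rtf > report.rtf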

I'd upload the BASH executable, but the source RTF already uploaded shows the
process more clearly than the BASH script could do!

I've been using this sort of method for many years for reporting from 4GL,
whether simply to LO (and OOo before that), or using LO to create a PDF from
the command line - though in 4GL reporting most of the formatting is done by
defining tabs.

When processing pre-formatted text, however, especially from the output of
PDFTOTEXT, multiple spaces are unavoidable; but strange characters should
*never* be added to them, as the LO FILEOPEN for RTF evidently does.
