[Libreoffice-bugs] [Bug 77278] Rewrite old Pocket Word (PWI) file format import filter.

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Fri Apr 17 08:52:13 UTC 2020


https://bugs.documentfoundation.org/show_bug.cgi?id=77278

--- Comment #13 from osnola <alonso at loria.fr> ---
The old filter is quite basic: it reads the font names, then uses some
heuristics to retrieve the paragraphs of text, recovering:
- the main character properties, with the notable exception of
superscript and subscript,
- the properties of the following paragraph: first-line indent, left/right
margins, alignment (left, center, ...), and a flag indicating whether it is a
bulleted list.
I have rewritten a « more robust » version of this code in libwps, but clearly
there are still many things that are not recovered (as I cannot guess
what they mean).

If you want to try it, I have updated the libwps version compiled with
emscripten: http://libwps.sourceforge.net/convertWPS.html .

To improve this filter, it would be useful to have some Pocket Word files (and
their PDF equivalents) that:
- [character properties] use superscripts and subscripts,
- [paragraph properties] have paragraphs with single/double/... line
spacing, with a certain spacing before/after the paragraph, lines with
fixed height, different types of lists,
- [general] contain header(s), footer(s), footnote(s), endnote(s), comment(s),
image(s), table(s)…
and simple documents with different page sizes, different margins, some
metadata...
- …
