[poppler] Recent changes in whitespace rendering with physical_layout

Albert Astals Cid aacid at kde.org
Tue May 4 23:09:17 UTC 2021


Seems we're going to follow that up in https://gitlab.freedesktop.org/poppler/poppler/-/issues/1076

On Monday, 3 May 2021, at 15:22:19 (CEST), Jeroen Ooms wrote:
> I maintain R bindings called pdftools, mostly used for extracting text
> from scientific documents. The bindings wrap the C++ API, in
> particular we convert pdf to text using poppler::page::text() with
> physical_layout.
> 
> Recently users have started to report changes in behaviour with newer
> versions of poppler, in particular wrt whitespace. For example, all
> pages now end with an '\f' symbol, which was not the
> case before. On Windows, linebreaks are now converted to '\r\n'
> instead of just '\n' as before (we use mingw-w64 compilers). And also,
> some documents that would contain a single linebreak in e.g. poppler
> 0.73 now have 4 or 5 linebreaks in the same place with the latest
> poppler.
> 
> I had a look at the changelog but I couldn't find any notes of this.
> Are these expected changes? The new behavior is causing some existing
> pipelines to break, where people were using e.g. line offsets to
> extract fragments of the text.
> 
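
For pipelines that depend on the older output, one possible stopgap is to
normalize the extracted text on the caller's side. The sketch below is only
an illustration of the two whitespace differences described above ('\r\n'
linebreaks and a trailing '\f'); normalize_page_text is a hypothetical
helper, not part of poppler, and it assumes the page text has already been
converted to a std::string. It does not attempt to undo the extra
linebreaks, since those cannot be told apart from intentional blank lines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a poppler API): normalizes text that was
// already extracted from a single page.
std::string normalize_page_text(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Turn "\r\n" into "\n" by skipping the '\r'.
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue;
        }
        out.push_back(in[i]);
    }
    // Drop any trailing form feed that marks the end of the page.
    while (!out.empty() && out.back() == '\f') {
        out.pop_back();
    }
    return out;
}
```

A lone '\r' that is not followed by '\n' is left untouched, so the helper
only affects the two cases reported in this thread.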