[poppler] line breaks and layout for multi-column texts ...

Albretch Mueller lbrtchx at gmail.com
Wed Feb 5 11:20:10 UTC 2020


pdftotext has the option

-layout              : maintain original physical layout

but pdftohtml doesn't seem to have an equivalent:

 $ pdftohtml --help
pdftohtml version 0.48.0
Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
  -f <int>              : first page to convert
  -l <int>              : last page to convert
  -q                    : don't print any messages or errors
  -h                    : print usage information
  -?                    : print usage information
  -help                 : print usage information
  --help                : print usage information
  -p                    : exchange .pdf links by .html
  -c                    : generate complex document
  -s                    : generate single document that includes all pages
  -i                    : ignore images
  -noframes             : generate no frames
  -stdout               : use standard output
  -zoom <fp>            : zoom the pdf document (default 1.5)
  -xml                  : output for XML post-processing
  -hidden               : output hidden text
  -nomerge              : do not merge paragraphs
  -enc <string>         : output text encoding name
  -fmt <string>         : image file format for Splash output (png or jpg)
  -v                    : print copyright and version info
  -opw <string>         : owner password (for encrypted files)
  -upw <string>         : user password (for encrypted files)
  -nodrm                : override document DRM settings
  -wbt <fp>             : word break threshold (default 10 percent)
  -fontfullname         : outputs font full name
$
Is it some sort of "hidden" parameter? If not, how can I work around it?
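One possible workaround, sketched here under assumptions and not as a pdftohtml feature: the -xml option in the help output above records each text fragment with its top and left position on the page, so the column layout can be rebuilt afterwards. The Python below is a minimal sketch that assumes equal-width columns (two by default) and assigns each fragment to the column containing its left edge. The function name columns_from_pdftohtml_xml and the n_columns parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def columns_from_pdftohtml_xml(xml_text, n_columns=2):
    """Regroup <text> fragments from `pdftohtml -xml` output into columns.

    Returns one list per page; each page is a list of columns, and each
    column is a list of text lines ordered top to bottom.

    Assumption (hypothetical heuristic): columns are equal-width, so a
    fragment belongs to the column that contains its left edge.
    """
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        col_width = float(page.get("width")) / n_columns
        # Per column: map each "top" coordinate to its (left, text) fragments.
        columns = [{} for _ in range(n_columns)]
        for node in page.iter("text"):
            # itertext() also collects text inside <b>, <i>, <a> children.
            text = "".join(node.itertext()).strip()
            if not text:
                continue
            top = int(float(node.get("top")))
            left = float(node.get("left"))
            col = min(int(left // col_width), n_columns - 1)
            columns[col].setdefault(top, []).append((left, text))
        page_columns = []
        for col in columns:
            lines = []
            # Fragments sharing a top coordinate form one line, left to right.
            # Tops that differ by even one unit end up on separate lines.
            for top in sorted(col):
                lines.append(" ".join(t for _, t in sorted(col[top])))
            page_columns.append(lines)
        pages.append(page_columns)
    return pages
```

To try it, convert with something like `pdftohtml -xml -i input.pdf out`, which should write out.xml, then pass that file's contents to the function. Real multi-column PDFs may need a small tolerance when grouping tops and a column boundary detected from the gaps between fragments rather than a fixed split.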

  lbrtchx


More information about the poppler mailing list