[Poppler-bugs] [Bug 45163] pdftotext -bbox fails to write to stdout

bugzilla-daemon at freedesktop.org
Thu Jan 26 15:41:27 PST 2012


https://bugs.freedesktop.org/show_bug.cgi?id=45163

--- Comment #2 from awendt at putergeek.com 2012-01-26 15:41:27 PST ---
(In reply to comment #1)
> Can you please write the exact command line you are using, what is the real
> output and what is the expected output?

Sure... This is the output without -bbox, everything works correctly:

$ pdftotext test.pdf -
Hello!
This is a sample PDF file.

Same command with -bbox added, note how the body element has no content:

$ pdftotext -bbox test.pdf -
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html
xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta name="Creator" content="Writer"/>
<meta name="Producer" content="LibreOffice 3.4"/>
<meta name="CreationDate" content=""/>
</head>
<body>
</body>
</html>

Where did the content go? Into a file literally named '-':

$ cat ./-
<doc>
  <page width="612.000000" height="792.000000">
    <word xMin="56.800000" yMin="57.208000" xMax="88.084000" yMax="70.492000">Hello!</word>
    <word xMin="56.800000" yMin="71.008000" xMax="78.064000" yMax="84.292000">This</word>
    <word xMin="81.184000" yMin="71.008000" xMax="89.152000" yMax="84.292000">is</word>
    <word xMin="92.176000" yMin="71.008000" xMax="97.492000" yMax="84.292000">a</word>
    <word xMin="100.480000" yMin="71.008000" xMax="134.392000" yMax="84.292000">sample</word>
    <word xMin="137.464000" yMin="71.008000" xMax="159.424000" yMax="84.292000">PDF</word>
    <word xMin="162.436000" yMin="71.008000" xMax="181.336000" yMax="84.292000">file.</word>
  </page>
</doc>

The expected output is to have the <doc>...</doc> content inside the body
element that is sent to stdout, and no file named '-' generated.

I can get the expected output with 'pdftotext -bbox test.pdf /dev/stdout'
instead, but that is not very portable.

Basically, the code that writes the header and footer has a special case that
treats an output filename of '-' as stdout, but the code that writes the bbox
content lacks that special case, so the two code paths interpret the same
output filename differently. (For some reason the output file is closed and
reopened by each of these components, instead of being left open.)
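
To illustrate the kind of fix I have in mind, here is a minimal sketch of a
shared helper that every writer could call, so that '-' is resolved the same
way everywhere. The names openOutput and closeOutput are hypothetical and are
not taken from the Poppler source; this only shows the idea, not the actual
patch.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not Poppler's actual code): resolve an output
// filename to a FILE*, treating "-" as standard output.
std::FILE *openOutput(const char *fileName, const char *mode) {
    assert(fileName != nullptr && mode != nullptr);
    if (std::strcmp(fileName, "-") == 0) {
        return stdout;  // never create a file literally named "-"
    }
    return std::fopen(fileName, mode);  // nullptr on failure
}

// Counterpart to openOutput: flush stdout but never close it, so that a
// later writer (for example the footer code) can still use it.
void closeOutput(std::FILE *f) {
    if (f == nullptr) {
        return;
    }
    if (f == stdout) {
        std::fflush(f);
    } else {
        std::fclose(f);
    }
}
```

If the header, bbox, and footer writers all went through something like
openOutput("-", "w") and closeOutput(f), they would all end up writing to
stdout, and the close-and-reopen pattern would be harmless for that case.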
