<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - libpoppler cannot recreate pdftotext output, because physical_layout is not handled correctly"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=103798">103798</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>libpoppler cannot recreate pdftotext output, because physical_layout is not handled correctly
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>poppler
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>x86-64 (AMD64)
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux (All)
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>critical
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>cpp frontend
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>poppler-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dummydummy@gmx.fr
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Dear maintainer,

This bug concerns poppler versions 0.48.0 through at least 0.60.1.

In the file .../cpp/poppler-page.cpp, the function

         ustring page::text(const rectf &r, text_layout_enum layout_mode) const

when called with  physical_layout  as  layout_mode  incorrectly creates a
TextOutputDev whose second parameter (which should be true for physical_layout)
is always gFalse, because the corresponding code at lines 272 and 273
(poppler 0.60.1) is:

    const GBool use_raw_order = (layout_mode == raw_order_layout);
    TextOutputDev td(0, gFalse, 0, use_raw_order, gFalse);


By contrast, pdftotext.cc creates its TextOutputDev with the second parameter
set to gTrue when invoked with the -layout command-line option.

The effect is that the text produced by a program using libpoppler differs
from the more faithful text (which, for example, has blank lines where
required) produced by running pdftotext with the -layout option.

Would the following be a solution?
    const GBool use_raw_order = (layout_mode == raw_order_layout);
    const GBool use_physical_layout = !use_raw_order;
    TextOutputDev td(0, use_physical_layout, 0, use_raw_order, gFalse);

I would be grateful if this could be fixed.
The alternative, which I do not relish, would be to compile virtually all of
the poppler source code into my program just to gain access to TextOutputDev
and call it with gTrue as the second parameter. That does not seem to be what
libpoppler is meant for.</pre>
        </div>
      </p>


    </body>
</html>