<html>
    <head>
      <base href="https://bugs.freedesktop.org/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - endstream detection"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=62985">62985</a>
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>poppler-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>endstream detection
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>Thomas.Freitag@alfa.de
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>general
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>poppler
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=77269" name="attach_77269" title="endstream detection and scanSpecialFlags correction">attachment 77269</a></span>
endstream detection and scanSpecialFlags correction

1. While porting poppler to Java I made a mistake in the "&lt;objnum&gt; 0
obj&lt;length&gt;" pattern detection, so that the detection failed. It therefore
fell through to the endstream search, and at least with bug-poppler16579.pdf
this search does not work correctly: shift(-1) with the token mechanism used
in Lexer is not correct for a binary data stream. If the binary data contains,
e.g., a "(" without a corresponding ")", which can of course happen and does
happen in that PDF, shift(-1) skips the endstream it is searching for and in
the worst case can reach the end of the file. Therefore I implemented a
shift("endstream") in Java, which I now port back to C++, or in other words
"There and Back Again" :-)
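
A minimal C++ sketch of this failure mode, using hypothetical helpers (not
poppler's Lexer/Parser API and not the attached patch). A token-style scan
that, like a PDF lexer, treats "(" as the start of a literal string runs past
"endstream" when the "(" is unbalanced. A plain byte search for "endstream",
which is the idea behind shift("endstream"), still finds it.

```cpp
// Sketch only: hypothetical helpers illustrating the failure mode, not
// poppler's Lexer/Parser code and not the attached patch.
#include "cassert"  // quoted form still resolves to the standard headers
#include "cstddef"
#include "string"

// Token-style scan starting at the first byte of the stream data. Like a PDF
// lexer it treats "(" as the start of a literal string that runs to the
// matching ")" (nesting counted, backslash escapes one byte), so keywords
// inside it are never seen. Returns the offset of "endstream" or npos.
std::size_t tokenScanForEndstream(const std::string& data, std::size_t start) {
    assert(start <= data.size());
    const std::string kw = "endstream";
    std::size_t i = start;
    while (i < data.size()) {
        if (data[i] == '(') {
            int depth = 1;
            ++i;
            while (i < data.size() && depth > 0) {
                if (data[i] == '\\') {
                    i += 2;  // skip the backslash and the escaped byte
                    continue;
                }
                if (data[i] == '(') ++depth;
                if (data[i] == ')') --depth;
                ++i;
            }
            // resume after the literal string, or at the end of the data
            // if the "(" was unbalanced
            continue;
        }
        if (data.compare(i, kw.size(), kw) == 0) return i;
        ++i;
    }
    return std::string::npos;
}

// Raw byte search in the spirit of shift("endstream"): the binary bytes are
// never interpreted, so an unbalanced "(" cannot hide the keyword.
std::size_t rawScanForEndstream(const std::string& data, std::size_t start) {
    assert(start <= data.size());
    return data.find("endstream", start);
}
```

Take the data "stream\nA(B\nendstream" with the stream data starting at
offset 7. tokenScanForEndstream() returns std::string::npos, while
rawScanForEndstream() returns 11.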

You can test it with bug-poppler16579.pdf if you temporarily change

      if (longNumber <= INT_MAX && longNumber >= INT_MIN && *end_ptr == '\0') {

in XRef.cc to

      if (gFalse && longNumber <= INT_MAX && longNumber >= INT_MIN && *end_ptr == '\0') {

2. The small change in XRef.cc addresses another problem I found during the
Java port: if you save a PDF with defective xref offsets, the call

readXRefUntil(-1 /* read all xref sections */, &xrefStreamObjNums)

in XRef::scanSpecialFlags() destroys the already reconstructed entries table,
which means that any modifications the user made in the meantime are lost.
This can be tested, e.g., with bug168518.pdf.
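
One way to keep such in-memory edits, sketched in C++ under assumptions. Entry,
MiniXRef, readAllSections() and the reconstructed_ flag are hypothetical
stand-ins for XRef's entries table, its reconstruction path and
readXRefUntil(-1). The attached patch may solve it differently. The guard in
scanSpecialFlags() skips re-reading the xref sections once the table has been
reconstructed.

```cpp
// Sketch only: Entry, MiniXRef, readAllSections() and reconstructed_ are
// hypothetical stand-ins, not poppler's XRef class or the attached patch.
#include "cassert"  // quoted form still resolves to the standard headers
#include "cstddef"
#include "string"
#include "vector"

struct Entry {
    long offset;        // byte offset of the object in the file
    std::string value;  // stand-in for the object's (possibly edited) content
};

class MiniXRef {
public:
    // Assume the xref offsets were defective, so the table was rebuilt by
    // scanning the file body instead.
    void reconstruct() {
        entries_ = {{10, "obj0"}, {42, "obj1"}};
        reconstructed_ = true;
    }

    // A user edit that exists only in memory until the PDF is saved.
    void modify(std::size_t num, const std::string& v) {
        assert(num < entries_.size());
        entries_[num].value = v;
    }

    // Stand-in for readXRefUntil(-1): rebuilds the table from the defective
    // xref sections and discards whatever is in memory.
    void readAllSections() { entries_.assign(2, Entry{-1, ""}); }

    // useGuard only exists to contrast both behaviours: with the guard, a
    // reconstructed table is left alone and user edits survive.
    void scanSpecialFlags(bool useGuard) {
        if (!useGuard || !reconstructed_) {
            readAllSections();
        }
        // ... scanning the entries for special flags would follow here ...
    }

    const std::string& value(std::size_t num) const {
        assert(num < entries_.size());
        return entries_[num].value;
    }

private:
    std::vector< Entry > entries_;
    bool reconstructed_ = false;
};
```

After reconstruct() and modify(1, "edited"), scanSpecialFlags(false) resets
entry 1 to an empty value. scanSpecialFlags(true) keeps "edited".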

The attached patch solves these two issues.</pre>
        </div>
      </p>
    </body>
</html>