[Poppler-bugs] [Bug 29189] Fails to parse PDF with damaged internal structure

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Jul 31 12:15:16 PDT 2010


https://bugs.freedesktop.org/show_bug.cgi?id=29189

--- Comment #7 from Ilya Gorenbein <igorenbein at finjan.com> 2010-07-31 12:15:16 PDT ---
Let's look at an example.

The PDF file contains this static XRef table:

xref
0 16
0000000000 65535 f
0000000009 00000 n   (offset to object 1 0)
0000000097 00000 n   (offset to object 2 0)
0000000165 00000 n   (offset to object 3 0)
0000000246 00000 n   (offset to object 4 0)
0000000316 00000 n   (...)
0000004084 00000 n
0000004290 00000 n
0000004441 00000 n
0000004567 00000 n
0000004664 00000 n
0000004751 00000 n
0000004821 00000 n
0000004922 00000 n
0000005046 00000 n
0000005155 00000 n
trailer
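
In the classic format, the subsection header "0 16" means 16 consecutive entries starting at object 0. Each entry is a fixed 20-byte line holding a 10-digit byte offset, a 5-digit generation number, and n (in use) or f (free). Below is a minimal C++ sketch of parsing one such subsection. XrefEntry, XrefSubsection and parseXrefSubsection are hypothetical names, not Poppler's API. The sketch also splits on whitespace instead of enforcing the exact 20-byte layout.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one classic xref entry (not Poppler's XRefEntry).
struct XrefEntry {
    std::uint64_t offset;  // byte offset of "N G obj" from the start of the file
    int gen;               // generation number
    bool inUse;            // true for 'n', false for 'f'
};

struct XrefSubsection {
    int firstObj;                    // object number described by entries[0]
    std::vector<XrefEntry> entries;  // entries[i] describes object firstObj + i
};

// Parses a "first count" header followed by `count` entries.
// Returns std::nullopt if the text is malformed or too short.
std::optional<XrefSubsection> parseXrefSubsection(const std::string& text) {
    std::istringstream in(text);
    XrefSubsection sub{0, {}};
    int count = 0;
    if (!(in >> sub.firstObj >> count) || sub.firstObj < 0 || count < 0) return std::nullopt;
    for (int i = 0; i < count; ++i) {
        std::uint64_t offset = 0;
        int gen = 0;
        std::string type;
        if (!(in >> offset >> gen >> type)) return std::nullopt;
        if (type != "n" && type != "f") return std::nullopt;
        sub.entries.push_back({offset, gen, type == "n"});
    }
    return sub;
}
```

A table like this is accepted as long as it is well-formed. Nothing at this stage checks that an offset really points at the object it claims to, and that gap is what this report is about.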

In poppler/PDFDoc.cc, the PDFDoc constructor calls PDFDoc::setup, which builds
the XRef. If XRef construction goes “right”, the table is built from the static
map shown above, and constructXRef is never called. At this point we have a
fully constructed XRef table.
Later, PDFDoc::setup constructs the Catalog, and the Catalog constructor reads
objects through that XRef. For example, it contains:
    catDict.dictLookup("Pages", &pagesDict);
    if (!pagesDict.isDict()) {
      error(-1, "Top-level pages object is wrong type (%s)",
            pagesDict.getTypeName());
      goto err2;
    }

If the object fetched from the file is not a dictionary, Catalog construction
fails, and as a result the file cannot be opened.

Now let's look at what the real file contains:


%PDF-1.5
1 0 obj
<</#54ype/C#61#74#61l#6fg/P#61#67e#73 3 0 R/#4fp#65#6e#41c#74ion 5 0 R>>
endobj
3 0 obj
<</#54#79pe/P#61#67#65#73/Cou#6et 1/Kid#73 [4 0 R]>>
endobj
4 0 obj
<</#54#79#70e/#50age/#50#61re#6et 3 0 R/#41nn#6f#74#73 [7 0 R] >>
endobj
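
The names in this snippet use the #xx escape, in which "#" followed by two hex digits stands for one byte (allowed since PDF 1.2). Decoded, object 1 is <</Type/Catalog/Pages 3 0 R/OpenAction 5 0 R>>. Object 3 is <</Type/Pages/Count 1/Kids [4 0 R]>>, and object 4 is <</Type/Page/Parent 3 0 R/Annots [7 0 R]>>. The escapes are therefore obfuscation rather than the cause of the failure. Below is a minimal C++ sketch of the decoding. decodePdfName and hexValue are hypothetical helpers, not Poppler functions, and the input is a name token without its leading slash.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Converts one hex digit to its value, or returns -1 if it is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes the #xx escapes in a PDF name token (leading '/' already removed).
// Returns std::nullopt for a '#' that is not followed by two hex digits.
std::optional<std::string> decodePdfName(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '#') {
            out += raw[i];
            continue;
        }
        if (i + 2 >= raw.size()) return std::nullopt;  // truncated escape
        const int hi = hexValue(raw[i + 1]);
        const int lo = hexValue(raw[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out += static_cast<char>(hi * 16 + lo);
        i += 2;  // skip the two hex digits just consumed
    }
    return out;
}
```

For example, decodePdfName("C#61#74#61l#6fg") yields "Catalog", and decodePdfName("#4fp#65#6e#41c#74ion") yields "OpenAction".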

Object 2 0 is missing, so the offsets in the static XRef no longer match the
objects actually stored at those positions.

In XRef::fetch we have the following condition:

if (obj1.isInt() && obj1.getInt() == num &&
            obj2.isInt() && obj2.getInt() == gen &&
            obj3.isCmd())

In this scenario the condition fails at the obj1.getInt() == num check, which
produces an error and, in turn, the failure to open the PDF.
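
The check can be modelled with the small C++ sketch below. This is only an illustration of the condition, not the real XRef::fetch. checkObjectHeader and HeaderCheck are hypothetical names, and whitespace tokenization stands in for a real PDF lexer. The sketch shows the failure mode from this report: when a table entry points at bytes that begin with a different object's header, the number comparison rejects the fetch.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Outcome of checking the object header found at an XRef offset.
enum class HeaderCheck { Ok, OffsetOutOfRange, NotAHeader, WrongNumber, WrongGeneration };

// The bytes at `offset` must read "num gen obj" with the expected numbers.
// Checks are made in the same order as the quoted condition: number, generation, keyword.
HeaderCheck checkObjectHeader(const std::string& file, std::size_t offset, int num, int gen) {
    if (offset >= file.size()) return HeaderCheck::OffsetOutOfRange;
    std::istringstream in(file.substr(offset));
    long n = 0, g = 0;
    std::string cmd;
    if (!(in >> n >> g >> cmd)) return HeaderCheck::NotAHeader;  // not "int int word"
    if (n != num) return HeaderCheck::WrongNumber;               // the check failing here
    if (g != gen) return HeaderCheck::WrongGeneration;
    if (cmd != "obj") return HeaderCheck::NotAHeader;
    return HeaderCheck::Ok;
}
```

Given a file in which object 3 sits where the table expects object 2, checkObjectHeader(file, thatOffset, 2, 0) returns WrongNumber. This is the same situation as the obj1.getInt() == num failure above.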


When the XRef is built dynamically (by the constructXRef function), this problem
does not occur. However, constructXRef is much more expensive, so it should be
used rarely, only in special cases.
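
The following minimal C++ sketch shows a reconstruction pass in the spirit of constructXRef. It illustrates why the dynamic approach tolerates the missing object 2 0 and why it costs more. It is not Poppler's implementation: reconstructXref, parseObjHeader and FoundObject are hypothetical names, and the sketch ignores trailers, object streams and xref streams. Because it records where each "N G obj" header actually is, object 3 0 gets its real offset regardless of the stale table. The price is a pass over every line of the file instead of a read of one small table.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Where a "num gen obj" header was actually found in the file.
struct FoundObject {
    std::size_t offset;
    int gen;
};

// Returns true if `line` starts with "<digits> <digits> obj" (hypothetical helper).
bool parseObjHeader(const std::string& line, int* num, int* gen) {
    std::size_t i = 0;
    auto readInt = [&](int* out) {
        const std::size_t start = i;
        long value = 0;
        while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) {
            value = value * 10 + (line[i] - '0');
            if (value > 99999999) return false;  // reject absurdly long numbers
            ++i;
        }
        *out = static_cast<int>(value);
        return i > start;
    };
    auto skipSpaces = [&]() {
        const std::size_t start = i;
        while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
        return i > start;
    };
    if (!(readInt(num) && skipSpaces() && readInt(gen) && skipSpaces())) return false;
    if (line.compare(i, 3, "obj") != 0) return false;
    i += 3;
    // "obj" must not be the start of a longer word such as "objx".
    return i == line.size() || !std::isalnum(static_cast<unsigned char>(line[i]));
}

// Scans every line and records the offset of each object header. A later
// definition of the same object number replaces an earlier one.
std::map<int, FoundObject> reconstructXref(const std::string& file) {
    std::map<int, FoundObject> table;
    std::size_t lineStart = 0;
    while (lineStart < file.size()) {
        std::size_t lineEnd = file.find_first_of("\r\n", lineStart);
        if (lineEnd == std::string::npos) lineEnd = file.size();
        int num = 0, gen = 0;
        if (parseObjHeader(file.substr(lineStart, lineEnd - lineStart), &num, &gen) && num > 0) {
            table[num] = {lineStart, gen};
        }
        lineStart = lineEnd + 1;
    }
    return table;
}
```

A missing object simply never appears in the resulting map. A fetch of it can then be treated as absent instead of landing on some other object's bytes.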

Regards,
Ilya
