[poppler] Reading code for FUN

thomas.huxhorn at web.de
Fri Jun 18 09:40:44 UTC 2021


Progress! :)
I now know what a linearized PDF is and I found the code.
class Document -> DocumentData -> PDFDoc -> isLinearized() ->
Linearization -> Parser -> Lexer -> Object

Here is my question:
In function isLinearized(), PDFDoc.cc line 731 in poppler 21.06.0, there is
getLinearization()->getLength() == str->getLength()
It tests whether the length given by /L in the PDF document equals the PDF
file size. It is 1830148 in both cases for my test file. But why does this
indicate a linearized PDF?

The PDF could be linearized but corrupt, so that the sizes are not equal.

Yes, I read a few more lines: if the variable "tryingToReconstruct" is
true, only the /L size is tested. I'm not familiar with this concept.
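
A minimal sketch of the check described above, to make it concrete. This is
not poppler's actual code: declaredLength() and looksLinearized() are
hypothetical helpers, the text search is a crude stand-in for the real
Parser/Lexer, and "only the /L size is tested" is assumed here to mean
"/L is greater than zero". It needs C++17 for std::optional. For context,
the PDF specification defines /L in the linearization parameter dictionary
as the length of the entire file, so a mismatch (for example after an
incremental update appended data) suggests the linearization information
can no longer be trusted.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper, not part of poppler: scan the head of a PDF file for
// a /Linearized marker and return the integer that follows the /L key.
std::optional<long long> declaredLength(const std::string &head)
{
    if (head.find("/Linearized") == std::string::npos) {
        return std::nullopt; // no linearization dictionary at all
    }
    std::size_t pos = 0;
    while ((pos = head.find("/L", pos)) != std::string::npos) {
        std::size_t i = pos + 2;
        pos = i; // continue searching after this match next time
        const std::size_t afterKey = i;
        while (i < head.size() && std::isspace(static_cast<unsigned char>(head[i]))) {
            ++i;
        }
        // Require whitespace and then a digit, so that "/Linearized" or
        // "/Length" are not mistaken for the /L key.
        if (i > afterKey && i < head.size() && std::isdigit(static_cast<unsigned char>(head[i]))) {
            long long value = 0;
            while (i < head.size() && std::isdigit(static_cast<unsigned char>(head[i]))) {
                value = value * 10 + (head[i] - '0');
                ++i;
            }
            return value;
        }
    }
    return std::nullopt;
}

// Mirrors the logic discussed in this mail: the file counts as linearized
// if /L equals the real file size. When trying to reconstruct a damaged
// file, a positive /L alone is accepted (assumption, see text above).
bool looksLinearized(const std::string &head, long long fileSize, bool tryingToReconstruct)
{
    const long long declared = declaredLength(head).value_or(0);
    if (fileSize != 0 && declared == fileSize) {
        return true;
    }
    return tryingToReconstruct && declared > 0;
}
```

With a head containing "/Linearized 1 /L 1830148" and a file size of
1830148, looksLinearized() returns true. With a different file size it
returns true only when tryingToReconstruct is set.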

Thomas

On 6/13/21 11:51 AM, thomas.huxhorn at web.de wrote:
> Hello,
>
> I have always wondered why it takes so long to display big pictures in
> PDF files on Linux. So I recompiled poppler with release+debug symbols
> and used the valgrind profiler to get an idea of what is happening there.
> As far as I can see, 50% of the CPU ticks are used to copy data from A to
> B. But without knowing the code, it's hard to say if this is good or not.
> So I started reading the code. Phew, it's hard to understand, so I started
> reading the PDF reference too.
>
> As far as I can see, there are linearized PDFs and a table called xref.
> Not much for one week of reading ;)
>
> Perhaps I should start from scratch and write my own PDF reader to
> understand things better.
>
> If you are interested, I can keep you informed about my progress. I'll
> do this in my free time, so no hurry.
>
> Thomas H.
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
