[poppler] [PATCHES] Support to save encrypted files with PDFDoc::saveAs

Albert Astals Cid aacid at kde.org
Thu Sep 6 13:14:57 PDT 2012


On Wednesday, 5 September 2012, at 22:35:02, Albert Astals Cid wrote:
> On Monday, 20 August 2012, at 22:32:16, Fabio D'Urso wrote:
> > Hi,
> > the attached patches implement support to save encrypted files, both in
> > incremental and full rewrite mode. The new encryption features do not use
> > external libraries and share most code with existing decryption routines.
> > 
> > I've also modified pdf-fullrewrite so that it can deal with
> > password-protected files, do incremental updates instead of a full
> > rewrite, and verify the generated file by comparing each object in the
> > new document with the original one. This allows for automated testing --
> > more details below.
> > 
> > Note that the patches do *not* implement support for encrypted files in
> > utils tools such as pdfunite or pdfseparate.
> > 
> > --- Patches 0001-0002 -- Decrypt.cc: Added support for encryption
> > The first two patches affect Decrypt.cc/h, and create the EncryptStream
> > class, which is a filter stream that gives encrypted data in output.
> > 
> > In detail, in patch 0001 I split the existing DecryptStream class into two
> > classes (BaseCryptStream, that does key initialization, and DecryptStream,
> > that actually decrypts data). I've also rewritten getChar to internally
> > call lookChar, so that its logic needn't be duplicated.
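
The getChar-on-top-of-lookChar arrangement described above can be sketched as follows. This is only an illustration of the pattern, not the code from patch 0001: the class name UpperCaseFilter and its members are hypothetical, and the "filter" merely upper-cases ASCII letters so the example stays short.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical buffered filter: lookChar() does the real work of producing
// the next output byte; getChar() only consumes what lookChar() produced.
class UpperCaseFilter {
public:
    explicit UpperCaseFilter(std::string src) : src_(std::move(src)) {}

    // Peek at the next output byte without consuming it (EOF at end of data).
    int lookChar() {
        if (!haveBuffered_) {
            if (pos_ >= src_.size()) {
                return EOF;
            }
            unsigned char c = static_cast<unsigned char>(src_[pos_++]);
            buffered_ = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
            haveBuffered_ = true;
        }
        return buffered_;
    }

    // Consume the next output byte: reuse lookChar(), then drop the buffer
    // so that the following call produces a fresh byte.
    int getChar() {
        int c = lookChar();
        haveBuffered_ = false;
        return c;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    int buffered_ = EOF;
    bool haveBuffered_ = false;
};
```

With this layout the transformation logic lives in one place only, and repeated lookChar() calls keep returning the same byte until getChar() consumes it.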
> > 
> > Patch 0002 creates the EncryptStream class and adds support for AES and
> > AES-256 encryption (RC4 encrypts using the same algorithm as decryption,
> > therefore no new code was needed for RC4).
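
The remark about RC4 follows from the fact that RC4 is a stream cipher: the data is XORed with a keystream, so running the same routine twice with the same key restores the input. A minimal textbook sketch, independent of poppler's own implementation (the function name rc4 is an assumption, and the key must be non-empty):

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Textbook RC4: key scheduling followed by keystream XOR.
// Applying it twice with the same key returns the original bytes.
std::vector<uint8_t> rc4(const std::string &key, const std::vector<uint8_t> &in) {
    uint8_t s[256];
    for (int k = 0; k < 256; ++k) {
        s[k] = static_cast<uint8_t>(k);
    }
    // Key-scheduling algorithm (KSA).
    for (int k = 0, j = 0; k < 256; ++k) {
        j = (j + s[k] + static_cast<uint8_t>(key[k % key.size()])) & 0xff;
        std::swap(s[k], s[j]);
    }
    // Pseudo-random generation algorithm (PRGA), XORed with the data.
    std::vector<uint8_t> out(in.size());
    int i = 0, j = 0;
    for (std::size_t n = 0; n < in.size(); ++n) {
        i = (i + 1) & 0xff;
        j = (j + s[i]) & 0xff;
        std::swap(s[i], s[j]);
        out[n] = in[n] ^ s[(s[i] + s[j]) & 0xff];
    }
    return out;
}
```

Calling rc4(key, rc4(key, data)) yields data again, which is why an encrypting stream can reuse the decryption code path for RC4. AES in CBC mode has no such symmetry, hence the new code in patch 0002.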
> > 
> > --- Patches 0003-0005 -- goo/grandom.cc: Pseudo-random number generation
> > Patches 0003-0005 are about pseudo-random number generation: I need random
> > numbers to initialize AES and AES-256 cipher block chaining. They needn't
> > be unpredictable, because they're stored in plain text; they just need
> > to be different each time.
> > 
> > Calling srand and rand within a library seems a bad idea, because the
> > calling application may have already seeded rand with a known value to
> > get a predictable sequence of numbers and, if we called srand or rand
> > again, we would affect that sequence. Therefore, patch 0003 adds a
> > goo/grandom.cc file to generate random numbers in a safe way when
> > possible. If rand_r is available (POSIX), it uses it. Otherwise it relies
> > on unsafe srand/rand calls. Other safe methods will have to be developed
> > for other platforms.
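
The rand_r approach can be sketched roughly as below. The helper name fill_random and the time-based seeding are assumptions made for illustration, not the contents of goo/grandom.cc; the point is only that rand_r keeps its state in a caller-supplied variable, so the sequence behind the application's own srand/rand calls is left alone.

```cpp
#include <stdlib.h>
#include <time.h>

// Hypothetical helper (not poppler's actual grandom API): fills buf with
// pseudo-random bytes using POSIX rand_r and a private seed, so the state
// behind the application's srand()/rand() is never touched.
// Note: the static seed makes this sketch non-thread-safe.
void fill_random(unsigned char *buf, int size) {
    static bool seeded = false;
    static unsigned int seed = 0;
    if (!seeded) {
        seed = static_cast<unsigned int>(time(nullptr));
        seeded = true;
    }
    for (int i = 0; i < size; ++i) {
        buf[i] = static_cast<unsigned char>(rand_r(&seed) % 256);
    }
}
```

An application that calls srand(42) before and after fill_random should still see the same rand() sequence, which is the property the paragraph above is after.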
> > 
> > Patch 0004 replaces an existing occurrence of srand+rand calls in
> > SplashScreen::buildSCDMatrix with grandom calls. Note that previously rand
> > was seeded with a known value, therefore it always gave the same sequence
> > of numbers. Now the sequence is different each time, and I don't know if
> > this is an issue.
> > 
> > Patch 0005 uses grandom to initialize AES encryption.
> > 
> > --- Patch 0006 -- FlateStream::unfilteredReset fix
> > Patch 0006 fixes FlateStream::unfilteredReset so that it actually calls
> > unfilteredReset on the base stream (instead of reset), so that
> > FlateStream::getUnfilteredChar (inherited from FilterStream) can work
> > properly. This is the same behavior as all other FilterStream-derived
> > classes (see 9b72ee4e4c8658b2f7cd542d601a5c3be621d3fc).
> > 
> > --- Patch 0007 -- Be able to write objUint back
> > Patch 0007 makes sure that overflowed integers (objUint) can be written. This is
> > especially useful in full rewrite mode for some values of /P in the
> > encryption dictionary.
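
To illustrate the kind of value involved: a number such as 4294967292 does not fit in a signed 32-bit int, so it has to be kept as an unsigned value and written back as one. The sketch below is a hypothetical miniature (IntObj, parseInt and writeInt are made-up names, the sample value is illustrative, and only non-negative tokens are handled), not poppler's Object class.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical miniature of an integer object that remembers whether the
// value fit in a signed int or had to be kept as an unsigned int.
struct IntObj {
    bool isUnsigned;
    int i;
    unsigned int u;
};

// Parses a non-negative decimal token; values above INT_MAX are stored
// in the unsigned member instead of being truncated or wrapped.
IntObj parseInt(const std::string &token) {
    unsigned long long v = std::strtoull(token.c_str(), nullptr, 10);
    IntObj o{};
    if (v > static_cast<unsigned long long>(INT_MAX)) {
        o.isUnsigned = true;
        o.u = static_cast<unsigned int>(v);
    } else {
        o.isUnsigned = false;
        o.i = static_cast<int>(v);
    }
    return o;
}

// Writes the value back using the representation it was parsed into.
std::string writeInt(const IntObj &o) {
    return o.isUnsigned ? std::to_string(o.u) : std::to_string(o.i);
}
```

Round-tripping "4294967292" through parseInt and writeInt gives the same text back, whereas forcing it through a plain int would change the value that ends up in the rewritten file.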
> > 
> > --- Patches 0008-0010 -- XRefEntry flags and preserving encryption info
> > Some objects, notably the /Encrypt dictionary, are stored in unencrypted
> > form. However, if encryption is enabled, we currently pass all objects
> > through decryption routines when fetching them. It hasn't done any harm
> > till now because we read the /Encrypt dictionary to initialize decryption,
> > when decryption is not enabled yet.
> > But now we need to be able to read all objects again at any moment, in
> > order to write them back in full rewrites. Therefore I've added an
> > Unencrypted flag in XRefEntry to mark such unencrypted objects. In detail:
> >  - Since there was already a "bool updated" field, I've added an "int
> >    flags" field to store both flags (patch 0008).
> >  
> >  - Patch 0009 adds a XRef::setSpecialFlags() method that sets the
> >    Unencrypted flags (it only needs to be called once in the lifetime of
> >    the XRef instance). It recursively marks all entries referred from the
> >    /Encrypt dictionary as unencrypted (in practice I haven't seen any
> >    document with indirect references in the /Encrypt dict, but this is
> >    what I interpreted out of the specs).
> > 
> >    Patch 0009 also copies the /Encrypt field in the new document's trailer
> >    dictionary and removes code that refuses to save encrypted documents.
> >  
> >  - Patch 0010 makes sure that the first ID field is never changed if the
> >    document is encrypted, because it is used to calculate the decryption
> >    key. Previously, we used to always generate a new ID in full rewrite
> >    mode.
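
The flag field and the recursive marking can be pictured with a small sketch. Everything here is hypothetical (the Entry struct, the refs list standing in for indirect references, and the markUnencrypted function); it is not the XRefEntry or XRef::setSpecialFlags() code from the patches, only one way such a bit field and traversal could look.

```cpp
#include <map>
#include <vector>

// Hypothetical xref entry: several boolean flags packed into one int.
struct Entry {
    enum Flag { Updated = 0, Unencrypted = 1 };
    int flags = 0;
    std::vector<int> refs; // object numbers this entry refers to

    bool getFlag(Flag f) const { return (flags & (1 << f)) != 0; }
    void setFlag(Flag f, bool value) {
        if (value) {
            flags |= (1 << f);
        } else {
            flags &= ~(1 << f);
        }
    }
};

// Marks object `num` and everything reachable from it as Unencrypted.
// The flag doubles as a "visited" marker, so reference cycles terminate.
void markUnencrypted(std::map<int, Entry> &xref, int num) {
    auto it = xref.find(num);
    if (it == xref.end() || it->second.getFlag(Entry::Unencrypted)) {
        return;
    }
    it->second.setFlag(Entry::Unencrypted, true);
    for (int child : it->second.refs) {
        markUnencrypted(xref, child);
    }
}
```

Starting the walk at the object holding the /Encrypt dictionary would flag it and its indirect children, while leaving other flags such as Updated untouched.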
> > 
> > Note about XRef::setSpecialFlags(): I've chosen to add a method to
> > explicitly set flags, instead of automatically setting them as soon as
> > xref entries are parsed, because in later patches I add checks that are
> > expensive (eg patch 0016 requires a full xref traversal, which would
> > defeat linearization if done automatically).
> > 
> > --- Patches 0011-0014 -- PDFDoc::write* methods
> > In encrypted PDFs, strings and streams are encrypted. The key varies
> > according to the object number.
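
For RC4 and AES-128, the PDF specification derives the per-object key by appending the low-order bytes of the object and generation numbers (plus the bytes "sAlT" for AES) to the file key and MD5-hashing the result; AES-256 uses the file key directly. The sketch below only builds the hash input and the resulting key length, leaving out MD5 itself; the function names are assumptions and this is not the code in Decrypt.cc.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the byte string that would be MD5-hashed to obtain the
// per-object key (RC4 and AES-128 only; MD5 itself is omitted here).
std::vector<uint8_t> objectKeyInput(const std::vector<uint8_t> &fileKey,
                                    int objNum, int objGen, bool aes) {
    std::vector<uint8_t> buf(fileKey);
    // Three low-order bytes of the object number, low byte first.
    buf.push_back(static_cast<uint8_t>(objNum & 0xff));
    buf.push_back(static_cast<uint8_t>((objNum >> 8) & 0xff));
    buf.push_back(static_cast<uint8_t>((objNum >> 16) & 0xff));
    // Two low-order bytes of the generation number, low byte first.
    buf.push_back(static_cast<uint8_t>(objGen & 0xff));
    buf.push_back(static_cast<uint8_t>((objGen >> 8) & 0xff));
    if (aes) {
        // The ASCII bytes "sAlT" are appended for AES.
        const uint8_t salt[] = {0x73, 0x41, 0x6c, 0x54};
        buf.insert(buf.end(), salt, salt + 4);
    }
    return buf;
}

// The object key is the first min(n + 5, 16) bytes of the MD5 digest,
// where n is the file key length in bytes.
std::size_t objectKeyLength(std::size_t fileKeyLength) {
    return std::min<std::size_t>(fileKeyLength + 5, 16);
}
```

Because the object number feeds into the key, every write method needs to know which object it is currently writing, which is what the parameter propagation in patch 0012 provides.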
> > 
> > Patch 0011 is a refactoring to separate the operations to write objects'
> > header (eg "11 5 obj") and footer ("endobj") from the operations to write
> > the object itself. This is only to make code introduced by 0012 clearer
> > (see commit message in 0011 for details).
> > 
> > Patch 0012 queries the XRef for encryption parameters and propagates them
> > recursively to all write methods. Objects flagged as Unencrypted are
> > passed null encryption parameters, so that they are written back
> > unencrypted.
> > 
> > Patch 0013 performs string encryption. Note: I don't see the reason why we
> > check if s->hasUnicodeMarker() or not right after the code I changed, as
> > the PDF specs do not make such a distinction. However, it should not
> > cause trouble with encrypted strings, because the two code paths only
> > differ in which characters are escaped, and such escapes are optional.
> > 
> > Patch 0014 performs stream encryption. Note that only strWeird streams are
> > encrypted here, other ones are raw-copied from the original file.
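
Since encrypted string bytes are arbitrary, writing them as a PDF literal string needs at least the delimiter characters escaped. The sketch below shows one possible escaping routine; the name writeLiteralString is hypothetical and the exact set of characters escaped in poppler's writer may differ.

```cpp
#include <string>

// Hypothetical writer for a PDF literal string holding arbitrary bytes
// (e.g. ciphertext). Parentheses and backslashes must be escaped so the
// string delimiters stay unambiguous; CR is escaped because an unescaped
// end-of-line marker inside a literal string is read back as a line feed.
std::string writeLiteralString(const std::string &bytes) {
    std::string out = "(";
    for (unsigned char c : bytes) {
        switch (c) {
        case '(':
        case ')':
        case '\\':
            out += '\\';
            out += static_cast<char>(c);
            break;
        case '\r':
            out += "\\r";
            break;
        case '\n':
            out += "\\n";
            break;
        default:
            out += static_cast<char>(c);
        }
    }
    out += ')';
    return out;
}
```

Escaping more characters than strictly necessary is harmless, since the reader undoes the escapes before decryption.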
> > 
> > --- Patches 0015-0016 -- Special handling for XRef streams and ObjStm objects
> > Another category of objects in unencrypted form are XRef streams
> > (each with its own xref entry). Patch 0015 marks such entries as
> > Unencrypted, so that fetch can read them correctly. Note that XRef parsing
> > bypasses fetch, therefore it's unaffected by this patch.
> > 
> > Actually, storing XRef stream objects makes no sense in case of full
> > rewrite, because we always create a new XRef table. Therefore, copied XRef
> > stream objects from the original document are just a waste of space (and
> > also result in corrupt objects, see commit message in 0015). Therefore
> > the patch also sets a "DontRewrite" flag on those objects and skips them
> > in full-rewrite mode.
> > 
> > Another category of leaked, space-wasting objects is compressed object
> > streams, which are currently copied in fully-rewritten documents, even
> > though the objects they contain are individually written too. Patch 0016
> > sets DontRewrite on ObjStm objects too.
> > 
> > --- Patches 0017-0018 -- Test tools
> > The last two patches change utils/pdfinfo and test/pdf-fullrewrite. They
> > provide extra features I found useful while developing the patches, and
> > that I think can be useful to identify encrypted documents and do
> > automated tests on them.
> > 
> > utils/pdfinfo is modified to show the encryption algorithm (patch 0017).
> > 
> > test/pdf-fullrewrite is extended (patch 0018):
> >  - Support encrypted documents (via -upw/-opw command-line arguments)
> >  - Use incremental update mode instead of full-rewrite (-i switch)
> >  
> >    No new objects are written in this case, but a new trailer is appended,
> >    which is enough to test if the XRef chain is extended properly (eg if
> >    /Prev works) and if the new trailer dictionary works (eg it can be used
> >    to spot /Encrypt dictionary issues, or a changed /ID in encrypted
> >    documents)
> >  - Added support to optionally verify the generated document, by opening
> >    it back and comparing each object in the xref with the original
> >    document's one (-check switch).
> > 
> > I'm sending the patches in a single big file suitable for git am.
> > I'm also attaching the shell scripts (relying on these modified tools)
> > that I used in my tests. The "unpack" file contains the list of the
> > documents I've run the tests on.
> > 
> > Of course, feedback and other test cases are more than welcome.
> 
> I've done some regression testing and some code review and everything looks
> great. If no one has any other opinion I'll commit them to master tomorrow
> evening.

It's in.

Cheers,
  Albert

> 
> Cheers,
>   Albert
> 
> > Thanks in advance,
> > Fabio
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

