[poppler] [PATCHES] Support to save encrypted files with PDFDoc::saveAs

Albert Astals Cid aacid at kde.org
Wed Sep 5 13:35:02 PDT 2012


On Monday, 20 August 2012, at 22:32:16, Fabio D'Urso wrote:
> Hi,
> the attached patches implement support to save encrypted files, both in
> incremental and full rewrite mode. The new encryption features do not use
> external libraries and share most code with existing decryption routines.
> 
> I've also modified pdf-fullrewrite so that it can deal with
> password-protected files, do incremental updates instead of a full rewrite,
> and verify the generated file by comparing each object in the new document
> with the original one. This allows for automated testing -- more details
> below.
> 
> Note that the patches do *not* implement support for encrypted files in
> utils tools such as pdfunite or pdfseparate.
> 
> --- Patches 0001-0002 -- Decrypt.cc: Added support for encryption
> The first two patches affect Decrypt.cc/h, and create the EncryptStream
> class, which is a filter stream that gives encrypted data in output.
> 
> In detail, in patch 0001 I split the existing DecryptStream class into two
> classes (BaseCryptStream, which does key initialization, and DecryptStream,
> which actually decrypts data). I've also rewritten getChar to internally
> call lookChar, so that its logic needn't be duplicated.
> 
> Patch 0002 creates the EncryptStream class and adds support for AES and
> AES-256 encryption (RC4 encrypts using the same algorithm as decryption,
> therefore no new code was needed for RC4).
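
A minimal sketch of why RC4 needs no separate encryption routine: the cipher XORs the data with a keystream, so applying it twice with the same key returns the original bytes. The function name `rc4` is hypothetical and this is textbook RC4, not the code in Decrypt.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Textbook RC4. The same call both encrypts and decrypts, because the
// output is simply the input XORed with a key-dependent keystream.
std::vector<uint8_t> rc4(const std::vector<uint8_t> &key,
                         const std::vector<uint8_t> &data) {
    assert(!key.empty());

    // Key scheduling: permute the state array according to the key.
    uint8_t S[256];
    for (int i = 0; i < 256; ++i) {
        S[i] = static_cast<uint8_t>(i);
    }
    uint8_t j = 0;
    for (int i = 0; i < 256; ++i) {
        j = static_cast<uint8_t>(j + S[i] + key[i % key.size()]);
        std::swap(S[i], S[j]);
    }

    // Keystream generation, XORed into the data byte by byte.
    std::vector<uint8_t> out(data.size());
    uint8_t x = 0, y = 0;
    for (std::size_t n = 0; n < data.size(); ++n) {
        x = static_cast<uint8_t>(x + 1);
        y = static_cast<uint8_t>(y + S[x]);
        std::swap(S[x], S[y]);
        out[n] = static_cast<uint8_t>(
            data[n] ^ S[static_cast<uint8_t>(S[x] + S[y])]);
    }
    return out;
}
```

Calling `rc4(key, rc4(key, plain))` gives back `plain`, which is the property the patch relies on. AES in CBC mode has no such symmetry, hence the new EncryptStream code.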
> 
> --- Patches 0003-0005 -- goo/grandom.cc: Pseudo-random number generation
> Patches 0003-0005 are about pseudo-random number generation: I need random
> numbers to initialize AES and AES-256 cipher block chaining. They needn't be
> unpredictable, because they're stored in plain text; they just need to be
> different each time.
> 
> Calling srand and rand within a library seems like a bad idea, because the
> calling application may have already called srand with a known seed to get
> a predictable sequence of numbers and, if we called srand or rand again, we
> would affect that sequence. Therefore, patch 0003 adds a goo/grandom.cc
> file to generate random numbers in a safe way when possible. If rand_r is
> available (POSIX), it uses it. Otherwise it falls back on unsafe srand/rand
> calls. Other safe methods will have to be developed for other platforms.
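
A minimal sketch of the rand_r approach described above, assuming a POSIX system. The helper name `fillPseudoRandom` and the time-based seeding are assumptions made for illustration; the actual interface in goo/grandom.cc may differ. The point is that rand_r keeps its state in a caller-supplied variable, so the application's own srand/rand sequence is left alone.

```cpp
#include <cassert>
#include <stdlib.h> // rand_r is POSIX, not standard C++
#include <time.h>

namespace {
// Private generator state, never shared with the global rand() state.
unsigned int privateSeed = 0;
bool seeded = false;
}

// Hypothetical helper (not the patch's API): fills buf with len
// pseudo-random bytes. Not thread-safe as written, because the private
// seed is a plain static variable.
void fillPseudoRandom(unsigned char *buf, int len) {
    assert(buf != nullptr || len <= 0);
    if (!seeded) {
        // Assumption: wall-clock time is "different enough" for an IV
        // that is stored in plain text anyway.
        privateSeed = static_cast<unsigned int>(time(nullptr));
        seeded = true;
    }
    for (int i = 0; i < len; ++i) {
        buf[i] = static_cast<unsigned char>(rand_r(&privateSeed) & 0xff);
    }
}
```

An application that calls `srand(42)` before and after a call to `fillPseudoRandom` still gets the same `rand()` values, which is the behaviour the srand/rand fallback cannot guarantee.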
> 
> Patch 0004 replaces an existing occurrence of srand+rand calls in
> SplashScreen::buildSCDMatrix with grandom calls. Note that previously rand
> was seeded with a known value, therefore it always gave the same sequence
> of numbers. Now the sequence is different each time, and I don't know if
> this is an issue.
> 
> Patch 0005 uses grandom to initialize AES encryption.
> 
> --- Patch 0006 -- FlateStream::unfilteredReset fix
> Patch 0006 fixes FlateStream::unfilteredReset so that it actually calls
> unfilteredReset on the base stream (instead of reset), so that
> FlateStream::getUnfilteredChar (inherited from FilterStream) can work
> properly. This is the same behavior as all other FilterStream-derived
> classes (see 9b72ee4e4c8658b2f7cd542d601a5c3be621d3fc).
> 
> --- Patch 0007 -- Be able to write objUint back
> Patch 0007 makes sure that overflown integers can be written. This is
> especially useful in full rewrite mode for some values of /P in the
> encryption dictionary.
> 
> --- Patches 0008-0010 -- XRefEntry flags and preserving encryption info
> Some objects, notably the /Encrypt dictionary, are stored in unencrypted
> form. However, if encryption is enabled, we currently pass all objects
> through decryption routines when fetching them. It hasn't done any harm
> until now because we read the /Encrypt dictionary to initialize decryption,
> when decryption is not enabled yet.
> But now we need to be able to read all objects again at any moment, in order
> to write them back in full rewrites. Therefore I've added an Unencrypted
> flag in XRefEntry to mark such unencrypted objects. In detail:
>  - Since there was already a "bool updated" field, I've added an "int flags"
>    field to store both flags (patch 0008).
>  - Patch 0009 adds an XRef::setSpecialFlags() method that sets the
>    Unencrypted flags (it only needs to be called once in the lifetime of
>    the XRef instance). It recursively marks all entries referred from the
>    /Encrypt dictionary as unencrypted (in practice I haven't seen any
>    document with indirect references in the /Encrypt dict, but this is
>    what I understood from the specs).
>    Patch 0009 also copies the /Encrypt field into the new document's
>    trailer dictionary and removes code that refuses to save encrypted
>    documents.
>  - Patch 0010 makes sure that the first ID field is never changed if the
>    document is encrypted, because it is used to calculate the decryption
>    key. Previously, we used to always generate a new ID in full rewrite
>    mode.
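
A minimal sketch of folding several boolean properties into one "int flags" field, as described for patch 0008. The type `EntrySketch`, its accessors and the helper `markUnencrypted` are hypothetical illustrations; the real XRefEntry members may be named and laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an xref entry; only the flag handling is shown.
struct EntrySketch {
    enum Flag {
        Updated = 1 << 0,     // entry was modified since loading
        Unencrypted = 1 << 1  // object is stored in plain text
    };

    int flags = 0;

    bool getFlag(Flag f) const { return (flags & f) != 0; }

    void setFlag(Flag f, bool value) {
        if (value) {
            flags |= f;
        } else {
            flags &= ~f;
        }
    }
};

// Marks entry `num` as stored in plain text; `num` must be a valid index.
void markUnencrypted(std::vector<EntrySketch> &entries, std::size_t num) {
    assert(num < entries.size());
    entries[num].setFlag(EntrySketch::Unencrypted, true);
}
```

A further flag can be added later as another bit without changing the size of the entry, which is presumably why a single integer was preferred over one bool per property.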
> 
> Note about XRef::setSpecialFlags(): I've chosen to add a method to
> explicitly set flags, instead of automatically setting them as soon as xref
> entries are parsed, because in later patches I add checks that are
> expensive (e.g. patch 0016 requires a full xref traversal, which would
> defeat linearization if done automatically).
> 
> --- Patches 0011-0014 -- PDFDoc::write* methods
> In encrypted PDFs, strings and streams are encrypted. The key varies
> according to the object number.
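
A minimal sketch of how the key depends on the object number for RC4 and AES-128, following my reading of Algorithm 1 in the PDF specification: the file key is extended with the low-order bytes of the object and generation numbers (plus the bytes "sAlT" for AES), the result is hashed with MD5, and the first min(n + 5, 16) bytes of the hash become the object key. The MD5 step is left out to keep the example short, both function names are hypothetical, and this is not poppler's code. AES-256 documents use the file key directly and do not go through this step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the byte string that would be fed to MD5 to derive the key for
// object (objNum, objGen). The hashing itself is not shown.
std::vector<uint8_t> objectKeyHashInput(const std::vector<uint8_t> &fileKey,
                                        uint32_t objNum, uint32_t objGen,
                                        bool useAES) {
    assert(objGen <= 0xffff); // generation numbers fit in two bytes

    std::vector<uint8_t> in(fileKey);
    // Low-order 3 bytes of the object number, low byte first.
    in.push_back(static_cast<uint8_t>(objNum & 0xff));
    in.push_back(static_cast<uint8_t>((objNum >> 8) & 0xff));
    in.push_back(static_cast<uint8_t>((objNum >> 16) & 0xff));
    // Low-order 2 bytes of the generation number, low byte first.
    in.push_back(static_cast<uint8_t>(objGen & 0xff));
    in.push_back(static_cast<uint8_t>((objGen >> 8) & 0xff));
    if (useAES) {
        // AES adds the fixed bytes "sAlT".
        const uint8_t salt[] = {0x73, 0x41, 0x6c, 0x54};
        in.insert(in.end(), salt, salt + 4);
    }
    return in;
}

// Number of leading MD5 bytes used as the object key.
std::size_t objectKeyLength(std::size_t fileKeyLength) {
    return std::min<std::size_t>(fileKeyLength + 5, 16);
}
```

For the object header "11 5 obj" and a 5-byte file key, the hash input is the key followed by the bytes 11, 0, 0, 5, 0, and the resulting object key is 10 bytes long.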
> 
> Patch 0011 is a refactoring to separate the operations to write objects'
> header (eg "11 5 obj") and footer ("endobj") from the operations to write
> the object itself. This is only to make code introduced by 0012 clearer
> (see commit message in 0011 for details).
> 
> Patch 0012 queries the XRef for encryption parameters and propagates them
> recursively to all write methods. Objects flagged as Unencrypted are passed
> null encryption parameters, so that they are written back unencrypted.
> 
> Patch 0013 performs string encryption. Note: I don't see the reason why we
> check if s->hasUnicodeMarker() or not right after the code I changed, as the
> PDF specs do not make such a distinction. However, it should not cause
> trouble with encrypted strings, because the two code paths only differ in
> which characters are escaped, and such escapes are optional.
> 
> Patch 0014 performs stream encryption. Note that only strWeird streams are
> encrypted here; other ones are raw-copied from the original file.
> 
> --- Patches 0015-0016 -- Special handling for XRef streams and ObjStm objects
> Another category of objects in unencrypted form are XRef streams
> (each with its own xref entry). Patch 0015 marks such entries as
> Unencrypted, so that fetch can read them correctly. Note that XRef parsing
> bypasses fetch, therefore it's unaffected by this patch.
> Actually, storing XRef stream objects makes no sense in case of full
> rewrite, because we always create a new XRef table. Therefore, copied XRef
> stream objects from the original document are just a waste of space (and
> also result in corrupt objects, see commit message in 0015). For this
> reason the patch also sets a "DontRewrite" flag on those objects and skips
> them in full-rewrite mode.
> 
> Another category of space-wasting leaked objects is compressed object
> streams, which are currently copied into fully-rewritten documents, even
> though the objects they contain are individually written too. Patch 0016
> sets DontRewrite on ObjStm objects too.
> 
> --- Patches 0017-0018 -- Test tools
> The last two patches change utils/pdfinfo and test/pdf-fullrewrite. They
> provide extra features I found useful while developing the patches, and that
> I think can be useful to identify encrypted documents and do automated
> tests on them.
> 
> utils/pdfinfo is modified to show the encryption algorithm (patch 0017).
> 
> test/pdf-fullrewrite is extended (patch 0018):
>  - Support encrypted documents (via -upw/-opw command-line arguments)
>  - Use incremental update mode instead of full-rewrite (-i switch)
>    No new objects are written in this case, but a new trailer is appended,
>    which is enough to test if the XRef chain is extended properly (e.g. if
>    /Prev works) and if the new trailer dictionary works (e.g. it can be
>    used to spot /Encrypt dictionary issues, or a changed /ID in encrypted
>    documents)
>  - Added support to optionally verify the generated document, by opening it
>    back and comparing each object in the xref with the original document's
>    one (-check switch).
> 
> I'm sending the patches in a single big file suitable for git am.
> I'm also attaching the shell scripts (relying on these modified tools) that
> I used in my tests. The "unpack" file contains the list of the documents
> I've run the tests on.
> 
> Of course, feedback and other test cases are more than welcome.

I've done some regression testing and looked over the code, and everything
looks great. If no one has any other opinion I'll commit them to master
tomorrow evening.

Cheers,
  Albert

> 
> Thanks in advance,
> Fabio

