[poppler] [PATCHES] Support to save encrypted files with PDFDoc::saveAs

Fabio D'Urso fabiodurso at hotmail.it
Mon Aug 20 13:32:16 PDT 2012


Hi,
the attached patches implement support to save encrypted files, both in 
incremental and full rewrite mode. The new encryption features do not use 
external libraries and share most code with existing decryption routines.

I've also modified pdf-fullrewrite so that it can deal with password-protected 
files, do incremental updates instead of a full rewrite, and verify the 
generated file by comparing each object in the new document with the original 
one. This allows for automated testing -- more details below.

Note that the patches do *not* implement support for encrypted files in the 
command-line utilities such as pdfunite or pdfseparate.

--- Patches 0001-0002 -- Decrypt.cc: Added support for encryption
The first two patches affect Decrypt.cc/h and create the EncryptStream 
class, a filter stream that outputs encrypted data.

In detail, in patch 0001 I split the existing DecryptStream class into two 
classes (BaseCryptStream, which does key initialization, and DecryptStream, 
which actually decrypts data). I've also rewritten getChar to internally call 
lookChar, so that its logic needn't be duplicated.
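
As an illustration of the getChar/lookChar arrangement, here is a minimal
sketch. It is not the code from patch 0001: the PeekableStream class is
hypothetical and reads from an in-memory string instead of decrypting
anything. It only shows how getChar can reuse lookChar and then consume the
byte.

```cpp
#include <cstddef>
#include <cstdio>   // EOF
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a crypt stream (assumption: not the
// BaseCryptStream/DecryptStream classes from the patch).
class PeekableStream {
public:
    explicit PeekableStream(std::string data) : data_(std::move(data)) {}

    // Returns the next byte without consuming it, or EOF at end of data.
    int lookChar() {
        if (pos_ >= data_.size())
            return EOF;
        return static_cast<unsigned char>(data_[pos_]);
    }

    // Reuses lookChar, then advances only if a byte was actually available.
    int getChar() {
        int c = lookChar();
        if (c != EOF)
            ++pos_;
        return c;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};
```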

Patch 0002 creates the EncryptStream class and adds support for AES and 
AES-256 encryption (RC4 encrypts using the same algorithm as decryption, 
therefore no new code was needed for RC4).
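
To show why RC4 needs no separate encryption routine, here is a textbook RC4
sketch (not poppler's implementation; the function name rc4 is chosen for
this example only, and the key must be non-empty). The cipher XORs the data
with a keystream that depends only on the key, so applying the same function
twice with the same key returns the original bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Textbook RC4: key scheduling followed by keystream XOR.
// Assumption: key is non-empty.
std::vector<std::uint8_t> rc4(const std::string &key,
                              const std::vector<std::uint8_t> &input) {
    std::uint8_t S[256];
    for (int i = 0; i < 256; ++i)
        S[i] = static_cast<std::uint8_t>(i);

    // Key-scheduling algorithm: permute S according to the key.
    std::uint8_t j = 0;
    for (int i = 0; i < 256; ++i) {
        j = static_cast<std::uint8_t>(
            j + S[i] + static_cast<std::uint8_t>(key[i % key.size()]));
        std::swap(S[i], S[j]);
    }

    // Generate the keystream and XOR it onto the data.
    std::vector<std::uint8_t> out(input.size());
    std::uint8_t x = 0, y = 0;
    for (std::size_t n = 0; n < input.size(); ++n) {
        x = static_cast<std::uint8_t>(x + 1);
        y = static_cast<std::uint8_t>(y + S[x]);
        std::swap(S[x], S[y]);
        std::uint8_t k = S[static_cast<std::uint8_t>(S[x] + S[y])];
        out[n] = static_cast<std::uint8_t>(input[n] ^ k);
    }
    return out;
}
```

Calling rc4(key, rc4(key, data)) yields data again, which is the property the
patch relies on.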

--- Patches 0003-0005 -- goo/grandom.cc: Pseudo-random number generation
Patches 0003-0005 are about pseudo-random number generation: I need random 
numbers to initialize AES and AES-256 cipher block chaining. They needn't be 
unpredictable, because they're stored in plain text; they just need to be 
different each time.

Calling srand and rand within a library seems like a bad idea, because the 
calling application may have already called srand with a known seed to get a 
predictable sequence of numbers and, if we called srand or rand again, we 
would affect that sequence. Therefore, patch 0003 adds a goo/grandom.cc file 
to generate random numbers in a safe way when possible. If rand_r is available 
(POSIX), it uses it. Otherwise it falls back to the unsafe srand/rand calls. 
Other safe methods will have to be developed for other platforms.
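
The rand_r idea can be sketched as follows. This is not the grandom API from
patch 0003: the PrivateRandom class is hypothetical, and the sketch assumes a
POSIX system that provides rand_r. The point is that the generator keeps its
own seed, so the application's srand/rand sequence is left alone.

```cpp
#include <stdlib.h>  // rand_r (POSIX), srand, rand

// Hypothetical helper (assumption: not the interface added by the patch).
class PrivateRandom {
public:
    explicit PrivateRandom(unsigned int seed) : seed_(seed) {}

    // Returns a pseudo-random byte using only our private seed state.
    int nextByte() { return rand_r(&seed_) & 0xff; }

private:
    unsigned int seed_;  // never shared with the global rand() state
};
```

An application that called srand(1234) before using the library should get
the same rand() values as before, no matter how many bytes are drawn from a
PrivateRandom instance in between.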

Patch 0004 replaces an existing occurrence of srand+rand calls in 
SplashScreen::buildSCDMatrix with grandom calls. Note that previously rand was 
seeded with a known value, therefore it always gave the same sequence of 
numbers. Now the sequence is different each time, and I don't know if this is 
an issue.

Patch 0005 uses grandom to initialize AES encryption.

--- Patch 0006 -- FlateStream::unfilteredReset fix
Patch 0006 fixes FlateStream::unfilteredReset so that it actually calls 
unfilteredReset on the base stream (instead of reset). This lets 
FlateStream::getUnfilteredChar (inherited from FilterStream) work 
properly, and it is the same behavior as all other FilterStream-derived 
classes (see 9b72ee4e4c8658b2f7cd542d601a5c3be621d3fc).

--- Patch 0007 -- Be able to write objUint back
Patch 0007 makes sure that integers that overflowed the int type (objUint) 
can be written back. This is especially useful in full rewrite mode for some 
values of /P in the encryption dictionary.
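
A sketch of the situation, under stated assumptions: the PdfInt struct and
writeInt function are hypothetical and are not poppler's Object class, and
4294967292 is just an example of a value that does not fit in a 32-bit signed
int. Writing such a value through the unsigned path preserves the number as
it appeared in the file.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tagged number, loosely modelled on the idea of separate
// signed and unsigned integer object types (assumption, not poppler code).
struct PdfInt {
    bool isUnsigned;
    std::int32_t i;   // valid when !isUnsigned
    std::uint32_t u;  // valid when isUnsigned
};

// Writes the value back as decimal text, keeping unsigned values unsigned.
std::string writeInt(const PdfInt &n) {
    return n.isUnsigned ? std::to_string(n.u) : std::to_string(n.i);
}
```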

--- Patches 0008-0010 -- XRefEntry flags and preserving encryption info
Some objects, notably the /Encrypt dictionary, are stored in unencrypted form. 
However, if encryption is enabled, we currently pass all objects through 
decryption routines when fetching them. It hasn't done any harm until now 
because we read the /Encrypt dictionary to initialize decryption, when 
decryption is not enabled yet.
But now we need to be able to read all objects again at any moment, in order 
to write them back in full rewrites. Therefore I've added an Unencrypted flag 
in XRefEntry to mark such unencrypted objects. In detail:
 - Since there was already a "bool updated" field, I've added an "int flags"
   field to store both flags (patch 0008).
 - Patch 0009 adds an XRef::setSpecialFlags() method that sets the Unencrypted
   flags (it only needs to be called once in the lifetime of the XRef
   instance). It recursively marks all entries referred from the /Encrypt
   dictionary as unencrypted (in practice I haven't seen any document with
   indirect references in the /Encrypt dict, but this is how I read the
   specs).
   Patch 0009 also copies the /Encrypt field in the new document's trailer
   dictionary and removes code that refuses to save encrypted documents.
 - Patch 0010 makes sure that the first ID field is never changed if the
   document is encrypted, because it is used to calculate the decryption key.
   Previously, we used to always generate a new ID in full rewrite mode.
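
The flag field and the recursive marking described in the list above can be
sketched like this. All names are hypothetical (FlaggedEntry, RefGraph,
markUnencrypted); the real XRefEntry has more fields, and the bit values here
are illustrative only. The object graph is reduced to a map from an object
number to the object numbers it refers to.

```cpp
#include <map>
#include <vector>

// Hypothetical, simplified xref entry (assumption: not the patch's layout).
struct FlaggedEntry {
    enum Flag { Updated = 1 << 0, Unencrypted = 1 << 1 };
    int flags = 0;

    bool getFlag(Flag f) const { return (flags & f) != 0; }
    void setFlag(Flag f, bool value) {
        if (value)
            flags |= f;
        else
            flags &= ~f;
    }
};

// refs[n] lists the object numbers that object n refers to indirectly.
using RefGraph = std::map<int, std::vector<int>>;

// Marks object num and everything reachable from it as Unencrypted.
void markUnencrypted(int num, const RefGraph &refs,
                     std::map<int, FlaggedEntry> &xref) {
    FlaggedEntry &e = xref[num];
    if (e.getFlag(FlaggedEntry::Unencrypted))
        return;  // already visited; this also stops reference cycles
    e.setFlag(FlaggedEntry::Unencrypted, true);

    auto it = refs.find(num);
    if (it == refs.end())
        return;
    for (int child : it->second)
        markUnencrypted(child, refs, xref);
}
```

Starting the walk at the object that holds the /Encrypt dictionary leaves all
other entries untouched, so they still go through decryption when fetched.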

Note about XRef::setSpecialFlags(): I've chosen to add a method to explicitly 
set flags, instead of automatically setting them as soon as xref entries are 
parsed, because in later patches I add checks that are expensive (e.g. patch 
0016 requires a full xref traversal, which would defeat linearization if done 
automatically).

--- Patches 0011-0014 -- PDFDoc::write* methods
In encrypted PDFs, strings and streams are encrypted. The key varies according 
to the object number.
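
For reference, this is how the per-object key depends on the object number
for RC4 and AES-128 in the PDF specification: the file key is extended with
the low-order 3 bytes of the object number and the low-order 2 bytes of the
generation number (plus the bytes "sAlT" for AES), the result is hashed with
MD5, and the first min(n + 5, 16) bytes of the hash are used, where n is the
file key length. AES-256 uses the file key directly. The sketch below only
builds the buffer to be hashed and computes the key length; the MD5 step is
left out, and the function names are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the MD5 input for the per-object key (RC4 and AES-128 case).
// Hypothetical helper; the hashing itself is not shown.
std::vector<std::uint8_t> objectKeyHashInput(
        const std::vector<std::uint8_t> &fileKey,
        std::uint32_t objNum, std::uint32_t objGen, bool aes) {
    std::vector<std::uint8_t> buf(fileKey);
    // Low-order 3 bytes of the object number, low-order byte first.
    buf.push_back(static_cast<std::uint8_t>(objNum & 0xff));
    buf.push_back(static_cast<std::uint8_t>((objNum >> 8) & 0xff));
    buf.push_back(static_cast<std::uint8_t>((objNum >> 16) & 0xff));
    // Low-order 2 bytes of the generation number, low-order byte first.
    buf.push_back(static_cast<std::uint8_t>(objGen & 0xff));
    buf.push_back(static_cast<std::uint8_t>((objGen >> 8) & 0xff));
    if (aes) {
        // AES adds the 4 bytes "sAlT" before hashing.
        const std::uint8_t salt[4] = {0x73, 0x41, 0x6C, 0x54};
        buf.insert(buf.end(), salt, salt + 4);
    }
    return buf;
}

// Number of MD5 output bytes used as the per-object key.
std::size_t objectKeyLength(std::size_t fileKeyLength) {
    return std::min<std::size_t>(fileKeyLength + 5, 16);
}
```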

Patch 0011 is a refactoring to separate the operations to write objects' 
header (eg "11 5 obj") and footer ("endobj") from the operations to write the 
object itself. This is only to make code introduced by 0012 clearer (see 
commit message in 0011 for details).

Patch 0012 queries the XRef for encryption parameters and propagates them 
recursively to all write methods. Objects flagged as Unencrypted are passed 
null encryption parameters, so that they are written back unencrypted.

Patch 0013 performs string encryption. Note: I don't see why we check 
s->hasUnicodeMarker() right after the code I changed, as the PDF specs do not 
make such a distinction. However, it should not cause trouble with encrypted 
strings, because the two code paths only differ in which characters are 
escaped, and such escapes are optional.

Patch 0014 performs stream encryption. Note that only strWeird streams are 
encrypted here; other ones are raw-copied from the original file.

--- Patches 0015-0016 -- Special handling for XRef streams and ObjStm objects
Another category of objects stored in unencrypted form is XRef streams (each 
with its own xref entry). Patch 0015 marks such entries as Unencrypted, so that 
fetch can read them correctly. Note that XRef parsing bypasses fetch, 
therefore it's unaffected by this patch.
Actually, storing XRef stream objects makes no sense in case of full rewrite, 
because we always create a new XRef table. XRef stream objects copied from the 
original document are therefore just a waste of space (and they also result 
in corrupt objects, see commit message in 0015). For this reason the patch 
also sets a "DontRewrite" flag on those objects and skips them in full-rewrite 
mode.

Another category of space-wasting leaked objects is compressed object streams, 
which are currently copied in fully-rewritten documents, even though the 
objects they contain are individually written too. Patch 0016 sets DontRewrite 
on ObjStm objects too.
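
The effect of DontRewrite in full-rewrite mode can be sketched as a simple
filter. The RewriteCandidate struct and objectsForFullRewrite function are
hypothetical and much simpler than the real write loop; they only show that
flagged container objects (XRef streams, ObjStm) are skipped while every
other object is still written.

```cpp
#include <vector>

// Hypothetical view of an xref entry for this sketch only.
struct RewriteCandidate {
    int num;           // object number
    bool dontRewrite;  // set for XRef streams and ObjStm containers
};

// Returns the object numbers that a full rewrite would emit.
std::vector<int> objectsForFullRewrite(
        const std::vector<RewriteCandidate> &entries) {
    std::vector<int> out;
    for (const RewriteCandidate &e : entries) {
        if (!e.dontRewrite)
            out.push_back(e.num);
    }
    return out;
}
```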

--- Patches 0017-0018 -- Test tools
The last two patches change utils/pdfinfo and test/pdf-fullrewrite. They 
provide extra features I found useful while developing the patches, and that I 
think can be useful to identify encrypted documents and do automated tests on 
them.

utils/pdfinfo is modified to show the encryption algorithm (patch 0017).

test/pdf-fullrewrite is extended (patch 0018):
 - Support for encrypted documents (via the -upw/-opw command-line arguments).
 - Incremental update mode instead of full rewrite (-i switch).
   No new objects are written in this case, but a new trailer is appended,
   which is enough to test whether the XRef chain is extended properly (e.g.
   whether /Prev works) and whether the new trailer dictionary works (e.g. it
   can be used to spot /Encrypt dictionary issues, or a changed /ID in
   encrypted documents).
 - Optional verification of the generated document, by opening it again and
   comparing each object in the xref with the corresponding one in the
   original document (-check switch).

I'm sending the patches in a single big file suitable for git am.
I'm also attaching the shell scripts (relying on these modified tools) that I 
used in my tests. The "unpack" file contains the list of documents I ran the 
tests on.

Of course, feedback and other test cases are more than welcome.

Thanks in advance,
Fabio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: everything.patch
Type: text/x-patch
Size: 112101 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20120820/58ee1bdd/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tests.tar.gz
Type: application/x-compressed-tar
Size: 1873 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20120820/58ee1bdd/attachment-0003.bin>