[poppler] [PATCHES] Support to save encrypted files with PDFDoc::saveAs
Fabio D'Urso
fabiodurso at hotmail.it
Mon Aug 20 13:32:16 PDT 2012
Hi,
the attached patches implement support to save encrypted files, both in
incremental and full rewrite mode. The new encryption features do not use
external libraries and share most code with existing decryption routines.
I've also modified pdf-fullrewrite so that it can deal with password-protected
files, do incremental updates instead of a full rewrite, and verify the
generated file by comparing each object in the new document with the original
one. This allows for automated testing -- more details below.
Note that the patches do *not* implement support for encrypted files in
utils/ tools such as pdfunite or pdfseparate.
--- Patches 0001-0002 -- Decrypt.cc: Added support for encryption
The first two patches affect Decrypt.cc/h, and create the EncryptStream
class, which is a filter stream that outputs encrypted data.
In detail, in patch 0001 I split the existing DecryptStream class into two
classes (BaseCryptStream, which does key initialization, and DecryptStream,
which actually decrypts data). I've also rewritten getChar to internally call
lookChar, so that its logic needn't be duplicated.
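As an illustration of the getChar/lookChar arrangement, here is a minimal
sketch. ToyCryptStream and its XOR "cipher" are made up for this example and
are not the code in the patch; the point is only that getChar consumes what
lookChar has already decoded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy stream, not poppler's class. The XOR "cipher" is a
// placeholder so that the example stays short.
class ToyCryptStream {
public:
    enum { kEOF = -1 };

    ToyCryptStream(const std::string &data, unsigned char key)
        : data_(data), key_(key), pos_(0), bufValid_(false), buf_(kEOF) {}

    // Decode the next byte without consuming it.
    int lookChar() {
        if (!bufValid_) {
            if (pos_ < data_.size()) {
                buf_ = static_cast<unsigned char>(data_[pos_]) ^ key_;
            } else {
                buf_ = kEOF;
            }
            bufValid_ = true;
        }
        return buf_;
    }

    // Consume the next byte. The decoding logic lives only in lookChar().
    int getChar() {
        int c = lookChar();
        if (c != kEOF) {
            ++pos_;
            bufValid_ = false;
        }
        return c;
    }

private:
    std::string data_;
    unsigned char key_;
    std::size_t pos_;
    bool bufValid_;
    int buf_;
};
```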
Patch 0002 creates the EncryptStream class and adds support for AES and
AES-256 encryption (RC4 encrypts using the same algorithm as decryption,
therefore no new code was needed for RC4).
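To show why RC4 needs no separate encryption routine, here is a textbook RC4
sketch (not the code in Decrypt.cc; the function name rc4 is made up for this
example). The cipher XORs the data with a keystream, so running the same
function twice with the same key gives back the input.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Textbook RC4. The key must not be empty.
std::string rc4(const std::string &key, const std::string &input) {
    unsigned char S[256];
    for (int i = 0; i < 256; ++i) {
        S[i] = static_cast<unsigned char>(i);
    }
    // Key-scheduling algorithm.
    for (int i = 0, j = 0; i < 256; ++i) {
        j = (j + S[i] + static_cast<unsigned char>(key[i % key.size()])) & 0xFF;
        std::swap(S[i], S[j]);
    }
    // Keystream generation, XORed into the data.
    std::string out(input);
    int i = 0, j = 0;
    for (std::size_t n = 0; n < input.size(); ++n) {
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        std::swap(S[i], S[j]);
        unsigned char k = S[(S[i] + S[j]) & 0xFF];
        out[n] = static_cast<char>(static_cast<unsigned char>(input[n]) ^ k);
    }
    return out;
}
```

Calling rc4(key, rc4(key, data)) returns data unchanged, which is the
property the RC4 code path relies on.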
--- Patches 0003-0005 -- goo/grandom.cc: Pseudo-random number generation
Patches 0003-0005 are about pseudo-random number generation: I need random
numbers to initialize AES and AES-256 cipher block chaining. They needn't be
unpredictable, because they're stored in plain text; they just need to be
different each time.
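The role of the random initialization vector can be sketched as follows.
This is only an illustration of cipher block chaining: toyBlockEncrypt is a
placeholder and NOT AES, padding is left out, and all names are made up for
this example. The IV is written in clear as the first 16 bytes, and each
plaintext block is XORed with the previous ciphertext block before being
encrypted, so a different IV changes every output block.

```cpp
#include <array>
#include <cstddef>
#include <vector>

typedef std::array<unsigned char, 16> Block;

// Placeholder for the AES block transform. This is NOT a real cipher.
Block toyBlockEncrypt(const Block &in, const Block &key) {
    Block out;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<unsigned char>((in[i] ^ key[i]) + 1);
    }
    return out;
}

// CBC over whole blocks: the output is the IV followed by the ciphertext.
std::vector<unsigned char> cbcEncrypt(const std::vector<Block> &plain,
                                      const Block &key, const Block &iv) {
    std::vector<unsigned char> out(iv.begin(), iv.end());
    Block prev = iv;
    for (const Block &p : plain) {
        Block x;
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] = static_cast<unsigned char>(p[i] ^ prev[i]);
        }
        prev = toyBlockEncrypt(x, key);
        out.insert(out.end(), prev.begin(), prev.end());
    }
    return out;
}
```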
Calling srand and rand within a library seems a bad idea, because the
calling application may have already called srand with a known seed to get a
predictable sequence of numbers and, if we called srand or rand again, we
would affect that sequence. Therefore, patch 0003 adds a goo/grandom.cc file
to generate random numbers in a safe way when possible. If rand_r is available
(POSIX), it uses it. Otherwise it falls back to unsafe srand/rand calls.
Other safe methods will have to be developed for other platforms.
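A minimal sketch of the idea, not the actual goo/grandom.cc: the function
name fillRandom, the SKETCH_HAVE_RAND_R macro and the platform test are
assumptions made for this example (a real build would more likely use a
configure check). With rand_r the generator state is a private variable, so
the application's own srand/rand sequence is left alone.

```cpp
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <stdlib.h>

// Assumption for this sketch: treat Unix-like platforms as having rand_r.
#if defined(__unix__) || defined(__APPLE__)
#define SKETCH_HAVE_RAND_R 1
#endif

// Fill buf with len pseudo-random bytes. Not thread-safe, and not suitable
// for anything that needs unpredictable values.
void fillRandom(unsigned char *buf, std::size_t len) {
#ifdef SKETCH_HAVE_RAND_R
    // Private state: the global rand() sequence is not touched.
    static unsigned int seed = static_cast<unsigned int>(std::time(nullptr));
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<unsigned char>(rand_r(&seed) % 256);
    }
#else
    // Unsafe fallback: this disturbs the application's rand() sequence.
    static bool seeded = false;
    if (!seeded) {
        std::srand(static_cast<unsigned int>(std::time(nullptr)));
        seeded = true;
    }
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<unsigned char>(std::rand() % 256);
    }
#endif
}
```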
Patch 0004 replaces an existing occurrence of srand+rand calls in
SplashScreen::buildSCDMatrix with grandom calls. Note that previously rand was
seeded with a known value, therefore it always gave the same sequence of
numbers. Now the sequence is different each time, and I don't know if this is
an issue.
Patch 0005 uses grandom to initialize AES encryption.
--- Patch 0006 -- FlateStream::unfilteredReset fix
Patch 0006 fixes FlateStream::unfilteredReset so that it actually calls
unfilteredReset on the base stream (instead of reset), so that
FlateStream::getUnfilteredChar (inherited from FilterStream) can work
properly. This is the same behavior as all other FilterStream-derived
classes (see 9b72ee4e4c8658b2f7cd542d601a5c3be621d3fc).
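The delegation can be pictured with a toy hierarchy. The Toy* classes below
are made up for this example and only mirror the method names mentioned
above; they are not poppler's Stream classes.

```cpp
#include <string>

// Hypothetical miniature stream interface.
struct ToyStream {
    virtual ~ToyStream() {}
    virtual void reset() = 0;
    virtual void unfilteredReset() = 0;
};

// Records which call reached the underlying stream.
struct ToyBaseStream : ToyStream {
    std::string log;
    void reset() override { log += "reset;"; }
    void unfilteredReset() override { log += "unfilteredReset;"; }
};

// A filter forwards each call to the same call on the stream it wraps.
struct ToyFilterStream : ToyStream {
    explicit ToyFilterStream(ToyStream *s) : str(s) {}
    void reset() override { str->reset(); }
    void unfilteredReset() override { str->unfilteredReset(); }
    ToyStream *str;
};
```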
--- Patch 0007 -- Be able to write objUint back
Patch 0007 makes sure that integers that overflowed int (objUint objects) can
be written back. This is especially useful in full rewrite mode for some
values of /P in the encryption dictionary.
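For example, a /P value of 4294963392 does not fit in a signed 32-bit int.
The sketch below shows the round trip under the assumption that such a value
is kept in an unsigned field; ToyNumber, parseInteger and writeInteger are
made-up names, not poppler's Object API, and the input is assumed to fit in
an unsigned int.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical tagged integer: Uint is used only when int would overflow.
struct ToyNumber {
    enum Kind { Int, Uint };
    Kind kind;
    int i;
    unsigned int u;
};

ToyNumber parseInteger(const std::string &tok) {
    long long v = std::strtoll(tok.c_str(), nullptr, 10);
    ToyNumber n;
    if (v >= INT_MIN && v <= INT_MAX) {
        n.kind = ToyNumber::Int;
        n.i = static_cast<int>(v);
        n.u = 0;
    } else {
        // Assumed to fit in unsigned int.
        n.kind = ToyNumber::Uint;
        n.i = 0;
        n.u = static_cast<unsigned int>(v);
    }
    return n;
}

// Write the number back in the same form it was read.
std::string writeInteger(const ToyNumber &n) {
    return n.kind == ToyNumber::Int ? std::to_string(n.i)
                                    : std::to_string(n.u);
}
```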
--- Patches 0008-0010 -- XRefEntry flags and preserving encryption info
Some objects, notably the /Encrypt dictionary, are stored in unencrypted form.
However, if encryption is enabled, we currently pass all objects through
decryption routines when fetching them. It hasn't done any harm till now
because we read the /Encrypt dictionary to initialize decryption, when
decryption is not enabled yet.
But now we need to be able to read all objects again at any moment, in order
to write them back in full rewrites. Therefore I've added an Unencrypted flag
in XRefEntry to mark such unencrypted objects. In detail:
- Since there was already a "bool updated" field, I've added an "int flags"
  field to store both flags (patch 0008).
- Patch 0009 adds an XRef::setSpecialFlags() method that sets the Unencrypted
  flags (it only needs to be called once in the lifetime of the XRef
  instance). It recursively marks all entries referred to from the /Encrypt
  dictionary as unencrypted (in practice I haven't seen any document with
  indirect references in the /Encrypt dict, but this is how I interpreted
  the specs).
  Patch 0009 also copies the /Encrypt field into the new document's trailer
  dictionary and removes code that refuses to save encrypted documents.
- Patch 0010 makes sure that the first ID field is never changed if the
  document is encrypted, because it is used to calculate the decryption key.
  Previously, we used to always generate a new ID in full rewrite mode.
Note about XRef::setSpecialFlags(): I've chosen to add a method to explicitly
set flags, instead of automatically setting them as soon as xref entries are
parsed, because in later patches I add checks that are expensive (eg patch 0016
requires a full xref traversal, which would defeat linearization if done
automatically).
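The flag field and the recursive marking could look roughly like the sketch
below. ToyXRefEntry, its refs member and markUnencrypted are made up for this
example (the real entries do not store a list of references); it only shows
several boolean flags packed into one int and a traversal that stops on
entries that are already marked.

```cpp
#include <map>
#include <vector>

// Hypothetical xref entry: flags are bit positions inside one int.
struct ToyXRefEntry {
    enum Flag { Updated, Unencrypted };
    int flags = 0;
    std::vector<int> refs;  // object numbers this object refers to

    bool getFlag(Flag f) const { return (flags & (1 << f)) != 0; }
    void setFlag(Flag f, bool v) {
        if (v) {
            flags |= (1 << f);
        } else {
            flags &= ~(1 << f);
        }
    }
};

// Mark an object and everything reachable from it as Unencrypted.
// Already-marked entries are skipped, so reference cycles terminate.
void markUnencrypted(std::map<int, ToyXRefEntry> &xref, int num) {
    std::map<int, ToyXRefEntry>::iterator it = xref.find(num);
    if (it == xref.end() || it->second.getFlag(ToyXRefEntry::Unencrypted)) {
        return;
    }
    it->second.setFlag(ToyXRefEntry::Unencrypted, true);
    for (int child : it->second.refs) {
        markUnencrypted(xref, child);
    }
}
```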
--- Patches 0011-0014 -- PDFDoc::write* methods
In encrypted PDFs, strings and streams are encrypted. The key varies according
to the object number.
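As I read Algorithm 1 of the PDF 1.7 specification, for RC4 and AES-128 the
per-object key is the MD5 hash of the file key followed by the low-order
bytes of the object and generation numbers (plus the bytes "sAlT" for AES),
truncated to at most 16 bytes. The sketch below only builds the hash input
and the resulting key length; the MD5 step is left out and the function names
are made up for this example. AES-256 is different: as far as I know it uses
the file key directly.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bytes to be hashed with MD5 to obtain the per-object key.
std::vector<unsigned char> objectKeyHashInput(
        const std::vector<unsigned char> &fileKey,
        int objNum, int objGen, bool aes) {
    std::vector<unsigned char> in(fileKey);
    // Low-order 3 bytes of the object number, low-order byte first.
    in.push_back(static_cast<unsigned char>(objNum & 0xFF));
    in.push_back(static_cast<unsigned char>((objNum >> 8) & 0xFF));
    in.push_back(static_cast<unsigned char>((objNum >> 16) & 0xFF));
    // Low-order 2 bytes of the generation number, low-order byte first.
    in.push_back(static_cast<unsigned char>(objGen & 0xFF));
    in.push_back(static_cast<unsigned char>((objGen >> 8) & 0xFF));
    if (aes) {
        const unsigned char salt[4] = { 0x73, 0x41, 0x6C, 0x54 };  // "sAlT"
        in.insert(in.end(), salt, salt + 4);
    }
    return in;
}

// Number of leading MD5 bytes used as the object key.
std::size_t objectKeyLength(std::size_t fileKeyLen) {
    return std::min(fileKeyLen + 5, static_cast<std::size_t>(16));
}
```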
Patch 0011 is a refactoring to separate the operations to write objects'
header (eg "11 5 obj") and footer ("endobj") from the operations to write the
object itself. This is only to make code introduced by 0012 clearer (see
commit message in 0011 for details).
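The separation can be pictured like this. It is only a sketch with made-up
helper names and a plain std::ostream instead of poppler's output stream; the
body is written by a separate step sitting between header and footer, which
is where per-object encryption parameters can later be passed in.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Write the "N G obj" line that opens an indirect object.
void writeObjectHeader(std::ostream &out, int num, int gen) {
    out << num << ' ' << gen << " obj\n";
}

// Write the "endobj" line that closes an indirect object.
void writeObjectFooter(std::ostream &out) {
    out << "\nendobj\n";
}

// Convenience wrapper: header, already-serialized body, footer.
std::string writeObject(int num, int gen, const std::string &body) {
    std::ostringstream out;
    writeObjectHeader(out, num, gen);
    out << body;
    writeObjectFooter(out);
    return out.str();
}
```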
Patch 0012 queries the XRef for encryption parameters and propagates them
recursively to all write methods. Objects flagged as Unencrypted are passed
null encryption parameters, so that they are written back unencrypted.
Patch 0013 performs string encryption. Note: I don't see why we check
s->hasUnicodeMarker() right after the code I changed, as the PDF specs do not
make such a distinction. However, it should not cause trouble with encrypted
strings, because the two code paths only differ in which characters are
escaped, and such escapes are optional.
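Since an encrypted string is arbitrary binary data, it can contain bytes that
look like string delimiters. A minimal sketch of writing such data as a PDF
literal string follows; writeLiteralString is a made-up name, not the code in
PDFDoc, and it escapes only parentheses, backslashes and line endings.

```cpp
#include <string>

// Serialize raw bytes as a PDF literal string, escaping the characters that
// would otherwise end the string or be altered when it is read back.
std::string writeLiteralString(const std::string &bytes) {
    std::string out = "(";
    for (char c : bytes) {
        switch (c) {
        case '(':
        case ')':
        case '\\':
            out += '\\';
            out += c;
            break;
        case '\r':
            out += "\\r";
            break;
        case '\n':
            out += "\\n";
            break;
        default:
            out += c;
        }
    }
    out += ')';
    return out;
}
```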
Patch 0014 performs stream encryption. Note that only strWeird streams are
encrypted here; other ones are raw-copied from the original file.
--- Patches 0015-0016 -- Special handling for XRef streams and ObjStm objects
Another category of objects stored in unencrypted form is XRef streams (each with
its own xref entry). Patch 0015 marks such entries as Unencrypted, so that
fetch can read them correctly. Note that XRef parsing bypasses fetch,
therefore it's unaffected by this patch.
Actually, storing XRef stream objects makes no sense in case of full rewrite,
because we always create a new XRef table. Copied XRef stream objects from
the original document are just a waste of space (and also result in corrupt
objects, see commit message in 0015). Therefore the patch also sets a
"DontRewrite" flag on those objects and skips them in full-rewrite mode.
Another category of space-wasting leaked objects is compressed object streams,
which are currently copied in fully-rewritten documents, even though the
objects they contain are individually written too. Patch 0016 sets DontRewrite
on ObjStm objects too.
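In a full rewrite, the effect of the flag amounts to a filter over the xref
entries. The sketch below uses a made-up ToyEntry struct with a plain bool
instead of the real flag field.

```cpp
#include <vector>

// Hypothetical entry: object number plus the DontRewrite marker.
struct ToyEntry {
    int num;
    bool dontRewrite;
};

// Object numbers that a full rewrite would copy to the new document.
std::vector<int> objectsToRewrite(const std::vector<ToyEntry> &entries) {
    std::vector<int> out;
    for (const ToyEntry &e : entries) {
        if (!e.dontRewrite) {
            out.push_back(e.num);
        }
    }
    return out;
}
```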
--- Patches 0017-0018 -- Test tools
The last two patches change utils/pdfinfo and test/pdf-fullrewrite. They
provide extra features I found useful while developing the patches, and that I
think can be useful to identify encrypted documents and do automated tests on
them.
utils/pdfinfo is modified to show the encryption algorithm (patch 0017).
test/pdf-fullrewrite is extended (patch 0018):
- Support encrypted documents (via -upw/-opw command-line arguments).
- Use incremental update mode instead of full-rewrite (-i switch).
  No new objects are written in this case, but a new trailer is appended,
  which is enough to test if the XRef chain is extended properly (eg if
  /Prev works) and if the new trailer dictionary works (eg it can be used to
  spot /Encrypt dictionary issues, or a changed /ID in encrypted documents).
- Optionally verify the generated document, by opening it back and comparing
  each object in the xref with the original document's one (-check switch).
I'm sending the patches in a single big file suitable for git am.
I'm also attaching the shell scripts (relying on these modified tools) that I
used in my tests. The "unpack" file contains the list of the documents I've
run the tests on.
Of course, feedback and other test cases are more than welcome.
Thanks in advance,
Fabio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: everything.patch
Type: text/x-patch
Size: 112101 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20120820/58ee1bdd/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tests.tar.gz
Type: application/x-compressed-tar
Size: 1873 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20120820/58ee1bdd/attachment-0003.bin>