[poppler] [RFC] PDF Modification in poppler
leonardr at pdfsages.com
Fri Aug 18 06:04:58 PDT 2006
At 08:40 PM 8/17/2006, Julien Rebetez wrote:
>- It adds two functions to PDFDoc : saveCompleteRewrite and
>saveIncrementalUpdate. These functions allow the client to save the
>modification he does (through setModifiedObject) either by rewriting the
>whole document or using incremental update (this may be required for
>digitally signed document for example).
Very nice - offering both methods of writing. And I also
like the "dynamic checking" option where Xpdf will pick the best option.
>There are a bunch of other internal functions added, mostly in PDFDoc,
>but I think they are pretty self-explanatory.
A couple of things I saw in a cursory view...
1) You write all strings as Hex - that makes the file larger. You
shouldn't need to do that.
2) The Save code all assumes that you are writing to a file. Since
reading supports reading from a "generic stream" - it would be
worthwhile to do the same for writing. That would provide more
3) You might find that floating point values get written out with
more decimal places than you'd like - so consider limiting to 3 or 4
(or a user-setting).
4) You decompress ALL streams and write them out
uncompressed. OUCH! I would take the approach that if the stream
object is untouched, just copy the bytes "raw" from the original file
to the new one. Faster (no decompression!) and smaller output files.
5) I didn't look at an output file, but I think you are outputting
MUCH MORE whitespace than necessary.
6) I don't see you updating the ID for either write. For an
incremental update, you change only the second value; for a full
rewrite, you need a completely new ID.
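
As a rough illustration of points 1 and 3, here is a minimal Python sketch of more compact serialization. It is not poppler code: the helper names (pdf_literal_string, pdf_hex_string, pdf_real) and the default of 4 decimal places are assumptions made purely for the example.

```python
def pdf_literal_string(data: bytes) -> bytes:
    """Serialize bytes as a PDF literal string, e.g. (Hello)."""
    out = bytearray(b"(")
    for b in data:
        if b in b"()\\":
            # Parentheses and backslashes are escaped with a backslash.
            out += b"\\" + bytes([b])
        elif b == 0x0A:
            out += b"\\n"
        elif b == 0x0D:
            # A raw CR inside a literal string would be read back as LF.
            out += b"\\r"
        else:
            out.append(b)
    out += b")"
    return bytes(out)


def pdf_hex_string(data: bytes) -> bytes:
    """Serialize bytes as a PDF hex string, e.g. <48656C6C6F>."""
    return b"<" + data.hex().upper().encode("ascii") + b">"


def pdf_real(value: float, places: int = 4) -> str:
    """Format a number with at most `places` decimals and no exponent."""
    text = f"{value:.{places}f}"
    if "." in text:
        # Drop trailing zeros, then a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    return "0" if text == "-0" else text


if __name__ == "__main__":
    print(pdf_literal_string(b"Hello"), pdf_hex_string(b"Hello"))
    print(pdf_real(0.1 + 0.2), pdf_real(12.0))
```

For mostly-ASCII text the literal form is roughly half the size of the hex form; for strings that are mostly binary the hex form can still be the safer choice, so a writer might pick per string.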
>One thing that isn't implemented at the moment is the update of direct
>Objects. For example, the Annotation may be direct Objects (directly
>contained in the Page dict "Annots" entry). If the client updates a
>direct Annotation, the whole first 'indirect-parent' Object (probably
>the Page dict in our example) must be updated through setModifiedObject.
>This is, at least, the only solution I see for direct Objects update,
>but perhaps other people have other ideas.
Either that or convert the direct object to an indirect...
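
The "convert to indirect" idea can be sketched roughly as follows. This is a toy Python model, not poppler's object or XRef API: the Ref class, the plain dict used as an object table, and promote_to_indirect are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy stand-in for an indirect reference such as `12 0 R`."""
    num: int
    gen: int = 0


def promote_to_indirect(objects, container, key):
    """Move container[key] into the object table and leave a Ref behind.

    `objects` maps object numbers to values; `container` is the dict or
    list that currently holds the direct object.
    """
    value = container[key]
    if isinstance(value, Ref):
        return value  # already indirect, nothing to do
    num = max(objects, default=0) + 1  # next free object number
    objects[num] = value
    ref = Ref(num)
    container[key] = ref
    return ref


if __name__ == "__main__":
    page = {"Type": "/Page", "Annots": [{"Subtype": "/Text"}]}
    objects = {1: page}
    ref = promote_to_indirect(objects, page["Annots"], 0)
    print(ref, objects[ref.num], page["Annots"])
```

The parent (the Page dict here) still has to be written out once, because its Annots array now holds a reference; after that, the annotation can be updated on its own.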
Leonard Rosenthol <mailto:leonardr at pdfsages.com>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)