[poppler] [RFC] PDF Modification in poppler

Leonard Rosenthol leonardr at pdfsages.com
Fri Aug 18 06:04:58 PDT 2006


At 08:40 PM 8/17/2006, Julien Rebetez wrote:
>- It adds two functions to PDFDoc: saveCompleteRewrite and
>saveIncrementalUpdate. These functions allow the client to save the
>modifications it makes (through setModifiedObject) either by rewriting the
>whole document or by using an incremental update (this may be required for
>digitally signed documents, for example).

         Very nice - offering both methods of writing.   And I also 
like the "dynamic checking" option where Xpdf will pick the best option.


>There are a bunch of other internal functions added, mostly in PDFDoc,
>but I think they are pretty self-explanatory.

         A few things I noticed on a cursory look...

1) You write all strings as Hex - that makes the file larger.  You 
shouldn't need to do that.
2) The Save code all assumes that you are writing to a file.  Since 
reading supports reading from a "generic stream" - it would be 
worthwhile to do the same for writing.  That would provide more 
future expansion/flexibility.
3) You might find that floating point values get written out with 
more decimal places than you'd like - so consider limiting to 3 or 4 
(or a user-setting).
4) You decompress ALL streams and write them out 
uncompressed.  OUCH!   I would take the approach that if the stream 
object is untouched, just copy the bytes "raw" from the original file 
to the new one.  Faster (no decompression!) and smaller output files.
5) I didn't look at an output file, but I think you are outputting 
MUCH MORE whitespace than necessary.
6) I don't see you updating the ID in the trailer for either write.  For 
an incremental update, you change only the second value; for a full 
rewrite, you need a completely new ID.
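
To make points 1 and 3 concrete, here is a minimal sketch of how strings 
and reals could be serialized more compactly.  The names 
writeLiteralString and writeReal are hypothetical, not existing poppler 
functions, and the sketch assumes the "C" locale so that the decimal 
separator is a period.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not poppler API): emit bytes as a PDF literal
// string "(...)" instead of a hex string "<...>". Only the characters
// that would otherwise be misread are escaped, so most bytes cost one
// byte in the output rather than two.
std::string writeLiteralString(const std::string &s) {
    std::string out = "(";
    for (unsigned char c : s) {
        switch (c) {
        case '(':
        case ')':
        case '\\':
            out += '\\';
            out += static_cast<char>(c);
            break;
        case '\r':
            // A raw CR would be read back as a line feed, so escape it.
            out += "\\r";
            break;
        default:
            out += static_cast<char>(c);
        }
    }
    out += ')';
    return out;
}

// Hypothetical helper (not poppler API): emit a real with at most
// maxDecimals fractional digits and no trailing zeros.
// Assumes the "C" locale and a small maxDecimals (the buffer is sized
// for any double with a handful of fractional digits).
std::string writeReal(double v, int maxDecimals = 4) {
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*f", maxDecimals, v);
    std::string s(buf);
    if (s.find('.') != std::string::npos) {
        while (s.back() == '0') {
            s.pop_back();
        }
        if (s.back() == '.') {
            s.pop_back();
        }
    }
    if (s == "-0") {
        s = "0";  // avoid writing a negative zero
    }
    return s;
}
```

With these, the string Hello would be written as (Hello) rather than 
<48656C6C6F>, and a value such as 0.5 would be written as 0.5 rather 
than with a long run of digits.  The number of decimals could just as 
well come from a user setting.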


>One thing that isn't implemented at the moment is the update of direct
>Objects. For example, Annotations may be direct Objects (directly
>contained in the Page dict "Annots" entry). If the client updates a
>direct Annotation, the whole first 'indirect-parent' Object (probably
>the Page dict in our example) must be updated through setModifiedObject.
>This is, at least, the only solution I see for updating direct Objects,
>but perhaps other people have other ideas.

         Either that or convert the direct object to an indirect...
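
As a rough illustration of the second option, the sketch below uses a 
toy object model.  Ref, Dict, ObjectTable and promoteToIndirect are all 
hypothetical and are NOT poppler's Object/XRef classes.  Arrays are left 
out for brevity, so the annotation sits directly under a dictionary key.  
It requires C++17.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy model only; poppler's real Object and XRef classes differ.
struct Ref {
    int num;
    int gen;
};
struct Dict;
using Value = std::variant<int, std::string, Ref, std::shared_ptr<Dict>>;
struct Dict {
    std::map<std::string, Value> entries;
};

// Stand-in for the cross-reference table: object number N lives at
// index N - 1, and every new object gets generation 0.
struct ObjectTable {
    std::vector<Value> objects;

    Ref add(Value v) {
        objects.push_back(std::move(v));
        return Ref{static_cast<int>(objects.size()), 0};
    }
};

// Move the direct value stored under `key` into the table and leave a
// reference in its place. If the entry is already a reference, it is
// returned unchanged and nothing is added to the table.
Ref promoteToIndirect(ObjectTable &table, Dict &parent,
                      const std::string &key) {
    Value &slot = parent.entries.at(key);
    if (auto *existing = std::get_if<Ref>(&slot)) {
        return *existing;
    }
    Ref ref = table.add(std::move(slot));
    slot = ref;
    return ref;
}
```

The first save after such a conversion would still have to write the 
parent (its entry changed from a direct value to a reference) along with 
the new object.  After that, later edits to the annotation would only 
need to touch the annotation's own object.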


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr at pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                              215-938-0880 (fax)


