[poppler] PDF saving support

Julien Rebetez julien at fhtagn.net
Mon Dec 31 08:24:24 PST 2007


Hi,

Sorry for the long delay in responding; I've had no time to work on Poppler
lately.
I have modified the patches to fix some of the problems and to make them
apply against poppler's latest HEAD.
Also, I forgot to include the first patch in my first mail.


Jeff Muizelaar wrote:
>
> Out of curiosity, how hard would it be to do OS X Leopard style pdf
> modifications (reorder and omit pages, merge pdf's) on top of this code?
>   
As far as I know, to reorder pages you just need to make some changes in
the page tree. As the page tree nodes are just PDF objects, they can be
updated and saved with this patch like any other object. So I think this
shouldn't be too difficult.
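
To make the idea concrete, here is a minimal sketch of reordering (and
omitting) pages. It assumes a flat page tree with a single Pages node whose
Kids array lists every page; Ref and reorderKids are hypothetical names for
illustration, not poppler API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an indirect reference (object number, generation).
struct Ref {
  int num;
  int gen;
};

// Build the new /Kids array of a flat page tree. `order` lists the indices
// of the pages to keep, in their new order; indices left out drop that page.
std::vector<Ref> reorderKids(const std::vector<Ref>& kids,
                             const std::vector<std::size_t>& order) {
  std::vector<Ref> result;
  result.reserve(order.size());
  for (std::size_t idx : order) {
    assert(idx < kids.size());  // every index must name an existing page
    result.push_back(kids[idx]);
  }
  return result;  // the node's /Count entry would become result.size()
}
```

Only the Pages node changes here, so it would be the only object that has to
be marked as updated and written out. A nested page tree would need the same
treatment at each level, plus updated Count values in the parent nodes.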

I'm not sure what you mean by merging, but I assume you want to take the
pages from two documents and combine them into a single document.
The problem that will arise here is that some objects in the two files
will have the same numbers.
A solution would be to open the first PDF and renumber its objects so that
there are no free objects in the range [0,last object]. Then open the
second PDF and renumber its objects so that they use the range
[last object,65535]. After that, we can start moving pages from the second
PDF into the first one.
We might well encounter some other problems, but we could give it a try.
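
The renumbering step could look roughly like the sketch below. It is only an
illustration under a few assumptions: renumber is a hypothetical helper, the
input is simply the list of in-use object numbers of one document, and
numbering starts at 1 because object 0 is reserved as the head of the free
list.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Map each used object number to a new, consecutive number starting at
// `firstNum`. `used` holds the in-use object numbers of one document.
std::map<int, int> renumber(const std::vector<int>& used, int firstNum) {
  assert(firstNum >= 1);  // object 0 is reserved as the head of the free list
  std::map<int, int> remap;
  int next = firstNum;
  for (int oldNum : used) {
    if (remap.find(oldNum) == remap.end()) {
      remap[oldNum] = next++;
    }
  }
  return remap;
}
```

The first document would be compacted with renumber(usedA, 1) and the second
with renumber(usedB, 1 + number of objects in the first), so the two ranges
cannot collide. Every indirect reference stored inside the objects would
also have to be rewritten through the same map, which is probably where most
of the work is.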

> Yep, this works well. Also, if you haven't tried it out yet, stgit is
> very good at managing and modifying patch sets like this. I'd recommend
> giving it a try.
>   
Thanks for the info.

>> From 08391ecc731dc904f42b4566841f4dbae4bbd4c2 Mon Sep 17 00:00:00 2001
>> From: Julien Rebetez <julien at fhtagn.net>
>> Date: Thu, 25 Oct 2007 22:32:43 +0300
>> Subject: [PATCH] Adds the ability to save PDF using either incremental update or by
>> rewriting completly the PDF.
>> ---
>>  poppler/PDFDoc.cc |  382 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  poppler/PDFDoc.h  |   18 +++-
>>  2 files changed, 394 insertions(+), 6 deletions(-)
>>
>> diff --git a/poppler/PDFDoc.cc b/poppler/PDFDoc.cc
>> index 78fbea2..58cb81a 100644
>> --- a/poppler/PDFDoc.cc
>> +++ b/poppler/PDFDoc.cc
>> @@ -34,6 +34,7 @@
>>  #include "Lexer.h"
>>  #include "Parser.h"
>>  #include "SecurityHandler.h"
>> +#include "Decrypt.h"
>>  #ifndef DISABLE_OUTLINE
>>  #include "Outline.h"
>>  #endif
>> @@ -435,19 +436,390 @@ GBool PDFDoc::isLinearized() {
>>    return lin;
>>  }
>>  
>> -GBool PDFDoc::saveAs(GooString *name) {
>> +GBool PDFDoc::saveAs(GooString *name, PDFWriteMode mode) {
>>    FILE *f;
>> -  int c;
>> +  OutStream *outStr;
>>  
>>    if (!(f = fopen(name->getCString(), "wb"))) {
>>      error(-1, "Couldn't open file '%s'", name->getCString());
>>      return gFalse;
>>    }
>> +  outStr = new FileOutStream(f,0);
>> +
>> +  if (mode == writeForceRewrite) {
>> +    saveCompleteRewrite(outStr);
>> +  } else if (mode == writeForceIncremental) {
>> +    saveIncrementalUpdate(outStr); 
>> +  } else { // let poppler decide
>> +    // find if we have updated objects
>> +    GBool updated = gFalse;
>> +    for(int i=0; i<xref->getNumObjects(); i++) {
>> +      // We don't take null and none objects into account
>>     
> Why?
>   
Mmmh, this is a bit cryptic. It's because the 'obj' field
of an XRefEntry is set to something other than none or null only
if the entry has been updated. But I have added an 'updated' field to
XRefEntry so it's more obvious :-)
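
With an explicit flag, the check reduces to something like the sketch below.
XRefEntrySketch is a simplified, hypothetical stand-in for the real
XRefEntry, and hasUpdatedEntries is an illustrative helper, not a function
from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified xref entry; only `updated` matters here.
struct XRefEntrySketch {
  unsigned int offset;
  int gen;
  bool updated;  // set when the object was modified after loading
};

// Return true as soon as one entry is marked as modified.
bool hasUpdatedEntries(const std::vector<XRefEntrySketch>& entries) {
  for (const XRefEntrySketch& e : entries) {
    if (e.updated) {
      return true;
    }
  }
  return false;
}
```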


>   
> How well does PDFDoc::saveCompleteRewrite work? i.e. given an arbitrary
> pdf, what are the chances that it won't produce something good?
>   
I don't really know. It works fine with the PDFs I have, but it probably
needs more testing. At the moment, the rewritten PDF is much bigger than
the original (for PDFReference16, the original is 9 MB and the rewritten
file is 21 MB).
However, saveCompleteRewrite is not really needed to save information
filled into forms or Annots (an incremental update is enough); it's more
of a "bonus" feature.


>> +void PDFDoc::writeTrailer (Guint uxrefOffset, int uxrefSize, OutStream* outStr, GBool incrUpdate)
>> +{
>> +  Dict *trailerDict = new Dict(xref);
>> +  Object obj1;
>> +  obj1.initInt(uxrefSize);
>> +  trailerDict->set("Size", &obj1);
>> +  obj1.free();
>> +
>> +
>> +  //build a new ID, as recommended in the reference, uses:
>> +  // - current time
>> +  // - file name
>> +  // - file size
>> +  // - values of entry in information dictionnary
>> +  GooString message;
>> +  char buffer[256];
>> +  sprintf(buffer, "%i", (int)time(NULL));
>>     
>
> I don't really like the idea of using time here because it makes things
> non-deterministic. Some people also consider it a security leak. Could
> we use an md5 hash or something instead?
>
>   
An MD5 hash is applied afterwards; its input includes the time, the file
size and the entries in the file's information dictionary. This is what
the reference recommends (section 10.3, File Identifiers).
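
As a rough sketch of how the hash input is put together: buildIdMessage is a
hypothetical helper, the concatenation order is an assumption, and the MD5
step itself is left out because the C++ standard library has no MD5
function. Passing the time in as a parameter means a caller who wants
reproducible output could supply a fixed value.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <vector>

// Concatenate the ingredients of the file identifier into one message.
// The result is what would then be fed to the MD5 function.
std::string buildIdMessage(std::time_t now, const std::string& fileName,
                           long fileSize,
                           const std::vector<std::string>& infoValues) {
  std::string message = std::to_string(static_cast<long long>(now));
  message += fileName;
  message += std::to_string(fileSize);
  for (const std::string& value : infoValues) {
    message += value;  // values of the information dictionary entries
  }
  return message;
}
```
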
>> From b9e7214c516a9f2abb25a03493301ae0ece2006e Mon Sep 17 00:00:00 2001
>> From: Julien Rebetez <julien at fhtagn.net>
>> Date: Fri, 26 Oct 2007 16:40:11 +0300
>> Subject: [PATCH] Fix memory management problem with appearBuf in Annot.
>>
>> ---
>>  poppler/Annot.cc |    6 ++----
>>  1 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/poppler/Annot.cc b/poppler/Annot.cc
>> index a1bb227..b84fd87 100644
>> --- a/poppler/Annot.cc
>> +++ b/poppler/Annot.cc
>> @@ -332,9 +332,6 @@ Annot::~Annot() {
>>      delete type;
>>    }
>>    appearance.free();
>> -  if (appearBuf) {
>> -    delete appearBuf;
>> -  }
>>  
>>    if (borderStyle) {
>>      delete borderStyle;
>> @@ -709,11 +706,12 @@ void Annot::generateFieldAppearance(Dict *field, Dict *annot, Dict *acroForm) {
>>    drObj.free();
>>  
>>    // build the appearance stream
>> -  appearStream = new MemStream(appearBuf->getCString(), 0,
>> +  appearStream = new MemStream(strdup(appearBuf->getCString()), 0,
>>        appearBuf->getLength(), &appearDict);
>>     
>
> I believe this will leak the strdupped copy of appearBuf->getCString()
> because MemStream won't free it.
>   
It will free it because I added the setNeedFree() method to MemStream
and appearStream->setNeedFree(gTrue) is called just below.
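
The ownership idea is roughly the one sketched below. BufStreamSketch and
copyBuffer are hypothetical stand-ins written for illustration; they are not
the actual MemStream code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a memory-backed stream with an ownership flag.
class BufStreamSketch {
public:
  BufStreamSketch(char* buf, std::size_t len)
      : buf_(buf), len_(len), needFree_(false) {}
  ~BufStreamSketch() {
    if (needFree_) {
      std::free(buf_);  // only free buffers we were told to own
    }
  }
  // Copying would lead to a double free, so it is disabled.
  BufStreamSketch(const BufStreamSketch&) = delete;
  BufStreamSketch& operator=(const BufStreamSketch&) = delete;

  void setNeedFree(bool needFree) { needFree_ = needFree; }
  const char* data() const { return buf_; }
  std::size_t length() const { return len_; }

private:
  char* buf_;
  std::size_t len_;
  bool needFree_;
};

// malloc-based copy, so std::free in the destructor is the matching call.
char* copyBuffer(const char* src, std::size_t len) {
  char* dst = static_cast<char*>(std::malloc(len + 1));
  assert(dst != nullptr);
  std::memcpy(dst, src, len);
  dst[len] = '\0';
  return dst;
}
```

By default the stream does not own its buffer, which keeps existing callers
unchanged; only a caller that hands over a private copy sets the flag.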

Regards and Happy new year,
Julien
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01-out_stream.patch
Type: text/x-patch
Size: 3439 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0012.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 02-set_need_free.patch
Type: text/x-patch
Size: 679 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0013.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 03-unfiltered_stream.patch
Type: text/x-patch
Size: 4601 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0014.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 04-add_indirect_object.patch
Type: text/x-patch
Size: 2123 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0015.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 05-write_to_file.patch
Type: text/x-patch
Size: 1980 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0016.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 06-md5_public.patch
Type: text/x-patch
Size: 1726 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0017.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 07-save_pdf.patch
Type: text/x-patch
Size: 15856 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0018.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08-dict_copy_constr.patch
Type: text/x-patch
Size: 1137 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0019.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 09-annot_appearance_save.patch
Type: text/x-patch
Size: 3054 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0020.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 10-fix_annot_border_bug.patch
Type: text/x-patch
Size: 696 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0021.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 11-formwidget_modified_var.patch
Type: text/x-patch
Size: 2012 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0022.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 12-full_rewrite_test_app.patch
Type: text/x-patch
Size: 2255 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0023.bin 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: series
Url: http://lists.freedesktop.org/archives/poppler/attachments/20071231/6a35d260/attachment-0001.txt 

