[poppler] How to "pdfunite" in memory...?

Pierre Couderc pierre at couderc.eu
Sun Jul 18 06:03:21 UTC 2021


Thank you very much!

This is what I expected: I had not searched enough to find
PDFDoc(BaseStream *strA...

I will start on that.
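
As a first step, here is a minimal sketch of getting an upload out of a
std::istream and into one contiguous buffer, which is what the MemStream
route described in the quoted message below seems to need. It uses only the
standard library. The names slurp and looksLikePdf are hypothetical helpers,
not part of poppler, and the poppler calls appear only in a comment, copied
from William's message and untested against any particular poppler version.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: read everything remaining in `in` into one
// contiguous buffer. std::string is used as a byte container; it can
// hold embedded NUL bytes, which binary PDF data will contain.
std::string slurp(std::istream &in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: cheap sanity check that the buffer starts with
// the "%PDF-" header before handing it to a PDF library.
bool looksLikePdf(const std::string &buf) {
    return buf.compare(0, 5, "%PDF-") == 0;
}

// Assumed next step, following the signatures quoted in the message
// below (not verified here):
//
//   MemStream *mStream = new MemStream(buf.data(), 0, buf.size(),
//                                      Object(objNull));
//   PDFDoc *doc = new PDFDoc(mStream);
//
// The buffer presumably has to stay alive for as long as the PDFDoc
// that reads from it, so it should be kept next to the document object.
```

With one such buffer per uploaded document, each one can then be wrapped
separately before reusing the merging code from pdfunite.cc.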



On 7/18/21 3:44 AM, William Bader wrote:
> utils/pdfunite.cc opens its input files with PDFDoc *doc = new 
> PDFDoc(gfileName, NULL, NULL, NULL)
> poppler/PDFDoc.h also provides PDFDoc(BaseStream *strA, GooString 
> *ownerPassword = NULL, GooString *userPassword = NULL, void *guiDataA 
> = NULL)
> poppler/Stream.h provides MemStream(char *bufA, Goffset startA, 
> Goffset lengthA, Object &&dictA) that you could probably use 
> like MemStream *mStream = new MemStream(s->getCString(), 0, 
> s->getLength(), Object(objNull))
> So if you are lucky, you can make a MemStream for each in-memory PDF, 
> then make a PDFDoc for each MemStream, and then cut-and-paste the code 
> in pdfunite.cc that combines the PDFDoc objects.
> Running "pdfunite <(cat a.pdf) <(cat b.pdf) ab.pdf" from bash fails 
> with "Syntax Error: Document stream is empty" and "Syntax Error: Could 
> not merge damaged documents ('/dev/fd/63')", so PDFDoc might require 
> input that is seekable. If you are using std::istream and the 
> underlying data is from a stringstream, it might work, but if it is 
> from an fstream, you might have to read it all into a buffer.
> William
>
> ------------------------------------------------------------------------
> *From:* poppler <poppler-bounces at lists.freedesktop.org> on behalf of 
> Pierre Couderc <pierre at couderc.eu>
> *Sent:* Saturday, July 17, 2021 5:33 PM
> *To:* poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
> *Subject:* Re: [poppler] How to "pdfunite" in memory...?
> On 7/17/21 8:43 PM, Oliver Sander wrote:
> >> I do not understand your question well. But I know that a pdf
> >> document contains pages.
> >>
> >> I have pdf documents in memory (read from a database) and I need to
> >> merge these documents in memory to write them back in a database...
> >
> > You need to give a few more details about what you mean by "I have pdf
> > documents in memory".
> > Does that mean that you simply copied the file content to some
> > allocated memory?  Or have
> > you opened these pdf files using poppler (using code like in
> > poppler/qt5/demos)?
> >
> > You need to do the latter to solve your problem. Open the files using
> > poppler,
> > and then copy code that unites them from pdfunite.cc (licences
> > permitting).
> >
> > Best,
> > Oliver
> >
> Sorry for not being clear: I upload pdf documents (with a c++ cppcms
> server), I get them in some std::istream, I need to manipulate pages of
> these documents, create new documents from these pages, store these
> documents in a bytea postgresql db, extract text from them, retrieve them
> when a user asks to download them...
>
> poppler can do the job, but I do not need files and would like to avoid
> using them for all that...
>
> So my question: what is the best strategy?
>
> I have no license problem, all is open source.
>
> Thank you
>
> PX.
>
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler