[poppler] How to "pdfunite" in memory...?

Sun Jul 18 01:44:36 UTC 2021

utils/pdfunite.cc opens each of its input files with PDFDoc *doc = new PDFDoc(gfileName, NULL, NULL, NULL).
poppler/PDFDoc.h also provides a stream-based constructor: PDFDoc(BaseStream *strA, GooString *ownerPassword = NULL, GooString *userPassword = NULL, void *guiDataA = NULL).
poppler/Stream.h provides MemStream(char *bufA, Goffset startA, Goffset lengthA, Object &&dictA), which you could probably use like this: MemStream *mStream = new MemStream(s->getCString(), 0, s->getLength(), Object(objNull)).
So if you are lucky, you can make a MemStream for each in-memory PDF, then make a PDFDoc from each MemStream, and then cut-and-paste the code in pdfunite.cc that combines the PDFDoc objects.
Running "pdfunite <(cat a.pdf) <(cat b.pdf) ab.pdf" from bash fails with "Syntax Error: Document stream is empty" and "Syntax Error: Could not merge damaged documents ('/dev/fd/63')", so PDFDoc might require input that is seekable. If you are using std::istream and the underlying data comes from a stringstream, it might work. If it comes from an fstream, you might have to read it all into a buffer.
William

________________________________
From: poppler <poppler-bounces at lists.freedesktop.org> on behalf of Pierre Couderc <pierre at couderc.eu>
Sent: Saturday, July 17, 2021 5:33 PM
To: poppler at lists.freedesktop.org <poppler at lists.freedesktop.org>
Subject: Re: [poppler] How to "pdfunite" in memory...?

On 7/17/21 8:43 PM, Oliver Sander wrote:
>> I do not understand well your question. But I know that a pdf
>> document contains pages.
>>
>> I have pdf documents in memory (read from a database) and I need to
>> merge these documents in memory to write them back in a database...
>
> You need to give a few more details about what you mean by "I have pdf
> documents in memory".
> Does that mean that you simply copied the file content to some
> allocated memory?  Or have
> you opened these pdf files using poppler (using code like in
> poppler/qt5/demos)?
>
> You need to do the latter to solve your problem.  Open the files using
> poppler,
> and then copy code that unites them from pdfunite.cc (licences
> permitting).
>
> Best,
> Oliver
>
Sorry for not being clear: I upload pdf documents (with a C++ cppcms
server) and get them as std::istream. I need to manipulate pages of
these documents, create new documents from these pages, store these
documents as bytea in a PostgreSQL db, extract text from them, and
retrieve them when a user asks to download them...

poppler can do the job, but I do not need files and would like to avoid
using them for all of that...

So my question: what is the best strategy?

I have no license problem; everything is open source.

Thank you

PX.
