[poppler] Trying to extend pdftoppm to use CairoOutputDev

Stefan Thomas thomas at eload24.com
Wed Dec 16 16:26:15 PST 2009


Hey,

If I understand this correctly, you could use it to extract any clipping
area of any PDF as a new PDF. It would include only the objects that
actually intersect that area, so the file size would be near optimal.
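
The "only objects that intersect the area" part comes down to a
rectangle overlap test. Here is a minimal sketch in Python, assuming
every object has an axis-aligned bounding box in PDF points. The Rect
type and the objects_in_clip helper are made up for illustration; this
is not how poppler or cairo actually decide what to emit.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in PDF points (hypothetical type for this sketch)."""
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def intersects(a: Rect, b: Rect) -> bool:
    # Two boxes overlap only if their extents overlap on both axes.
    # Boxes that merely touch along an edge count as not overlapping.
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def objects_in_clip(bboxes: List[Rect], clip: Rect) -> List[Rect]:
    # Keep only the bounding boxes that overlap the clip area.
    return [box for box in bboxes if intersects(box, clip)]
```

An object that only partly overlaps the clip area is still kept whole,
which matches the large-image problem mpsuzuki describes below.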

You could do some cool stuff with this, for example getting individual
pages out of PDFs that have multiple pages laid out side by side, as
PDFs of flyers and leaflets often do. Or extracting vector artwork from
a PDF without worrying that it will look different outside the context
of the original PDF.
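
For the side-by-side case the crop rectangles are easy to compute. Here
is a rough sketch in Python, assuming the sheet holds `count` equally
wide pages in a single row and that the crop options take pixels at the
-r resolution, measured from the top-left corner, as the existing -x, -y,
-W and -H options of pdftoppm do. side_by_side_crops is a made-up helper,
not part of poppler.

```python
from typing import List, Tuple

POINTS_PER_INCH = 72.0  # default PDF user space unit: 1 point = 1/72 inch


def side_by_side_crops(sheet_w_pt: float, sheet_h_pt: float,
                       count: int, dpi: float) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) in pixels for each page on the sheet."""
    if count < 1:
        raise ValueError("count must be at least 1")
    scale = dpi / POINTS_PER_INCH  # pixels per point
    sheet_w_px = round(sheet_w_pt * scale)
    sheet_h_px = round(sheet_h_pt * scale)
    crops = []
    for i in range(count):
        # Integer boundaries, so neighbouring crops neither overlap nor leave gaps.
        left = sheet_w_px * i // count
        right = sheet_w_px * (i + 1) // count
        crops.append((left, 0, right - left, sheet_h_px))
    return crops
```

For a 1224 x 792 pt sheet (two US Letter pages next to each other) at
72 dpi this gives (0, 0, 612, 792) and (612, 0, 612, 792).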

My suggestion would be to split pdftoppm into pdftoimage (using
SplashOutputDev) for JPEG, PNG and PPM output, and pdftovector (using
CairoOutputDev) for PDF and, someday, PS and SVG output.

Cheers,

Stefan


Albert Astals Cid wrote:
> On Tuesday 15 December 2009 02:28:54, mpsuzuki at hiroshima-u.ac.jp wrote:
>   
>> Dear Poppler developers,
>>     
>
> Hi
>
>   
>> First of all, I thank the poppler developers for writing excellent
>> software. The addition of CairoOutputDev is very interesting.
>>
>> Now I'm trying to extend pdftoppm to draw on CairoOutputDev.
>> My motivation is splitting a large table in a PDF document
>> into small PDFs, one for each cell.
>>
>> Recent poppler can draw on a cairo surface, so I think this
>> can be done by having pdftoppm draw a single cell (specified
>> by its geometry) on a cairo surface, something like:
>>
>>   pdftoppm \
>>            -f [page_num] -l [page_num]              \
>>            -r [dpi_to_specify_the_unit_of_geometry] \
>>            -x [cell_pos_x] -y [cell_pos_y]          \
>>            -W [cell_width] -H [cell_height]         \
>>            -pdf [input_table.pdf] [output_cell_prefix]
>>
>> The attached patch is an experiment along these lines; please
>> comment on what should be improved for official adoption.
>>
>> By default, the "-r" option for "pdftoppm -pdf" is used only
>> as a unit for calculating the geometry to be cropped, and
>> it does not change the resolution of the output PDF. This is
>> inconsistent with the "-r" option in the SplashOutputDev cases.
>> If MODIFY_RESOLUTION_IN_PDF2CAIRO is defined at compile time,
>> the behaviour of "-r" is consistent with that of
>> SplashOutputDev.
>>
>> The problems that I've already recognized are:
>>
>> * If the input PDF contains a large image (e.g. a PDF
>>   generated by an image scanner), the cropped PDF includes
>>   the whole image object, not a cropped image object, so
>>   the file size of the cropped PDF is not reduced.
>>
>> * When multiple pages are rendered (e.g. pdftoppm -pdf
>>   -f 1 -l 100 ...), startDoc() is invoked for each
>>   output file. As a result, rendering is slower than
>>   with SplashOutputDev.
>>     
>
> I'm not sure the use case is useful enough for the general public to
> justify including this in the poppler utils. In any case, there is a lot
> of cairo-specific code that needs to be properly ifdef'ed, because cairo
> is an optional dependency.
>
> Also, I think this belongs in a separate binary rather than in pdftoppm.
> We already "bastardized" it by adding PNG and JPEG output; maybe we
> should rename it to pdfconvert or something like that.
>
> Anyway, I'm more concerned about the utility than about the code itself.
> Do you (not only mpsuzuki, I'm interested in everyone's opinion) think
> anyone will ever need to split a PDF file into chunks and will think of
> using pdftoppm to do it?
>
> Albert
>
>   
>> Regards,
>> mpsuzuki
>>
>>     
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>   


