[poppler] pdftocairo - Updated Patches
ajohnson at redneon.com
Fri Aug 20 05:15:49 PDT 2010
On 19/08/10 18:30, Stefan Thomas wrote:
> Hey again,
>> Following modification can drop the page number suffix in
>> output filename when "split" flag is disabled. Stefan, could
>> you review?
> Makes sense and works great! Thanks!
> On Saturday, 31 July 2010, Adrian Johnson wrote:
>> I've just done some testing with the patches and found a problem with
>> the generated output file name.
>> My original patch would, if no output file name is specified, use the
>> input filename to generate the output name, as pdftops does, e.g.:
>> "pdftocairo -ps foo.pdf" will create foo.ps
>> "pdftocairo -png foo.pdf" will create foo-001.png, foo-002.png, ...
>> the updated patches are doing:
>> "pdftocairo -ps foo.pdf" will create cairoout-001.ps
>> "pdftocairo -png foo.pdf" will create cairoout-001.png,
>> cairoout-002.png, ...
>> which is not very user friendly.
> The reason for this change was that the previous way of determining
> outRoot gave unwanted results when you gave it a remote PDF like:
> "pdftocairo -ps http://m.je/test.pdf" will try to create
> It didn't handle URLs without a real filename well either:
> "pdftocairo -ps http://example.com/get?format=pdf&asset=3948" will try
> to create http://example.ps
> Another problem was that it would create output files in the directory
> where the PDF file resides rather than in the current working directory
> which is unusual for *NIX command line tools.
> "pdftocairo -ps /media/cdrom0/pdfs/mytest.pdf" -> write error
> The current version (with mpsuzuki's fix) will create "cairoout.ps" in
> the current working directory in all of those cases. Plus you can always
> provide a second parameter if you'd like a different name. I'm liking
> the predictability of it.
> As I see it we have three options:
> 1. Current way: Always use "cairoout", unless otherwise specified by the
> 2. Adrian's way: Create outRoot from input filename. Perhaps with some
> improvements like cutting off everything before the last slash (if it
> exists), then everything after the first question mark (if it exists),
> then everything after the last dot (if it exists). If the result is
> empty or otherwise not a valid filename, use "cairoout".
> 3. pdftoppm's way: Output to STDOUT unless a second parameter with an
> output filename is provided.
> I'd be happy with any of these. What I like about number one is that I know
> there aren't any bugs in it. With number 2 I feel like there is always
> going to be a URL or filename that breaks it or acts weird. What I like
> about number three is that it's consistent with pdftoppm.
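For reference, the improved version of option 2 described in the list above can be sketched roughly as follows. This is only an illustration in Python, not code from any of the patches; the function name `default_out_root` is hypothetical, and the steps are applied in exactly the order given in the description.

```python
def default_out_root(source: str, fallback: str = "cairoout") -> str:
    """Derive an output root from an input path or URL (option 2).

    Hypothetical helper for illustration; not part of pdftocairo.
    """
    # 1. Cut off everything before (and including) the last slash.
    name = source.rsplit("/", 1)[-1]
    # 2. Cut off everything from the first question mark onwards.
    name = name.split("?", 1)[0]
    # 3. Cut off everything from the last dot onwards (the extension).
    if "." in name:
        name = name.rsplit(".", 1)[0]
    # 4. If nothing usable is left, use the fixed default name.
    if not name or name in (".", ".."):
        return fallback
    return name
```

With this sketch, `/media/cdrom0/pdfs/mytest.pdf` gives `mytest` (so the output lands in the current working directory), and a URL ending in a slash falls back to `cairoout`. A query string that itself contains a slash would still give an odd result, which is the kind of edge case the concern about option 2 refers to.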
My preference is to maintain consistency with the other poppler utils.
pdftops, pdftotext, and pdftohtml all use the source file with the
extension changed. I prefer pdftocairo to work the same way.
pdftoppm is the exception in writing to stdout if no output name is
specified. This only makes sense for the ppm format which allows
multiple images concatenated together. Writing multiple images to
stdout does not work with the png or jpeg formats.
Currently none of the other poppler utils (except pdftoppm which writes
to stdout) handle URLs with no output file specified. Whatever solution
is chosen for providing default filenames for URLs should be
consistently implemented across all the poppler utils. I would suggest
implementing behavior similar to wget, i.e. strip off everything up to
the last slash and escape any characters not supported by the
filesystem. Since the other poppler utils do not work with URLs with no
output file specified, I don't think finding a solution for this case
should block the committing of pdftocairo.
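A minimal sketch of the wget-like naming suggested above, written in Python purely for illustration. The function name, the set of characters treated as unsupported, the use of percent-escaping, and the "cairoout" fallback are all assumptions made for this example; they are not taken from wget or from any poppler utility.

```python
import re

# Assumption: a conservative set of characters that some filesystems reject.
# The real set depends on the platform.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def wget_style_name(source: str, fallback: str = "cairoout") -> str:
    """Build a local file name from a path or URL (hypothetical helper)."""
    # Strip off everything up to and including the last slash.
    name = source.rsplit("/", 1)[-1]
    # Escape each unsupported character as %XX (assumed escaping scheme).
    name = _UNSAFE.sub(lambda m: "%{:02X}".format(ord(m.group())), name)
    # Nothing left (e.g. the URL ended in a slash): use the fallback name.
    return name or fallback
```

Unlike cutting at the first question mark, this keeps the query string in the name, so two URLs that differ only in their parameters map to different files. The extension would still need to be replaced afterwards to match the output format.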
> I'll make a final patch once we've decided the filename issue.