[poppler] How to specify format of output numbering?

Abigaile Johannesburg abij at tuta.io
Sun Jan 13 02:15:06 UTC 2019


Dear mpsuzuki,

Thanks for your clarification. I am using pdfimages mainly for processing scanned books. I don't think I will have to process books with more than 3,333 pages (which would produce more than 10,000 files in the worst case). Therefore %04d is enough for my current use. However, do I have to modify the code and recompile pdfimages myself?

But in the long run, I think it would be better to have an option that lets the user specify the numbering format of the output sequence.
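
To make the idea concrete, here is a rough sketch of a variable-width numbering. It is not a patch against poppler: makeFilename and numberingWidth are names I made up for illustration, and only the "%s-%03d.%s" pattern comes from the ImageOutputDev.cc code quoted below. It relies on the standard printf "%0*d" form, which reads the field width from the argument list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not in poppler): digits needed so that every index
// from 0 to maxIndex is printed with the same width, never fewer than 3.
static int numberingWidth(int maxIndex) {
    int width = 1;
    while (maxIndex >= 10) {
        maxIndex /= 10;
        ++width;
    }
    return std::max(width, 3);
}

// Hypothetical stand-in for the non-pageNames branch of
// ImageOutputDev::setFilename(), with the width passed in instead of
// being hardwired to 3.
static std::string makeFilename(const std::string &fileRoot, int imgNum,
                                const std::string &fileExt, int width) {
    assert(width >= 1 && width <= 9);
    char buf[512];
    // "%0*d" zero-pads imgNum to `width` digits; snprintf truncates
    // instead of overflowing if the root name is very long.
    std::snprintf(buf, sizeof(buf), "%s-%0*d.%s",
                  fileRoot.c_str(), width, imgNum, fileExt.c_str());
    return std::string(buf);
}
```

With this, makeFilename("img", 7, "pbm", 4) gives img-0007.pbm. The width could come either from a user option or from numberingWidth(total - 1) if the total number of images is known before writing starts; I do not know whether pdfimages has that count available up front.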

Thanks,
Abi

Jan 11, 2019, 12:52 PM by mpsuzuki at hiroshima-u.ac.jp:

> Dear Abigaile,
>
>> does that mean there is a way to specify numbering format already?
>>
>
> No. What I meant was...
>
> * If there is an existing parser for a user-defined numbering format (outside
> pdfimages, but in poppler), it would be possible for somebody to write a patch.
> * But if there is none, a discussion about the syntax would be needed first.
>
> Or, "if the total number of images exceeds 1000, the numbering should be
> %04d; we do not need an interface to specify the numbering format" would be
> another solution. What do you think?
>
> Regards,
> mpsuzuki
>
> Abigaile Johannesburg wrote:
>
>> Dear mpsuzuki,
>>
>> Thank you for quoting the source file regarding the numbering scheme. When you say
>>
>> "good syntax to specify numbering format, if possible, which is already used by poppler's user interfaces."
>>
>> does that mean there is a way to specify numbering format already?
>>
>> Thanks,
>> Abi
>>
>> Jan 10, 2019, 12:49 AM by mpsuzuki at hiroshima-u.ac.jp:
>> Dear Abigaile,
>>
>> At present, 3-digit numbering is hardwired, like this:
>>
>> https://gitlab.freedesktop.org/poppler/poppler/blob/master/utils/ImageOutputDev.cc#L83
>>
>> void ImageOutputDev::setFilename(const char *fileExt) {
>>   if (pageNames) {
>>     sprintf(fileName, "%s-%03d-%03d.%s", fileRoot, pageNum, imgNum, fileExt);
>>   } else {
>>     sprintf(fileName, "%s-%03d.%s", fileRoot, imgNum, fileExt);
>>   }
>> }
>>
>> I want to know whether there is a good syntax to specify the numbering
>> format, if possible one which is already used by poppler's
>> user interfaces.
>>
>> Regards,
>> mpsuzuki
>>
>> Abigaile Johannesburg wrote:
>> Hello,
>>
>> The default output numbering of pdfimages is 3 digits, e.g., image-root-nnn.xxx. But if there are more than 1,000 output images, there will be both files named image-root-nnn.xxx (3-digit number sequence) and files named image-root-nnnn.xxx (4-digit number sequence). When processing book images in bash, the ordering then needs a fix. At the moment I use rename:
>>
>> rename 's/img-([0-9]{3})\.pbm$/img-0$1.pbm/' *.pbm
>>
>> Therefore I was wondering if there is a way to specify the format of output numbering directly in pdfimages.
>>
>> Thanks,
>> Abi
>>
