[poppler] pdftohtml: add version of poppler to the XML output

Ihar `Philips` Filipau thephilips at gmail.com
Fri Apr 6 19:51:58 PDT 2012


On 4/6/12, Albert Astals Cid <aacid at kde.org> wrote:
> On Sunday, 1 April 2012, at 11:57:59, Ihar `Philips` Filipau
> wrote:
>> Add version to produced XML file.
>
> This needs an update to the dtd too, doesn't it?

No clue. I am not really an XML specialist. My XML reader (libxml2-based)
has optional DTD validation, which I have never used. Otherwise, I have
no idea why a DTD is even needed; to me it kind of defeats the purpose
of XML.

Considering that Googling revealed about 7 distinctly different
pdf2xml.dtd's, I think the best change in this area would be
*removal* of the DTD, or at least renaming it to something else, if
it is really needed. But that is too much of a change.

Now a bit more seriously. Is it possible to extract PDF file properties
(producer, date, etc.) in some easier way than what the pdfinfo tool
does? It uses PDFDoc::getDocInfo() to access the dictionary and then
parses the data ... well, pretty much manually: assembling Unicode
characters, surrogate pairs, UnicodeMap and all. If poppler has a
method to parse the data for me, then I would love to include the info
in the XML output too. If not, then let it be.
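
To illustrate the kind of manual work meant above, here is a minimal
sketch of decoding a PDF text string into UTF-8. It assumes the usual
PDF convention that a string starting with the bytes FE FF is UTF-16BE;
anything else is PDFDocEncoding, which this sketch does not map (only
ASCII bytes pass through, the rest become U+FFFD). The helper names
pdfTextToUtf8 and appendUtf8 are hypothetical and are not poppler API;
the raw bytes are assumed to have been pulled out of the info
dictionary already.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void appendUtf8(std::string &out, uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper (not part of poppler): decode the raw bytes of a
// PDF text string into UTF-8.
std::string pdfTextToUtf8(const std::string &raw) {
    std::string out;
    const std::size_t n = raw.size();
    auto byteAt = [&](std::size_t i) { return static_cast<uint8_t>(raw[i]); };

    // A leading FE FF byte-order mark signals UTF-16BE.
    const bool utf16 = n >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF;
    if (!utf16) {
        // Simplification: real PDFDocEncoding needs a lookup table for
        // bytes >= 0x80; here they are replaced with U+FFFD.
        for (std::size_t i = 0; i < n; ++i) {
            const uint8_t b = byteAt(i);
            appendUtf8(out, b < 0x80 ? b : 0xFFFD);
        }
        return out;
    }

    for (std::size_t i = 2; i + 1 < n; i += 2) {
        uint32_t u = (byteAt(i) << 8) | byteAt(i + 1);
        if (u >= 0xD800 && u <= 0xDFFF) {
            // Surrogate range: a high surrogate must be followed by a low one.
            uint32_t lo = 0;
            if (u <= 0xDBFF && i + 3 < n) {
                lo = (byteAt(i + 2) << 8) | byteAt(i + 3);
            }
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2; // the low surrogate has been consumed as well
            } else {
                u = 0xFFFD; // unpaired surrogate
            }
        }
        appendUtf8(out, u);
    }
    return out;
}
```

The result would still need XML escaping (&, <, > and quotes) before
being written into an attribute or element of the XML output.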

P.S. The patch adding the poppler version information to the XML and the DTD is attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdftohtml-xml-version-002.diff
Type: text/x-patch
Size: 943 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20120407/ebf88da1/attachment.bin>

