[poppler] pdftohtml: add version of poppler to the XML output

Albert Astals Cid aacid at kde.org
Mon Apr 9 10:19:43 PDT 2012

On Saturday, 7 April 2012, at 04:51:58, Ihar `Philips` Filipau wrote:
> On 4/6/12, Albert Astals Cid <aacid at kde.org> wrote:
> > On Sunday, 1 April 2012, at 11:57:59, Ihar `Philips` Filipau wrote:
> >> Add version to produced XML file.
> > 
> > This needs an update to the dtd too, doesn't it?
> No clue. I'm not really an XML specialist. My XML reader (libxml2-based)
> has optional DTD validation, which I have never used. Otherwise, I have
> no idea why a DTD is even needed; to me it rather defeats the purpose of
> XML.

> Considering that Googling revealed about 7 distinctly different
> pdf2xml.dtd's, I think the best change in this area would be
> *removal* of the DTD, or at least renaming it to something else, if
> it is really needed. But that is too much of a change.

There is a single pdf2xml.dtd for pdftohtml, ours.

> Now, a bit more seriously: is it possible to extract PDF file properties
> (producer, date, etc.) in some easier way than what the pdfinfo tool
> does? It uses PDFDoc::getDocInfo() to access the dictionary and then
> parses the data ... well, pretty much manually: assembling Unicode
> characters, surrogate pairs, UnicodeMap and all. If poppler has a
> method to parse the data for me, then I would love to include the info
> in the XML output too. If not, then let it be.
>
> P.S. The patch adding the poppler version information to the XML and the DTD is attached.
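
For readers unfamiliar with the "manual" decoding described in the quoted question, here is a rough sketch of what it involves. It is written in Python for brevity and is not poppler code; the function name decode_pdf_text_string is made up for illustration. It assumes the Info dictionary value is either a UTF-16BE string starting with the FE FF byte order mark, or a single-byte string, for which Latin-1 is used here as a simplified stand-in for PDFDocEncoding.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string (e.g. an Info dictionary value) into str.

    Hypothetical helper for illustration only; not part of poppler.
    """
    if raw[:2] != b"\xfe\xff":
        # Assumption: Latin-1 stands in for PDFDocEncoding. The two agree
        # on printable ASCII but differ for some other byte values.
        return raw.decode("latin-1")

    body = raw[2:]
    # Split into 16-bit big-endian code units, ignoring a trailing odd byte.
    units = [(body[i] << 8) | body[i + 1] for i in range(0, len(body) - 1, 2)]

    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine a high/low surrogate pair into one code point.
            low = units[i + 1]
            chars.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD REPLACEMENT CHARACTER.
            chars.append("\ufffd")
            i += 1
        else:
            chars.append(chr(u))
            i += 1
    return "".join(chars)
```

Calling decode_pdf_text_string(b"\xfe\xff\x00A\x00b") gives "Ab", and a plain byte string such as b"poppler" is returned unchanged as text.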


