[poppler] hello and a question about HtmlOutputDev

Jauco Noordzij jauco at jauco.nl
Fri Jun 9 13:36:26 PDT 2006


Hello everybody!
I just subscribed to this list, so for starters I'd like to give a short
introduction:
I'm a 23-year-old student from Holland. I'm working on a PDF input plugin
for AbiWord and have been looking at your library because it seemed to fit
the bill perfectly. I have been working with it for a few days and I like it
:)

Because I need to convert a PDF to another rich format, I need something that
returns richer information than TextOutputDev. Your current HtmlOutputDev
seems to work, but its headers are not public and it seems to have been
removed completely in CVS HEAD. In my local codebase I made a public version
and compiled the plugin against it. But before I continue, I'd like to know
your stance on the different OutputDevs. Did you remove HtmlOutputDev because
it was unmaintained and buggy? Or because you hold the opinion that PDF
shouldn't be converted? Or for some other reason?

If you would allow me to work on it, this is what I propose:
* I build an XMLOutputDev based on the HTML one.
* It is a public OutputDev (like the text one).
* It creates an XML representation of the PDF (sort of like the current
pdftohtml -xml).
* If you want, I'll add a stylesheet to convert the XML to HTML.

I think that a conversion to XML can be valuable, because the format is much
clearer to work with, can be transformed using a variety of languages (like
XSLT) and, most importantly, because it would allow outside developers to
process PDF information without having to write a custom OutputDev (which, I
have heard, you don't like). But surely you have thought of that yourselves
as well, so I am probably missing something :)
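To make the last point concrete, here is a rough sketch (Python, standard library only) of the kind of external processing I mean. It assumes output shaped like the current pdftohtml -xml (a `pdf2xml` root with `page`, `fontspec` and `text` elements); the sample document, the `headings` function and the 16-point threshold are made up for illustration and are not part of poppler or of any agreed XMLOutputDev format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, hand-written to resemble `pdftohtml -xml` output.
# The real schema of a future XMLOutputDev is an open question.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
  <fontspec id="0" size="18" family="Times" color="#000000"/>
  <fontspec id="1" size="12" family="Times" color="#000000"/>
  <text top="100" left="90" width="300" height="20" font="0"><b>Introduction</b></text>
  <text top="140" left="90" width="600" height="14" font="1">Body text here.</text>
</page>
</pdf2xml>"""


def headings(xml_text, min_size=16.0):
    """Return (page number, text) pairs for text set in a font of at least min_size.

    Guessing headings from font size is an illustrative heuristic only.
    """
    root = ET.fromstring(xml_text)
    # Map each fontspec id to its point size, wherever the fontspec appears.
    sizes = {fs.get("id"): float(fs.get("size")) for fs in root.iter("fontspec")}
    result = []
    for page in root.iter("page"):
        for text in page.iter("text"):
            if sizes.get(text.get("font"), 0.0) >= min_size:
                # itertext() flattens nested <b>/<i> markup into plain text.
                content = "".join(text.itertext()).strip()
                result.append((int(page.get("number")), content))
    return result


if __name__ == "__main__":
    print(headings(SAMPLE))  # [(1, 'Introduction')]
```

The point is that a consumer like this needs no C++ and no knowledge of poppler internals; the same filtering could just as well be written as an XSLT stylesheet.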

-- 
greetings,
     Jauco Noordzij