[poppler] hello and a question about HtmlOutputDev

Brad Hards bradh at frogmouth.net
Sun Jun 11 01:09:47 PDT 2006


On Sunday 11 June 2006 17:49 pm, Jauco Noordzij wrote:
> Well, such a conversion is actually what I'm working on for abiword. I
> have described the difficulties (and my solution) for creating
> structured documents like ODF instead of plain vector-based XML (or
> lots-of-layers HTML) at:
> http://code.google.com/soc/abisource/appinfo.html?csaid=2A72C68C42173A62
> I think the complete process is somewhat out of scope for a pdf
> reading library, but if I can build the XML outputdev the
> functionality will be there :)
My point is that if you build it on the poppler side, then it will work for 
anyone.

Also, you might like to look at the KOffice PDF import filter. It does a 
reasonable job, although recent changes to poppler might allow you to avoid 
making so many guesses (you said "set of heuristics", but we are talking 
about the same thing :-). There can be structure in a PDF document (see spec 
version 3.6), and getting that structure back is the key to a great import 
filter.

Brad