[poppler] hello and a question about HtmlOutputDev
Brad Hards
bradh at frogmouth.net
Sun Jun 11 01:09:47 PDT 2006
On Sunday 11 June 2006 17:49 pm, Jauco Noordzij wrote:
> Well, such a conversion is actually what I'm working on for AbiWord. I
> have described the difficulties (and my solution) for creating
> structured documents like ODF instead of plain vector-based XML (or
> lots-of-layers HTML) at:
> http://code.google.com/soc/abisource/appinfo.html?csaid=2A72C68C42173A62
> I think the complete process is somewhat out of scope for a pdf
> reading library, but if I can build the XML outputdev the
> functionality will be there :)
My point is that if you build it on the poppler side, then it will work for
anyone.
Also, you might like to look at the KOffice PDF import filter. It does a
reasonable job, although recent changes to poppler might allow you to avoid
making so many guesses (you said "set of heuristics", but we are talking
about the same thing :-). There can be structure in a PDF document (see spec
version 3.6), and getting that structure back is the key to a great import
filter.
Brad