[Poppler-bugs] [Bug 55540] Expose poppler-cairo

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Mar 31 08:02:48 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=55540

--- Comment #10 from Lu Wang <coolwanglu at gmail.com> ---
(In reply to comment #9)
> (In reply to comment #8)
>  
> > The reason I need those internal headers is that, I'm writing `pdf2htmlEX`,
> > something like pdftohtml in poppler but more powerful.
> 
> Isn't it possible to improve pdftohtml instead of writing a different tool?

In fact, pdf2htmlEX was based on pdftohtml at a very early stage. I have thought
about improving pdftohtml directly, so that I could always use all the
internal stuff, but there are a few problems:

1. It seems that pdftohtml aims to produce an HTML document whose source is
human-readable and which is compatible with slightly older browsers, whereas
the goal of pdf2htmlEX is to produce a pixel-accurate HTML document that is
also optimized for publishing (e.g. split pages and assets). Therefore:
 - It relies on lots of HTML5/CSS3 features, so only the latest browsers are
supported.
 - There are lots of ugly HTML elements for adjusting the layout, so the source
can never be human-readable.

2. The crucial part is font manipulation. The most difficult and important work
in pdf2htmlEX is converting the fonts into web-friendly formats, together with
proper re-encoding; without it, pixel-accurate output can never be achieved.
For example, annotation links (I mean the borders) produced by pdftohtml are
not likely to work, since the text is usually in the wrong position.
pdf2htmlEX also supports printing. Because of this, FontForge is heavily used,
and I don't think it's appropriate for poppler to depend on it (or is it?):
 - FontForge has never been friendly to binary linking, although it has
improved recently. There is no documentation for its header files, so basically
everything I've been doing is hacking. I'm not sure whether this would meet the
quality requirements of poppler.
 - Font conversion might be illegal (in some regions)? I remember reading old
email archives about font handling in poppler, which had been rejected.

3. There are lots of tricks and hacks for HTML, which have made the codebase
complicated enough that it should be kept separate (IMHO). It would not fit in
1-2 files in the util/ folder.

In case you would like to take a look at pdf2htmlEX:
Here is a demo: http://coolwanglu.github.com/pdf2htmlEX/demo/demo.html
And here is the project page: https://github.com/coolwanglu/pdf2htmlEX

I would very much like to contribute some parts back to pdftohtml, but
unfortunately they cannot work in pdftohtml, since almost everything depends on
proper font conversion.
