[poppler] pdftohtml, separate CSS file

Marc J. Driftmeyer
Tue Jun 21 21:14:23 PDT 2011

As someone with 10+ years of experience with CSS, XHTML 1.x and now 
HTML 5, I have to ask which versions of the XHTML specification you 
plan to support.

I would assume you would target XHTML 1.1 Strict and leave XHTML 1.1 
Modular alone, as we've all moved on to HTML 5.

Which brings me to a suggestion: pdftohtml should also be able to output 
HTML 5, and since WebKit is available on all platforms, perhaps one 
should use the WebKit HTML 5 parser, especially since GTK+ and Qt are 
both on board. The GTK+ port is even modularizing its work so that the 
JavaScript engine is split out and reusable within other GTK+ projects.

From the WebKitGTK+ ChangeLog:

2011-06-20  Carlos Garcia Campos <cgarcia at igalia.com>

         Reviewed by Xan Lopez.

         [GTK] Split libWebCore into two libWebCore and libWebCoreGtk

         * GNUmakefile.am: Link to libWebCoreGtk.la too.

WebKitGTK+ 1.5.1

What's new in WebKitGTK+ 1.5.1?

   - The JSC library is now available independently. It's called
     "libjavascriptcoregtk", and it comes with its own pkg-config file.
   - New spellchecking APIs, useful to implement spellchecking features
     in the UAs.
   - New DOM methods to check if editable areas have been modified by
     the user (webkit_dom_html_{input,text_area}_is_edited).
   - Lots of improvements in the WebKit2GTK+ port.
   - Lots of bugfixes.

Since XHTML is a good citizen alongside HTML 5, I'd assume information 
on the WebKit HTML 5 parser would be useful for the long haul.


If I'm off base, just ignore this.

Sincerely Yours,

Marc J. Driftmeyer

On 06/21/2011 07:47 PM, Josh Richardson wrote:
> Experienced web developers always separate their CSS from their HTML 
> file.  This makes maintenance and overriding of the styling much 
> easier, as well as keeping the HTML file itself (nearly) completely 
> content / semantics focused.
> In the complex mode, I would like to separate out the styling into a 
> separate CSS file, referenced from the output HTML file.  Any 
> objections to this?
> I am also cleaning up the tags so that they are all balanced and 
> XHTML, hence XML-compliant.  Once this is done along with CSS 
> separated out, I'm not sure of a need for a separate -xml mode for 
> pdftohtml.  Thoughts on this?
> Thanks, --josh
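
To make the CSS separation proposed above concrete, here is a minimal 
Python sketch, not pdftohtml's actual implementation. It assumes the 
converter has already produced a page with its styling in inline <style> 
blocks. The function name split_css, the file name sample.css and the 
class name .ft0 are hypothetical, chosen only for illustration. It uses 
a regular expression rather than a real parser, so it only suits simple, 
well-formed output, and it adds no link if the page has no </head>.

```python
import re

# Matches a <style> element and captures its CSS body.
# Assumption: style blocks contain plain CSS with no nested markup.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)


def split_css(html, css_href="output.css"):
    """Return (html_without_inline_styles, css_text).

    Every <style> block is removed from the markup, the CSS bodies are
    concatenated, and a single <link> to css_href is placed before </head>.
    """
    css_parts = [m.group(1).strip() for m in STYLE_RE.finditer(html)]
    stripped = STYLE_RE.sub("", html)
    link = '<link rel="stylesheet" type="text/css" href="%s"/>' % css_href
    # Insert the link before the first closing </head> tag only.
    linked = re.sub(
        r"</head>",
        lambda m: link + "\n</head>",
        stripped,
        count=1,
        flags=re.IGNORECASE,
    )
    return linked, "\n".join(css_parts)


# Hypothetical sample page standing in for converter output.
sample = (
    "<html><head><title>t</title>"
    '<style type="text/css">.ft0{font-size:12px;}</style>'
    "</head><body><p class=\"ft0\">Hello</p></body></html>"
)
page, css = split_css(sample, "sample.css")
```

The caller would then write page and css to two files. Because the link 
is emitted as a self-closing tag, the result stays well-formed XML 
whenever the input was.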

Marc J. Driftmeyer
Email :: mjd at reanimality.com
Web :: http://www.reanimality.com
Cell :: (509) 435-5212
