[poppler] pdftohtml, separate CSS file

Joaquin Cuenca Abela joaquin at cuencaabela.com
Thu Jun 23 03:39:09 PDT 2011


When I said "but it's rare to have to further edit / maintain these
files" I was talking about the files generated by pdftohtml, not the
files I have on my web projects. One of these web projects went over
1M visits / day and got acquired by Google, where I worked for 3 years,
so I think I'm quite competent wrt html / css.

Actually I mostly agree with your HTML / CSS points, but you have to
consider backwards compatibility and, when you make a breaking change,
make sure it's worth it. What I, as a user, think would be more
useful is to extract more info from the PDF and put it in the HTML.
Using HTML5, XHTML, splitting CSS, etc. is something that doesn't
affect me, as I have to post-process the pdftohtml output anyway, but
I'm worried that too many cosmetic changes will break other people's
pipelines. Don't take this as "nothing should change, ever!". The
current code is a bit messy, in part due to the proliferation of flags,
some of them with weird effects on the code, and it seems to me that
such a piece of code is getting difficult to maintain. I'm just saying
that if you make a breaking change you have to make it really worth it.
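
To illustrate the post-processing point, here is a minimal sketch of
how a pipeline could recover a separate stylesheet from single-file
output without depending on the default. It is not pdftohtml's
behaviour: the function name split_inline_css and the file name
style.css are hypothetical, and it assumes the styling lives in
<style> elements of the generated HTML.

```python
import re

# Matches a <style ...>...</style> element, across lines, any case.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)


def split_inline_css(html, css_href="style.css"):
    """Return (html_with_link, css_text) for a single-file HTML string.

    Hypothetical helper: css_href is an assumed output name, not
    something pdftohtml produces.
    """
    # Collect the contents of every <style> element.
    css_parts = [m.group(1).strip() for m in STYLE_RE.finditer(html)]
    link = '<link rel="stylesheet" type="text/css" href="%s"/>' % css_href
    # Put the link where the first <style> element was, then drop the rest.
    stripped = STYLE_RE.sub(lambda m: link, html, count=1)
    stripped = STYLE_RE.sub("", stripped)
    return stripped, "\n".join(css_parts)
```

Input without any <style> element comes back unchanged with an empty
stylesheet, so the step is safe to leave in a pipeline either way.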

Cheers,

On Wed, Jun 22, 2011 at 11:18 PM, Marc J. Driftmeyer
<mjd at reanimality.com> wrote:
> If you don't have to routinely update your CSS and extend the content of
> the sites you design with new formatting, you clearly aren't working on
> sites that are heavy in demand and drive a lot of diverse content.
>
> More to the point, you're not working that deeply with CSS 2, let alone 3 if
> that's the case. One of the main reasons for separating the structure from
> the formatting is to extend the reuse of the structure to other viewports.
>
> Sincerely,
>
> Marc J. Driftmeyer
>
> On 06/22/2011 11:40 AM, Joaquin Cuenca Abela wrote:
>
> I disagree. It's useful to have everything in a single file (if there
> are no images); for instance, you can pipe the result to stdout for
> scripts and things like that. What benefit do you get by splitting the
> html and the css? I split them all the time on my web projects, but
> it's rare to have to further edit / maintain these files, and you're
> going to break existing scripts / pipelines if you change the default
> behaviour. I think you need a very good reason to do that.
>
> 2011/6/22 Josh Richardson <jric at chegg.com>:
>
> On second thought, what if I provide the old behavior via a switch?  I
> think that the separate CSS file is the better default behavior.
>
> Thanks, --josh
>
> On 6/22/11 11:15 AM, "Josh Richardson" <jric at chegg.com> wrote:
>
> Ok, will do.  --josh
>
> On 6/22/11 12:59 AM, "Albert Astals Cid" <aacid at kde.org> wrote:
>
> A Wednesday, June 22, 2011, Josh Richardson va escriure:
>
> Experienced web developers always separate their CSS from their HTML
> file. This makes maintenance and overriding of the styling much easier,
> as well as keeping the HTML file itself (nearly) completely content /
> semantics focused.
>
> In the complex mode, I would like to separate out the styling into a
> separate CSS file, referenced from the output HTML file.  Any
> objections to this?
>
> I'd prefer if you added this through a flag rather than modifying the
> behaviour (since people always complain when you change the behaviour
> of something)
>
> I am also cleaning up the tags so that they are all balanced and XHTML,
> hence XML-compliant.
>
> Makes sense, but please make this a separate patch.
>
> Albert
>
> Once this is done along with CSS separated out, I'm
> not sure of a need for a separate -xml mode for pdftohtml.  Thoughts on
> this?
>
> Thanks, --josh
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
>
> --
> Marc J. Driftmeyer
> Email :: mjd at reanimality.com
> Web :: http://www.reanimality.com
> Cell :: (509) 435-5212



-- 
Joaquin Cuenca Abela -- presspeople.com: Fuentes de prensa y comunicados

