[poppler] pdftohtml, separate CSS file

Marc J. Driftmeyer mjd at reanimality.com
Thu Jun 23 22:17:15 PDT 2011


Has anyone considered the uses for pdftohtml that typesetters could
leverage? As someone who writes an awful lot of LaTeX/XeTeX and outputs
to PDF, I think it would be a nice piece of additional functionality
for apps like TeXworks, TeXShop, LyX, Kile and others to have this
library do the heavy lifting as a stage after TeX Live has produced the
PDF, when one would like technical publications to match the HTML5 or
XHTML 1.1 compliance of online journals as closely as possible. Having
addenda and updates to such publications in HTML format, with a robust
CSS structure cleanly separated from the HTML, would also allow one to
later write one's own scripts to modify the end product without having
to hack the actual HTML.

- Marc

On 06/23/2011 03:39 AM, Joaquin Cuenca Abela wrote:
> When I said "but it's rare to have to further edit / maintain these
> files", I was talking about the files generated by pdftohtml, not the
> files I have on my web projects. One of these web projects went over
> 1M visits / day and got acquired by Google, where I worked for 3 years,
> so I think I'm quite competent wrt html / css.
>
> Actually I mostly agree with your HTML / CSS points, but you have to
> consider backwards compatibility and, when you make a breaking change,
> make sure it's worth it. What I, as a user, think will be more
> useful is to extract more info from PDF and put it in the HTML. Using
> HTML5, XHTML, splitting CSS, etc. is something that doesn't affect me
> as I have to post-process the pdftohtml output anyway, but I'm worried
> too many cosmetic changes will break other people's pipelines. Don't
> take this as "nothing should change, ever!". The current code is a bit
> messy, in part due to the proliferation of flags, some of them with
> weird effects on the code, and it seems to me such a piece of code is
> getting difficult to maintain. I'm just saying that if you make a
> breaking change you have to make it really worth it.
>
> Cheers,
>
> On Wed, Jun 22, 2011 at 11:18 PM, Marc J. Driftmeyer
> <mjd at reanimality.com> wrote:
>> If you don't have to routinely update your CSS and extend the content
>> of the sites you design with new formatting, you clearly aren't working
>> on sites that are in heavy demand and drive a lot of diverse content.
>>
>> More to the point, if that's the case, you're not working that deeply
>> with CSS 2, let alone CSS 3. One of the main reasons for separating the
>> structure from the formatting is to extend the reuse of the structure
>> to other viewports.
>>
>> Sincerely,
>>
>> Marc J. Driftmeyer
>>
>> On 06/22/2011 11:40 AM, Joaquin Cuenca Abela wrote:
>>
>> I disagree. It's useful to have everything in a single file (if there
>> are no images); for instance, you can pipe the result to stdout for
>> scripts and things like that. What benefit do you get by splitting the
>> html and the css? I split them all the time on my web projects, but
>> it's rare to have to further edit / maintain these files, and you're
>> going to break existing scripts / pipelines if you change the default
>> behaviour. I think you need a very good reason to do that.
>>
>> 2011/6/22 Josh Richardson <jric at chegg.com>:
>>
>> On second thought, what if I provide the old behavior via a switch?  I
>> think that the separate CSS file is the better default behavior.
>>
>> Thanks, --josh
>>
>> On 6/22/11 11:15 AM, "Josh Richardson" <jric at chegg.com> wrote:
>>
>> Ok, will do.  --josh
>>
>> On 6/22/11 12:59 AM, "Albert Astals Cid" <aacid at kde.org> wrote:
>>
>> On Wednesday, June 22, 2011, Josh Richardson wrote:
>>
>> Experienced web developers always separate their CSS from their HTML
>> file. This makes maintenance and overriding of the styling much easier,
>> as well as keeping the HTML file itself (nearly) completely content /
>> semantics focused.
>>
>> In the complex mode, I would like to separate out the styling into a
>> separate CSS file, referenced from the output HTML file. Any
>> objections to this?
>>
>> I'd prefer that you add this through a flag rather than by modifying
>> the behaviour (since people always complain when you change the
>> behaviour of something).
>>
>> I am also cleaning up the tags so that they are all balanced and
>> valid XHTML, hence XML-compliant.
>>
>> Makes sense, but please make this a separate patch.
>>
>> Albert
>>
>> Once this is done, along with the CSS being separated out, I'm not sure
>> there is a need for a separate -xml mode for pdftohtml. Thoughts on
>> this?
>>
>> Thanks, --josh
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>>
>>
>>
>>
>> --
>> Marc J. Driftmeyer
>> Email :: mjd at reanimality.com
>> Web :: http://www.reanimality.com
>> Cell :: (509) 435-5212
>>
>>
>
>

-- 
Marc J. Driftmeyer
Email :: mjd at reanimality.com
Web :: http://www.reanimality.com
Cell :: (509) 435-5212