[poppler] Poppler - SVG Device

Todd Hubers todd.hubers at alivate.com.au
Fri Nov 4 05:43:54 PDT 2011


That looks very promising, thank you.
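
For reference, the pdftocairo route Dom describes below could be scripted roughly like this. It is only a minimal sketch: it assumes the pdftocairo binary that ships with poppler is on the PATH, the helper names build_svg_command and convert_page_to_svg are made up for illustration, and the file names are placeholders.

```python
import shutil
import subprocess


def build_svg_command(pdf_path, svg_path, page=1):
    """Return the argv list for converting one PDF page to SVG with pdftocairo."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # -svg selects SVG output; -f and -l are the first and last page to convert.
    return ["pdftocairo", "-svg", "-f", str(page), "-l", str(page),
            str(pdf_path), str(svg_path)]


def convert_page_to_svg(pdf_path, svg_path, page=1):
    """Run pdftocairo; raises if the tool is missing or exits non-zero."""
    if shutil.which("pdftocairo") is None:
        raise FileNotFoundError("pdftocairo not found on PATH")
    subprocess.run(build_svg_command(pdf_path, svg_path, page), check=True)


if __name__ == "__main__":
    # Placeholder file names, not taken from this thread.
    print(" ".join(build_svg_command("input.pdf", "page1.svg", page=1)))
```

Setting -f and -l to the same number keeps each SVG file to a single page, which seems the simplest thing to hand to a browser.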

On 4 November 2011 23:23, Dominic Lachowicz <domlachowicz at gmail.com> wrote:

> Hi Todd,
>
> You're in the best position to comment on the suitability of the
> approaches. I really don't know what your goal is.
>
> Having worked a bit on the librsvg, Cairo, and poppler projects, I
> know that one can render a poppler page to a Cairo context (cairo_t)
> via the poppler_page_render() function, and that Cairo supports
> writing to SVG surfaces, preserving all of the vector goodness (when
> possible) that you seem to expect.
>
> http://www.cairographics.org/manual/cairo-SVG-Surfaces.html
>
> You can test this out using the "pdftocairo" command line tool without
> needing to write a line of code.
>
> I believe that one can do something similar with the Qt backend, but
> that's outside of my area of expertise.
>
> I hope that helps,
> Dom
>
> On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers <todd.hubers at alivate.com.au> wrote:
> > Hi Dom,
> > You can probably tell me :) I'm not claiming to be a poppler genius.
> > Please do elaborate on the suitability of the CairoOutputDevice to
> > generate an SVG (remembering that SVGs are favoured for their vector
> > ability for text, lines and filled shapes).
> >
> > Thanks, Todd.
> >
> > On 4 November 2011 22:55, Dominic Lachowicz <domlachowicz at gmail.com> wrote:
> >>
> >> Just out of curiosity, how would the proposed SVGOutputDevice differ
> >> from using (say) the existing CairoOutputDevice that was configured to
> >> write to SVG? That can already be accomplished today.
> >>
> >> Thanks,
> >> Dom
> >>
> >> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers <todd.hubers at alivate.com.au> wrote:
> >> > Alec, I'm quite sold on the SVG idea. It is self-contained and can
> >> > even work outside the browser.
> >> > Josh, it would seem that the HTMLOutputDevice is the better candidate
> >> > for SVG. HTML would be a good interim solution as well; however, with
> >> > SVG everything is packaged into a single file. With HTML the browser
> >> > makes repeated calls back to the web server (for image resources),
> >> > but with SVG it's naturally all together. You can also achieve
> >> > effects like gradients in SVG quite easily, and it is better
> >> > supported by older browsers than alternative approaches to getting
> >> > PDF into the browser.
> >> > I am interested in seeing the latest version of the HTML solution. I
> >> > may attempt some preliminary SVG rendering.
> >> >
> >> > Back on the topic of the "data" output device: I'm already using XML
> >> > for RTF output (I'm doing this in my language of choice, C#, so it's
> >> > not an easy task to contribute this back to poppler). It's true that
> >> > direct implementations of device drivers are more efficient; however,
> >> > XML or the like does provide a convenient interface that is very
> >> > accessible from many programming languages. I would not expect such a
> >> > "data" output device to be used by PDF viewing applications. However,
> >> > it would be good for all other purposes, where such implementations
> >> > usually run as batch processes and the extra processing, in the
> >> > presence of multi-threading, is readily accepted in return for
> >> > flexibility. That is, a larger community can make use of poppler.
> >> > Cheers,
> >> > Todd
> >> > On 4 November 2011 17:24, Josh Richardson <jric at chegg.com> wrote:
> >> >>
> >> >> Hi Todd,
> >> >> Some of us who are working on the pdftohtml utility have had similar
> >> >> thoughts. It's on my wish list to completely remove the need for a
> >> >> poppler output device by utilizing the SVG toolset available in
> >> >> modern browsers. In any case, we are achieving high accuracy on Gecko
> >> >> and WebKit browsers with the current version (not merged into the
> >> >> Poppler main repo yet, but I can send you an invite for a git repo
> >> >> that Alec Taylor made, which has all those latest changes). I think
> >> >> it might meet your needs as-is, or with some tweaks to make it work
> >> >> better on other browsers.
> >> >> We are currently extracting the text and fonts for the browser to
> >> >> render directly, but still must rely on Splash, Cairo, etc. to
> >> >> rasterize other graphic operations. With the way we've done it, we
> >> >> have an easy path to change over to SVG, one graphic operation at a
> >> >> time, if you'd be interested in doing that.
> >> >> The idea of a separate "data" device is interesting, but I don't
> >> >> think it's the right way to go. In effect, you are talking about
> >> >> changing the PDF data to XML, and from there to other formats. I can
> >> >> appreciate the sentiment, since PDF is such a difficult format to
> >> >> work with, but adding a layer of abstraction is just going to make
> >> >> things more complex, error-prone, and slow. To note, the current
> >> >> version of pdftohtml creates a valid XML-compliant HTML format
> >> >> (actually there's a small bug, but you probably get the point). You
> >> >> can always use the XML-compliant HTML as your easier-to-digest "data"
> >> >> format, which also allows us to represent more semantics than are
> >> >> available in the original PDF document, and you can always extend it
> >> >> with whatever XML tags you need. For example, I extended it with an
> >> >> attribute describing bounding boxes for all of the text spans.
> >> >> Let me know if you want the repo invite.
> >> >> Best, --josh
> >> >> From: Todd Hubers <todd.hubers at alivate.com.au>
> >> >> Date: Thu, 3 Nov 2011 18:13:52 -0700
> >> >> To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
> >> >> Subject: [poppler] Poppler - SVG Device
> >> >>
> >> >> I'm currently using Poppler for text extraction and Ghostscript for
> >> >> PDF-to-image conversion, all for viewing PDFs online without
> >> >> requiring a PDF plugin in the browser.
> >> >>
> >> >> I noticed Mozilla was working on an interesting project, PDF.js
> >> >> [https://wiki.mozilla.org/PDF.js]. It loads PDF files with pure
> >> >> JavaScript (on an HTML5-compatible browser; it probably needs canvas).
> >> >>
> >> >> This is an opportunity for poppler to steam ahead and get some
> >> >> headline-grabbing exposure. The SVG format is well supported by
> >> >> browsers. PDFs are portable across systems; however, SVGs are very
> >> >> portable (and fast) across the web.
> >> >>
> >> >> I propose the building of an SVG Device: PDF to SVG. I am currently
> >> >> considering using PDF to XML, to then perform XML to SVG. Given the
> >> >> status quo, I believe it's time for PDF to SVG.
> >> >>
> >> >> I see SVG as a very efficient and therefore powerful web format. I
> >> >> hope others in the poppler community will see the potential as I do.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Todd Hubers (BBIT Hons)
> >> >> Alivate
> >> >>
> >> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools for
> >> >> SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to have
> >> >> simply one direct rendering device and one "data" device.
> >>
> >> --
> >> "I like to pay taxes. With them, I buy civilization." --  Oliver Wendell
> >> Holmes