[poppler] Poppler - SVG Device

Dominic Lachowicz domlachowicz at gmail.com
Fri Nov 4 05:23:34 PDT 2011


Hi Todd,

You're in the best position to comment on the suitability of the
approaches. I really don't know what your goal is.

Having worked a bit on the librsvg, Cairo, and poppler projects, I
know that one can render a poppler page to a Cairo context via the
poppler_page_render() function, and that Cairo supports writing to SVG
surfaces, preserving all of the vector goodness (when possible) that
you seem to expect.

http://www.cairographics.org/manual/cairo-SVG-Surfaces.html
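
As a rough, untested sketch of that route: the snippet below uses the
Python GObject bindings for poppler-glib, where page.render() is the
binding for poppler_page_render(), together with pycairo's SVGSurface.
The helper names pdf_uri and render_page_to_svg are made up for
illustration, and it assumes PyGObject (with its cairo integration),
pycairo, and the Poppler typelib (requested here as version "0.18") are
installed.

```python
from pathlib import Path


def pdf_uri(pdf_path):
    """Return the file:// URI that poppler-glib expects for a local PDF."""
    return Path(pdf_path).resolve().as_uri()


def render_page_to_svg(pdf_path, svg_path, page_index=0):
    """Render one PDF page onto a Cairo SVG surface (hypothetical helper)."""
    # Imported here so the module still loads where the bindings are missing.
    import cairo
    import gi

    gi.require_version("Poppler", "0.18")  # assumed typelib version
    from gi.repository import Poppler

    document = Poppler.Document.new_from_file(pdf_uri(pdf_path), None)
    page = document.get_page(page_index)
    if page is None:
        raise IndexError(f"no page {page_index} in {pdf_path}")

    # Page size is in PostScript points, which is also the SVG surface unit.
    width, height = page.get_size()
    surface = cairo.SVGSurface(str(svg_path), width, height)
    context = cairo.Context(surface)

    # Binding for poppler_page_render(); text and paths stay vector where
    # Cairo's SVG backend can represent them.
    page.render(context)
    context.show_page()
    surface.finish()
```

Calling render_page_to_svg("input.pdf", "page1.svg") should then write
the first page as SVG. It may be worth trying
page.render_for_printing(context) as well, since the poppler-glib
documentation distinguishes rendering for display from rendering for
print.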

You can test this out using the "pdftocairo" command line tool without
needing to write a line of code.
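
If you later want to drive the tool from a batch job, a thin wrapper
might look like the sketch below. The -svg, -f and -l options are from
the pdftocairo man page; the function names and file names are
placeholders of mine, and I'm assuming pdftocairo is on the PATH.

```python
import subprocess


def pdftocairo_svg_command(pdf_path, svg_path, page=1):
    """Build the argument list for converting one page of a PDF to SVG."""
    return [
        "pdftocairo",
        "-svg",            # select Cairo's SVG backend
        "-f", str(page),   # first page to convert
        "-l", str(page),   # last page to convert
        str(pdf_path),
        str(svg_path),
    ]


def convert_page_to_svg(pdf_path, svg_path, page=1):
    """Run pdftocairo; raises CalledProcessError if the tool reports failure."""
    subprocess.run(pdftocairo_svg_command(pdf_path, svg_path, page), check=True)
```

The equivalent at a shell prompt would be
"pdftocairo -svg -f 1 -l 1 input.pdf page1.svg".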

I believe that one can do something similar with the Qt backend, but
that's outside of my area of expertise.

I hope that helps,
Dom

On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers <todd.hubers at alivate.com.au> wrote:
> Hi Dom,
> You can probably tell me :) I'm not claiming to be a poppler genius. Please
> do elaborate on the suitability of the CairoOutputDevice to generate an SVG
> (remembering that SVGs are favoured for their vector ability for text, lines
> and filled shapes).
>
> Thanks, Todd.
>
> On 4 November 2011 22:55, Dominic Lachowicz <domlachowicz at gmail.com> wrote:
>>
>> Just out of curiosity, how would the proposed SVGOutputDevice differ
>> from using (say) the existing CairoOutputDevice that was configured to
>> write to SVG? That can already be accomplished today.
>>
>> Thanks,
>> Dom
>>
>> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers <todd.hubers at alivate.com.au>
>> wrote:
>> > Alec, I'm quite sold on the SVG idea. It is self-contained and can even
>> > work outside the browser.
>> >
>> > Josh, it would seem that the HTMLOutputDevice is the better candidate
>> > for SVG. HTML would be a good interim solution as well; however, with
>> > SVG everything is packaged into a single file. With HTML the browser
>> > makes repeated calls back to the web server (for image resources), but
>> > with SVG it's naturally all together. You can also achieve effects like
>> > gradients in SVG quite easily, and SVG is better supported by older
>> > browsers than alternative approaches to getting PDF into the browser.
>> >
>> > I am interested in seeing the latest version of the HTML solution. I may
>> > attempt some preliminary SVG rendering.
>> >
>> > Back on the topic of a "data" output device. I'm already using XML for
>> > RTF output (I'm doing this in my language of choice, C#, though, so it's
>> > not an easy task to contribute this back to poppler). It's true that
>> > direct implementation of device drivers is more efficient; however, XML
>> > or the like does provide a convenient interface that is very accessible
>> > from many programming languages. I would not expect such a "data" output
>> > device to be used by PDF viewing applications. However, it would be good
>> > for all other purposes, where such implementations are usually run as
>> > batch processes and the extra processing, in the presence of
>> > multi-threading, is readily accepted in return for flexibility. That
>> > is, a larger community can make use of poppler.
>> >
>> > Cheers,
>> > Todd
>> >
>> > On 4 November 2011 17:24, Josh Richardson <jric at chegg.com> wrote:
>> >>
>> >> Hi Todd,
>> >>
>> >> Some of us who are working on the pdftohtml utility have had similar
>> >> thoughts. It's on my wish list to completely remove the need for a
>> >> poppler output device by utilizing the SVG toolset available in modern
>> >> browsers. In any case, we are achieving high accuracy on Gecko and
>> >> WebKit browsers with the current version (not merged into the Poppler
>> >> main repo yet, but I can send you an invite for a git repo that Alec
>> >> Taylor made, which has all those latest changes). I think it might meet
>> >> your needs as-is, or with some tweaks to make it work better on other
>> >> browsers.
>> >>
>> >> We are currently extracting the text and fonts for the browser to
>> >> render directly, but still must rely on Splash, Cairo, etc. to rasterize
>> >> other graphic operations. With the way we've done it, we have an easy
>> >> path to change over to SVG, one graphic operation at a time, if you'd be
>> >> interested in doing that.
>> >>
>> >> The idea of a separate "data" device is interesting, but I don't think
>> >> it's the right way to go. In effect, you are talking about changing the
>> >> PDF data to XML, and from there to other formats. I can appreciate the
>> >> sentiment, since PDF is such a difficult format to work with, but adding
>> >> a layer of abstraction is just going to make things more complex,
>> >> error-prone, and slow. To note, the current version of pdftohtml creates
>> >> a valid XML-compliant HTML format (actually there's a small bug, but you
>> >> probably get the point). You can always use the XML-compliant HTML as
>> >> your easier-to-digest "data" format, which also allows us to represent
>> >> more semantics than are available in the original PDF document, and you
>> >> can always extend it with whatever XML tags you need. For example, I
>> >> extended it with an attribute describing bounding boxes for all of the
>> >> text spans.
>> >>
>> >> Let me know if you want the repo invite.
>> >>
>> >> Best, --josh
>> >>
>> >> From: Todd Hubers <todd.hubers at alivate.com.au>
>> >> Date: Thu, 3 Nov 2011 18:13:52 -0700
>> >> To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
>> >> Subject: [poppler] Poppler - SVG Device
>> >>
>> >> I'm currently using Poppler for text extraction and Ghostscript for
>> >> PDF-to-image functionality, all for viewing PDFs online without
>> >> requiring a PDF plugin in the browser.
>> >>
>> >> I noticed Mozilla was working on an interesting project, PDF.js
>> >> [https://wiki.mozilla.org/PDF.js]. It loads PDF files with pure
>> >> JavaScript (on an HTML5-compatible browser; it probably needs canvas).
>> >>
>> >> This is an opportunity for poppler to steam ahead and get some
>> >> headline-grabbing exposure. The SVG format is well supported by
>> >> browsers. PDFs are portable across systems; SVGs, however, are very
>> >> portable (and fast) across the web.
>> >>
>> >> I propose the building of an SVG Device: PDF to SVG. I am currently
>> >> considering using PDF to XML, to then perform XML to SVG. Given the
>> >> status quo, I believe it's time for PDF to SVG.
>> >>
>> >> I see SVG as a very efficient and therefore powerful web format. I hope
>> >> others in the poppler community will see the potential as I do.
>> >>
>> >> Thanks,
>> >>
>> >> Todd Hubers (BBIT Hons)
>> >> Alivate
>> >>
>> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools for
>> >> SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to have
>> >> simply one direct rendering device and one "data" device.



-- 
"I like to pay taxes. With them, I buy civilization." --  Oliver Wendell Holmes

