[poppler] Poppler - SVG Device

Josh Richardson jric at chegg.com
Fri Nov 4 09:54:41 PDT 2011


Cool!  I didn't know about SVG output from Cairo.  I have scripts to turn
the fonts into SVG.  Todd, I'll send you an invite to the pdftohtml repo
(and those scripts), and you can play with that.  We could also consider
combining the SVG device with the HtmlOutputDev, as we've done with
SplashOutputDevHtmlImages.
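
For reference, the pdftocairo route Dom describes below could be scripted
roughly as follows. This is only a sketch under assumptions:
build_svg_command and convert_page are made-up helper names (not part of
poppler), the file names are placeholders, and it assumes a pdftocairo
build that accepts -svg, with -f and -l giving the first and last page.

```python
import shutil
import subprocess


def build_svg_command(pdf_path, svg_path, page=1):
    """Return the argv list for converting one PDF page to SVG.

    Hypothetical helper for illustration; not part of poppler.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # -svg asks pdftocairo for SVG output; -f/-l give the first and
    # last page to convert, so this selects exactly one page.
    return ["pdftocairo", "-svg", "-f", str(page), "-l", str(page),
            pdf_path, svg_path]


def convert_page(pdf_path, svg_path, page=1):
    """Run pdftocairo if it is installed; return True on success."""
    if shutil.which("pdftocairo") is None:
        return False  # tool not on PATH, nothing to run
    result = subprocess.run(build_svg_command(pdf_path, svg_path, page))
    return result.returncode == 0
```

Something like convert_page("input.pdf", "page1.svg") would then write the
first page as SVG. How the fonts come out in that file would still need
checking by hand.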

--josh

On 11/4/11 5:52 AM, "Albert Astals Cid" <aacid at kde.org> wrote:

>On Friday, 4 November 2011, Dominic Lachowicz wrote:
>> Hi Todd,
>>
>> You're in the best position to comment on the suitability of the
>> approaches. I really don't know what your goal is.
>>
>> Having worked a bit on the librsvg, Cairo, and poppler projects, I
>> know that one can render a poppler page to a Cairo object via the
>> poppler_page_render() function. And that Cairo supports writing to SVG
>> surfaces, preserving all of the vector goodness (when possible) that
>> you seem to expect.
>
>Without knowing anything about what Cairo does behind the scenes, I guess
>the harder part is vectorizing the fonts.
>
>> http://www.cairographics.org/manual/cairo-SVG-Surfaces.html
>>
>> You can test this out using the "pdftocairo" command line tool without
>> needing to write a line of code.
>>
>> I believe that one can do something similar with the Qt backend, but
>> that's outside of my area of expertise.
>
>Yeah, QPainter can do that too, but given that the Arthur backend cannot
>be compared against the Splash or Cairo ones, I guess the results would
>not be that great.
>
>Albert
>
>>
>> I hope that helps,
>> Dom
>>
>> On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers
>> <todd.hubers at alivate.com.au> wrote:
>> > Hi Dom,
>> > You can probably tell me :) I'm not claiming to be a poppler genius.
>> > Please do elaborate on the suitability of the CairoOutputDevice for
>> > generating an SVG (remembering that SVGs are favoured for their vector
>> > ability for text, lines and filled shapes).
>> >
>> > Thanks, Todd.
>> >
>> > On 4 November 2011 22:55, Dominic Lachowicz
>> > <domlachowicz at gmail.com> wrote:
>> >> Just out of curiosity, how would the proposed SVGOutputDevice differ
>> >> from using (say) the existing CairoOutputDevice configured to write
>> >> to SVG? That can already be accomplished today.
>> >>
>> >> Thanks,
>> >> Dom
>> >>
>> >> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers
>> >> <todd.hubers at alivate.com.au> wrote:
>> >> > Alec, I'm quite sold on the SVG idea. It is self-contained and can
>> >> > even work outside the browser.
>> >> > Josh, it would seem that the HTMLOutputDevice is the better
>> >> > candidate for SVG. HTML would be a good interim solution as well;
>> >> > however, with SVG everything is packaged into a single file. With
>> >> > HTML the browser makes repeated calls back to the web server (for
>> >> > image resources), but with SVG it's naturally all together. You can
>> >> > also achieve effects like gradients quite easily in SVG, and it is
>> >> > better supported by older browsers than alternative approaches to
>> >> > getting PDF into the browser.
>> >> > I am interested in seeing the latest version of the HTML solution.
>> >> > I may attempt some preliminary SVG rendering.
>> >> >
>> >> > Back on the topic of the "data" output device: I'm already using
>> >> > XML for RTF output (I'm doing this in my language of choice, C#,
>> >> > so it's not an easy task to contribute this back to poppler). It's
>> >> > true that a direct implementation of a device driver is more
>> >> > efficient; however, XML or the like does provide a convenient
>> >> > interface that is accessible from many programming languages. I
>> >> > would not expect such a "data" output device to be used by PDF
>> >> > viewing applications. However, it would be good for all other
>> >> > purposes, where such implementations are usually run as batch
>> >> > processes and the extra processing is, given multi-threading,
>> >> > readily accepted in return for flexibility - that is, a larger
>> >> > community can make use of poppler.
>> >> > Cheers,
>> >> > Todd
>> >> >
>> >> > On 4 November 2011 17:24, Josh Richardson <jric at chegg.com> wrote:
>> >> >> Hi Todd,
>> >> >> Some of us who are working on the pdftohtml utility have had
>> >> >> similar thoughts. It's on my wish list to completely remove the
>> >> >> need for a poppler output device by utilizing the SVG toolset
>> >> >> available in modern browsers. In any case, we are achieving high
>> >> >> accuracy on Gecko and WebKit browsers with the current version
>> >> >> (not merged into the Poppler main repo yet, but I can send you an
>> >> >> invite for a git repo that Alec Taylor made, which has all the
>> >> >> latest changes). I think it might meet your needs as-is, or with
>> >> >> some tweaks to make it work better on other browsers.
>> >> >> We are currently extracting the text and fonts for the browser to
>> >> >> render directly, but still must rely on Splash, Cairo, etc. to
>> >> >> rasterize other graphic operations. With the way we've done it,
>> >> >> we have an easy path to change over to SVG, one graphic operation
>> >> >> at a time, if you'd be interested in doing that.
>> >> >> The idea of a separate "data" device is interesting, but I don't
>> >> >> think it's the right way to go. In effect, you are talking about
>> >> >> changing the PDF data to XML, and from there to other formats. I
>> >> >> can appreciate the sentiment, since PDF is such a difficult
>> >> >> format to work with, but adding a layer of abstraction is just
>> >> >> going to make things more complex, error-prone, and slow. To
>> >> >> note, the current version of pdftohtml creates a valid
>> >> >> XML-compliant HTML format (actually there's a small bug, but you
>> >> >> probably get the point). You can always use the XML-compliant
>> >> >> HTML as your easier-to-digest "data" format, which also allows us
>> >> >> to represent more semantics than are available in the original
>> >> >> PDF document, and you can always extend it with whatever XML tags
>> >> >> you need. For example, I extended it with an attribute describing
>> >> >> bounding boxes for all of the text spans.
>> >> >> Let me know if you want the repo invite.
>> >> >> Best, --josh
>> >> >> From: Todd Hubers <todd.hubers at alivate.com.au>
>> >> >> Date: Thu, 3 Nov 2011 18:13:52 -0700
>> >> >> To: "poppler at lists.freedesktop.org"
>> >> >> <poppler at lists.freedesktop.org>
>> >> >> Subject: [poppler] Poppler - SVG Device
>> >> >>
>> >> >> I'm currently using Poppler for text extraction and Ghostscript
>> >> >> for PDF-to-image conversion, all for viewing PDFs online without
>> >> >> requiring a PDF plugin in the browser.
>> >> >>
>> >> >> I noticed Mozilla was working on an interesting project, PDF.js
>> >> >> [https://wiki.mozilla.org/PDF.js]. It loads PDF files with pure
>> >> >> JavaScript (on an HTML5-compatible browser - probably needs
>> >> >> canvas).
>> >> >>
>> >> >> This is an opportunity for poppler to steam ahead and get some
>> >> >> headline-grabbing exposure. The SVG format is well supported by
>> >> >> browsers. PDFs are portable across systems; SVGs, however, are
>> >> >> very portable (and fast) across the web.
>> >> >>
>> >> >> I propose the building of an SVG device: PDF to SVG. I am
>> >> >> currently considering using PDF to XML, to then perform XML to
>> >> >> SVG. Given the status quo, I believe it's time for PDF to SVG.
>> >> >>
>> >> >> I see SVG as a very efficient and therefore powerful web format.
>> >> >> I hope others in the poppler community will see the potential
>> >> >> as I do.
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Todd Hubers (BBIT Hons)
>> >> >> Alivate
>> >> >>
>> >> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then
>> >> >> tools for SVG>XML, SVG>HTML, SVG>Text. In any case it would be
>> >> >> good to have simply one direct rendering device and one "data"
>> >> >> device.
>> >> >
>> >> > _______________________________________________
>> >> > poppler mailing list
>> >> > poppler at lists.freedesktop.org
>> >> > http://lists.freedesktop.org/mailman/listinfo/poppler
>> >>
>> >> --
>> >> "I like to pay taxes. With them, I buy civilization."
>> >> -- Oliver Wendell Holmes



