That looks very promising, thank you.<br><br><div class="gmail_quote">On 4 November 2011 23:23, Dominic Lachowicz <span dir="ltr"><<a href="mailto:domlachowicz@gmail.com">domlachowicz@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Todd,<br>
<br>
You're in the best position to comment on the suitability of the<br>
approaches. I really don't know what your goal is.<br>
<br>
Having worked a bit on the librsvg, Cairo, and poppler projects, I<br>
know that one can render a poppler page to a Cairo object via the<br>
poppler_page_render() function, and that Cairo supports writing to SVG<br>
surfaces, preserving all of the vector goodness (when possible) that<br>
you seem to expect.<br>
<br>
<a href="http://www.cairographics.org/manual/cairo-SVG-Surfaces.html" target="_blank">http://www.cairographics.org/manual/cairo-SVG-Surfaces.html</a><br>
<br>
You can test this out using the "pdftocairo" command line tool without<br>
needing to write a line of code.<br>
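<br>
If you later want to script that, here is a minimal sketch driven from Python.<br>
It assumes pdftocairo is on the PATH; the helper names and the file names are<br>
placeholders made up for illustration. It uses the tool's -svg flag together<br>
with -f and -l (first and last page) so that each run writes a single page:<br>
<pre>
```python
import shutil
import subprocess


def pdftocairo_svg_command(pdf_path, svg_path, page=1):
    """Build the pdftocairo argument list that writes one page as SVG."""
    # -f and -l select the first and last page to convert; passing the
    # same number for both restricts the run to a single page.
    return ["pdftocairo", "-svg", "-f", str(page), "-l", str(page),
            str(pdf_path), str(svg_path)]


def convert_page_to_svg(pdf_path, svg_path, page=1):
    """Run pdftocairo; raises CalledProcessError if the tool reports failure."""
    if shutil.which("pdftocairo") is None:
        raise FileNotFoundError("pdftocairo not found on PATH")
    subprocess.run(pdftocairo_svg_command(pdf_path, svg_path, page), check=True)


# Hypothetical usage (placeholder file names):
# convert_page_to_svg("input.pdf", "page1.svg", page=1)
```
</pre>
Building the argument list separately from running it makes it easy to check<br>
the command before pointing it at real files.<br>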
<br>
I believe that one can do something similar with the Qt backend, but<br>
that's outside of my area of expertise.<br>
<br>
I hope that helps,<br>
Dom<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers <<a href="mailto:todd.hubers@alivate.com.au">todd.hubers@alivate.com.au</a>> wrote:<br>
> Hi Dom,<br>
> You can probably tell me :) I'm not claiming to be a poppler genius. Please<br>
> do elaborate on the suitability of the CairoOutputDevice to generate an SVG<br>
> (remembering that SVGs are favoured for their vector ability for text, lines<br>
> and filled shapes).<br>
><br>
> Thanks, Todd.<br>
><br>
> On 4 November 2011 22:55, Dominic Lachowicz <<a href="mailto:domlachowicz@gmail.com">domlachowicz@gmail.com</a>> wrote:<br>
>><br>
>> Just out of curiosity, how would the proposed SVGOutputDevice differ<br>
>> from using (say) the existing CairoOutputDevice that was configured to<br>
>> write to SVG? That can already be accomplished today.<br>
>><br>
>> Thanks,<br>
>> Dom<br>
>><br>
>> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers <<a href="mailto:todd.hubers@alivate.com.au">todd.hubers@alivate.com.au</a>><br>
>> wrote:<br>
>> > Alec, I'm quite sold on the SVG idea. It is self contained and can even<br>
>> > work<br>
>> > outside the browser.<br>
>> > Josh, it would seem that the HTMLOutputDevice is the better candidate<br>
>> > for<br>
>> > SVG. HTML would be a good interim solution as well; with SVG, however,<br>
>> > everything is packaged into a single file. With HTML the<br>
>> > browser makes repeated calls back to the web server (for image<br>
>> > resources), but with SVG it's naturally all together. You can also<br>
>> > achieve effects like gradients in SVG quite easily, and SVG is better<br>
>> > supported by older browsers than alternative approaches to getting PDF<br>
>> > into the browser.<br>
>> > I am interested in seeing the latest version of the HTML solution. I may<br>
>> > attempt some preliminary SVG rendering.<br>
>> ><br>
>> > Back on the topic of "Data" output device. I'm already using XML for RTF<br>
>> > output (I'm doing this in my language of choice - C# though so it's not<br>
>> > an<br>
>> > easy task to contribute this back to poppler). It's true that direct<br>
>> > implementations of device drivers are more efficient; however, XML or the<br>
>> > like<br>
>> > does provide a convenient interface that is very accessible from many programming<br>
>> > languages. I would not expect such a "data" output device to be used by<br>
>> > PDF<br>
>> > viewing applications. However it would be good for all other purposes,<br>
>> > where<br>
>> > such implementations are usually performed in batch processes and the<br>
>> > extra<br>
>> > processing in the presence of multi-threading is readily accepted in<br>
>> > return<br>
>> > for flexibility - that is, a larger community can make use of poppler.<br>
>> > Cheers,<br>
>> > Todd<br>
>> > On 4 November 2011 17:24, Josh Richardson <<a href="mailto:jric@chegg.com">jric@chegg.com</a>> wrote:<br>
>> >><br>
>> >> Hi Todd,<br>
>> >> Some of us who are working on the pdftohtml utility have had similar<br>
>> >> thoughts.<br>
>> >> It's on my wish list to completely remove the need for a poppler<br>
>> >> output<br>
>> >> device by utilizing the SVG toolset available in modern browsers. In<br>
>> >> any<br>
>> >> case, we are achieving high accuracy on Gecko and Webkit browsers with<br>
>> >> the<br>
>> >> current version (not merged into the Poppler main repo yet, but I can<br>
>> >> send<br>
>> >> you an invite for a git repo that Alec Taylor made, which has all those<br>
>> >> latest changes.) I think it might meet your needs as-is, or with some<br>
>> >> tweaks to make it work better on other browsers.<br>
>> >> We are currently extracting the text and fonts for the browser to<br>
>> >> render<br>
>> >> directly, but still must rely on Splash, Cairo, etc. to rasterize other<br>
>> >> graphic operations. With the way we've done it, we have an easy path<br>
>> >> to<br>
>> >> change over to SVG, one graphic operation at a time, if you'd be<br>
>> >> interested<br>
>> >> in doing that.<br>
>> >> The idea of a separate "data" device is interesting, but I don't think<br>
>> >> it's the right way to go. In effect, you are talking about changing<br>
>> >> the PDF<br>
>> >> data to XML, and from there to other formats. I can appreciate the<br>
>> >> sentiment, since PDF is such a difficult format to work with, but<br>
>> >> adding a<br>
>> >> layer of abstraction is just going to make things more complex,<br>
>> >> error-prone,<br>
>> >> and slow. To note, the current version of pdftohtml creates a valid<br>
>> >> XML-compliant HTML format — actually there's a small bug, but you<br>
>> >> probably<br>
>> >> get the point. You can always use the XML-compliant HTML as your<br>
>> >> easier-to-digest "data" format, which also allows us to represent more<br>
>> >> semantics than are available in the original PDF document, and you can<br>
>> >> always extend it with whatever XML tags you need. For example, I<br>
>> >> extended<br>
>> >> it with an attribute describing bounding boxes for all of the text<br>
>> >> spans.<br>
>> >> Let me know if you want the repo invite.<br>
>> >> Best, --josh<br>
>> >> From: Todd Hubers <<a href="mailto:todd.hubers@alivate.com.au">todd.hubers@alivate.com.au</a>><br>
>> >> Date: Thu, 3 Nov 2011 18:13:52 -0700<br>
>> >> To: "<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>" <<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>><br>
>> >> Subject: [poppler] Poppler - SVG Device<br>
>> >><br>
>> >> I'm currently using Poppler for Text extraction and using GhostScript<br>
>> >> for<br>
>> >> PDF to Image functionality, all for viewing PDFs online without<br>
>> >> requiring a<br>
>> >> PDF plugin in the browser.<br>
>> >><br>
>> >> I noticed Mozilla was working on an interesting project, PDF.js<br>
>> >> [<a href="https://wiki.mozilla.org/PDF.js" target="_blank">https://wiki.mozilla.org/PDF.js</a>]. It loads PDF files with pure<br>
>> >> JavaScript<br>
>> >> (on an HTML5-compatible browser - probably needs canvas).<br>
>> >><br>
>> >> This is an opportunity for poppler to steam ahead and get some headline<br>
>> >> grabbing exposure. The SVG format is well supported by browsers. PDFs<br>
>> >> are<br>
>> >> portable across systems, however SVGs are very portable (and fast)<br>
>> >> across<br>
>> >> the web.<br>
>> >><br>
>> >> I propose the building of an SVG Device - PDF to SVG. I am currently<br>
>> >> considering using PDF to XML, to then perform XML to SVG. Given the<br>
>> >> status<br>
>> >> quo, I believe it's time for PDF to SVG.<br>
>> >><br>
>> >> I see SVG as a very efficient and therefore powerful web format; I hope<br>
>> >> others in the poppler community will see the potential as I do.<br>
>> >><br>
>> >> Thanks,<br>
>> >><br>
>> >> Todd Hubers (BBIT Hons)<br>
>> >> Alivate<br>
>> >><br>
>> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools for<br>
>> >> SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to have<br>
>> >> simply one<br>
>> >> direct rendering device and one "data" device.<br>
>> ><br>
>> ><br>
>> ><br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> "I like to pay taxes. With them, I buy civilization." -- Oliver Wendell<br>
>> Holmes<br>
><br>
><br>
<br>
<br>
<br>
--<br>
"I like to pay taxes. With them, I buy civilization." -- Oliver Wendell Holmes<br>
</div></div></blockquote></div><br>