Hi Dom,<div><br></div><div>You can probably tell me :) I'm not claiming to be a poppler genius. Please do elaborate on the suitability of the CairoOutputDevice for generating SVG (remembering that SVG is favoured for its vector handling of text, lines and filled shapes).<br>
<br>Thanks, Todd.<br><br><div class="gmail_quote">On 4 November 2011 22:55, Dominic Lachowicz <span dir="ltr">&lt;<a href="mailto:domlachowicz@gmail.com">domlachowicz@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Just out of curiosity, how would the proposed SVGOutputDevice differ<br>
from using (say) the existing CairoOutputDevice that was configured to<br>
write to SVG? That can already be accomplished today.<br>
<br>
Thanks,<br>
Dom<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers &lt;<a href="mailto:todd.hubers@alivate.com.au">todd.hubers@alivate.com.au</a>&gt; wrote:<br>
> Alec, I'm quite sold on the SVG idea. It is self contained and can even work<br>
> outside the browser.<br>
> Josh, it would seem that the HTMLOutputDevice is the better candidate for<br>
> SVG. HTML would be a good interim solution as well; however, with SVG,<br>
> everything is packaged into a single file. With HTML the<br>
> browser is making repeated calls back to the web server (for image<br>
> resources), but with SVG it's naturally all together. You can also achieve<br>
> effects like gradients in SVG quite easily, and SVG is better supported by older<br>
> browsers than alternative approaches to getting PDF into the browser.<br>
> I am interested in seeing the latest version of the HTML solution. I may<br>
> attempt some preliminary SVG rendering.<br>
><br>
> Back on the topic of a "data" output device: I'm already using XML for RTF<br>
> output (I'm doing this in my language of choice, C#, though, so it's not an<br>
> easy task to contribute this back to poppler). It's true that direct<br>
> implementations of device drivers are more efficient; however, XML or the like<br>
> does provide a convenient interface that is accessible from many programming<br>
> languages. I would not expect such a "data" output device to be used by PDF<br>
> viewing applications. However it would be good for all other purposes, where<br>
> such implementations are usually performed in batch processes and the extra<br>
> processing in the presence of multi-threading is readily accepted in return<br>
> for flexibility - that is, a larger community can make use of poppler.<br>
> Cheers,<br>
> Todd<br>
> On 4 November 2011 17:24, Josh Richardson &lt;<a href="mailto:jric@chegg.com">jric@chegg.com</a>&gt; wrote:<br>
>><br>
>> Hi Todd,<br>
>> Some of us who are working on pdftohtml utility have had similar thoughts.<br>
>> It's on my wish list to completely remove the need for a poppler output<br>
>> device by utilizing the SVG toolset available in modern browsers. In any<br>
>> case, we are achieving high accuracy on Gecko and Webkit browsers with the<br>
>> current version (not merged into the Poppler main repo yet, but I can send<br>
>> you an invite for a git repo that Alec Taylor made, which has all those<br>
>> latest changes.) I think it might meet your needs as-is, or with some<br>
>> tweaks to make it work better on other browsers.<br>
>> We are currently extracting the text and fonts for the browser to render<br>
>> directly, but still must rely on Splash, Cairo, etc. to rasterize other<br>
>> graphic operations. With the way we've done it, we have an easy path to<br>
>> change over to SVG, one graphic operation at a time, if you'd be interested<br>
>> in doing that.<br>
>> The idea of a separate "data" device is interesting, but I don't think<br>
>> it's the right way to go. In effect, you are talking about changing the PDF<br>
>> data to XML, and from there to other formats. I can appreciate the<br>
>> sentiment, since PDF is such a difficult format to work with, but adding a<br>
>> layer of abstraction is just going to make things more complex, error-prone,<br>
>> and slow. To note, the current version of pdftohtml creates a valid<br>
>> XML-compliant HTML format — actually there's a small bug, but you probably<br>
>> get the point. You can always use the XML-compliant HTML as your<br>
>> easier-to-digest "data" format, which also allows us to represent more<br>
>> semantics than are available in the original PDF document, and you can<br>
>> always extend it with whatever XML tags you need. For example, I extended<br>
>> it with an attribute describing bounding boxes for all of the text spans.<br>
>> Let me know if you want the repo invite.<br>
>> Best, --josh<br>
>> From: Todd Hubers &lt;<a href="mailto:todd.hubers@alivate.com.au">todd.hubers@alivate.com.au</a>&gt;<br>
>> Date: Thu, 3 Nov 2011 18:13:52 -0700<br>
>> To: "<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>" &lt;<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>&gt;<br>
>> Subject: [poppler] Poppler - SVG Device<br>
>><br>
>> I'm currently using Poppler for text extraction and Ghostscript for<br>
>> PDF to Image functionality, all for viewing PDFs online without requiring a<br>
>> PDF plugin in the browser.<br>
>><br>
>> I noticed Mozilla was working on an interesting project, PDF.js<br>
>> [<a href="https://wiki.mozilla.org/PDF.js" target="_blank">https://wiki.mozilla.org/PDF.js</a>]. It loads PDF files with pure JavaScript<br>
>> (on an HTML5-compatible browser - probably needs canvas).<br>
>><br>
>> This is an opportunity for poppler to steam ahead and get some<br>
>> headline-grabbing exposure. The SVG format is well supported by browsers. PDFs are<br>
>> portable across systems; however, SVGs are very portable (and fast) across<br>
>> the web.<br>
>><br>
>> I propose the building of an SVG Device - PDF to SVG. I am currently<br>
>> considering using PDF to XML, to then perform XML to SVG. Given the status<br>
>> quo, I believe it's time for PDF to SVG.<br>
>><br>
>> I see SVG as a very efficient and therefore powerful web format; I hope<br>
>> others in the poppler community will see the potential as I do.<br>
>><br>
>> Thanks,<br>
>><br>
>> Todd Hubers (BBIT Hons)<br>
>> Alivate<br>
>><br>
>> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools for<br>
>> SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to have simply one<br>
>> direct rendering device and one "data" device.<br>
><br>
><br>
</div></div><div class="HOEnZb"><div class="h5">
</div></div><span class="HOEnZb"><font color="#888888">--<br>
"I like to pay taxes. With them, I buy civilization." -- Oliver Wendell Holmes<br>
</font></span></blockquote></div><br></div>