Hi All,<br><div class="gmail_quote"><div><br></div><div>My name is Akash Agrawal and I am working on a full-fledged PDF-to-HTML solution. I investigated poppler and made a lot of custom changes for my requirements. I found your names in the revision log of the pdftohtml source files. I would appreciate it if you could address my queries. I am currently stuck on two issues:</div>
<div><ol><li>z-index</li><li>Fonts</li></ol></div><div><b>z-index:</b> In its current implementation, poppler's pdftohtml renders all the non-text data into an image and uses that image as the background image in the HTML. But some PDFs have images/graphics drawn over the text, and the current solution fails in such cases. I looked into the Gfx and OutputDevice code but did not arrive at a workable solution for this case. I would be very grateful if you could suggest some pointers.</div>
<div><br></div><div><b>Fonts:</b> Fonts are the biggest problem here. I saw that pdftohtml currently outputs all fonts as Times (the default font), so I fixed that to use the exact font names (including the tag, because multiple versions of the same font might be present in a PDF). I also made non-horizontal text part of the image, because rotating the glyphs did not seem like a good idea given the time available. I am also able to extract the font data, but I am having difficulty extracting encoding information such as the cmap. Your pointers on this would be very much appreciated. FYI, I am using fontforge to convert the extracted fonts to a common format (TTF in my case). I am thinking of applying the cmaps using fontforge. Please let me know if you would suggest otherwise.</div>
<div><br></div><div>I hope to hear back from you on these questions, and I look forward to a strong technical relationship.</div><div><br clear="all">Regards,<br>Akash Agrawal<br><a href="http://tech-queries.blogspot.com/" target="_blank">http://tech-queries.blogspot.com/</a><br>
</div>
</div><br>