[poppler] pdftohtml does not preserve fonts
Clément Wehrung
cwehrung at gmail.com
Wed Oct 26 05:35:35 PDT 2011
You can get a better sense of the issue here (Firefox vs. Safari on Mac/iOS):
http://dev.nurves.com/pdf2html/-6.html
Cf. footnotes
WebKit.png (http://cl.ly/3c1B2V1X2u2C2f0M2L0L)
Firefox.png (http://cl.ly/0Q111C3u2g3T2U1D3U2u)
--
Clément Wehrung
06 88 10 65 91
On Wednesday, October 26, 2011 at 14:26, Clément Wehrung wrote:
> Hi Josh,
>
> Thanks for all this. I'm already looking at the code, but I've run into some issues with WebKit rendering compared to Firefox (where it looks really amazing!). I know WebKit has a bug with letter-spacing (it ignores the decimal part of the value), but there's more to it, since text-rendering:optimizeLegibility; only partly works. I'm trying to see how we could keep text boxes from ending up on top of one another. I can show you some screenshots if you want.
>
> By the way, when did you choose not to use only the background image for all graphics? Is it in order to place some images over text?
>
> Thanks,
>
> Clement
>
> --
> Clément Wehrung
> 06 88 10 65 91
>
> On Tuesday, October 25, 2011 at 00:41, Josh Richardson wrote:
>
> > OK, I sent you a read-only access invitation for now. Thanks for your offer to help. Here is my list of bigger issues to give you a flavor; there are a lot of fun things to do. Let me know what you want to do with pdftohtml!
> >
> > Translate drawing operations into canvas with SVG
> > Find better way to calculate vertical positioning, by looking at browser source code
> > z-index handling -- currently text is never masked by graphics
> > Algorithmic extraction of TOC
> > Algorithmic extraction of page numbering (Alec may be working on this)
> > Algorithmic identification of chapters
> > Right-to-left text, proper display (e.g. Arabic, Hebrew)
> > Algorithmic detection of text flow (Stephen may be working on this)
> > Detection / removal of duplicate images
> > Jpg vs. png selection; automatically choose the best format for each image
> >
> >
> > --josh
> >
> > From: Clément Wehrung <cwehrung at nurves.com>
> > Date: Mon, 24 Oct 2011 15:27:23 -0700
> > To: Josh Richardson <jric at chegg.com>
> > Cc: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>, Alec Taylor <alec.taylor6 at gmail.com>
> > Subject: Re: [poppler] pdftohtml does not preserve fonts
> >
> > Sure! Do you have a link to the repo so that I can have a look already? (I haven't figured out which one it is yet.) I'm really interested in helping you; if you need something on any specific topic, don't hesitate to ask. Many thanks again,
> >
> > Clément
> >
> >
> > On Mon, Oct 24, 2011 at 8:01 PM, Josh Richardson <jric at chegg.com> wrote:
> > > Can you give me a couple of days? I want to try to get a repo hosted on,
> > > e.g., Bitbucket, that is connected to my repo, so that it's easier to keep
> > > everything in sync. Alec Taylor set up a repo there already, which you
> > > can use to get an immediate snapshot if needed.
> > >
> > > Best, --josh
> > >
> > > On 10/24/11 10:45 AM, "iclems" <cwehrung at nurves.Com> wrote:
> > >
> > > >
> > > >Dear Josh,
> > > >
> > > >I am working on a pdftohtml project that requires font preservation, so
> > > >I'd be really interested in getting this too. Do you think that's possible?
> > > >
> > > >Thanks,
> > > >
> > > >Clement
> > > >cwehrung at gmail.com
> > > >
> > > >
> > > >Josh Richardson wrote:
> > > >>
> > > >> Preserving fonts is not integrated into the master repository yet. If
> > > >> you like, I can send you a patched version of Poppler which will do it.
> > > >> You'll still have to run your own process (like FontForge) to convert
> > > >> the fonts into a web-usable format, but it's straightforward as long as
> > > >> the fonts have a mapping to Unicode, and doable even without.
> > > >>
> > > >> --josh
> > > >>
> > > >> From: M Naveed Akram <cmnajs at gmail.com>
> > > >> Date: Fri, 30 Sep 2011 06:52:14 -0700
> > > >> To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
> > > >> Subject: [poppler] pdftohtml does not preserve fonts
> > > >>
> > > >> Hi,
> > > >>
> > > >> I have been using the 0.16 release of poppler-utils, but I am facing a
> > > >> problem: when converting PDF to HTML with pdftohtml, it does not
> > > >> preserve fonts in the output HTML. How can I solve this issue? Please
> > > >> help.
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> poppler mailing list
> > > >> poppler at lists.freedesktop.org
> > > >> http://lists.freedesktop.org/mailman/listinfo/poppler
> > > >>
> > > >>
> > > >
> > > >--
> > > >View this message in context:
> > > >http://old.nabble.com/pdftohtml-does-not-preserve-fonts-tp32569116p32712084.html
> > > >Sent from the Free Desktop - poppler mailing list archive at Nabble.com.
> > > >
> > > >
> > >
> >
>
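
The font-conversion step Josh mentions in the thread above (running a tool such as FontForge to turn the extracted fonts into a web-usable format) could be scripted roughly as follows. This is only a minimal sketch, not the patched Poppler workflow itself: the helper names `fontforge_command`, `font_face_rule`, and `convert`, the choice of WOFF as the target, and the file names are all assumptions made for illustration. It assumes a `fontforge` binary on the PATH and uses FontForge's native scripting calls `Open` and `Generate`, where the output format follows the file extension.

```python
import subprocess
from pathlib import Path
from typing import List

# FontForge native script: open the input font ($1), write the output ($2).
# Generate() picks the output format from the file extension.
FF_SCRIPT = "Open($1); Generate($2)"


def fontforge_command(src: Path, dst: Path) -> List[str]:
    """Build the command line that asks FontForge to convert src into dst.

    Hypothetical helper: it only assembles the arguments, it runs nothing.
    """
    return ["fontforge", "-lang=ff", "-c", FF_SCRIPT, str(src), str(dst)]


def font_face_rule(family: str, url: str) -> str:
    """Return a CSS @font-face rule that points at the converted font.

    Assumes the converted file is WOFF; adjust format() for other targets.
    """
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{url}") format("woff");\n'
        "}\n"
    )


def convert(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if FontForge fails."""
    subprocess.run(fontforge_command(src, dst), check=True)
```

For example, `convert(Path("f1.ttf"), Path("f1.woff"))` would write the WOFF file, and `font_face_rule("f1", "f1.woff")` gives the CSS needed to reference it from the generated HTML. Fonts without a Unicode mapping would need extra work that this sketch does not cover.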