[poppler] pdftohtml (width-height and Arabic pdf)

Josh Richardson jric at chegg.com
Sat Sep 24 00:08:56 PDT 2011


Sorry for the delay — been on an airplane all day — and had a lot of emails to read on the list.  ;-)

1)  You can use both -s and -c at the same time.
2) Ok, was worth a shot.  I've lost track a little bit where the code base is — I haven't yet contributed back everything, just because it takes time to format the patches.  I definitely have code that embeds the size of each paragraph — well, at least I think it's what you want.  I've attached a sample file — let me know.
3) I'm a little surprised, but yes, I confirmed that the Arabic shows up in the wrong direction even in my version.  Looks like we'll need to do some work to make it handle right-to-left text correctly.  If you want to write the patch, contact me off-list and I'll try and help you do it.
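
For point 1, here is a minimal sketch of passing both flags in one run. The helper name build_pdftohtml_cmd and the file names input.pdf and output are made up for illustration; only the -s and -c flags come from this thread. The helper only builds the command line, it does not run anything.

```python
import shlex


def build_pdftohtml_cmd(pdf_path, out_base, single=True, complex_layout=True):
    """Build the argument list for a pdftohtml run (hypothetical helper, does not execute it)."""
    cmd = ["pdftohtml"]
    if single:
        cmd.append("-s")  # all pages in one HTML document
    if complex_layout:
        cmd.append("-c")  # complex output, the mode that produced xhtml in this thread
    cmd += [pdf_path, out_base]
    return cmd


cmd = build_pdftohtml_cmd("input.pdf", "output")
print(shlex.join(cmd))  # prints: pdftohtml -s -c input.pdf output
```

To actually run it, the list can be handed to subprocess.run(cmd, check=True), assuming pdftohtml is on the PATH.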

--josh

From: Justine Guillaumont <justine.guillaumont at gmail.com>
Date: Fri, 23 Sep 2011 04:35:52 -0700
To: "poppler at lists.freedesktop.org" <poppler at lists.freedesktop.org>
Subject: [poppler] pdftohtml (width-height and Arabic pdf)

Hi,

It seems that the thread from my first email has drifted to another topic... I am opening this new thread so you can finish your conversation in the other one.

Thank you for your advice, Josh. I finally succeeded in building the latest version from Git! But my problems are the same...

1) pdftohtml -c does indeed generate xhtml, but I prefer the layout of pdftohtml -s (all the pages in one html file). I will keep (and modify) my xsl to obtain xhtml with pdftohtml -s.

2) The <div> I was talking about (in version 0.16.7) has been replaced by <p> in the latest version, and it doesn't contain width and height either...
Example: <P style="position:absolute;top:2187px;left:364px;white-space:nowrap" class="ft01">

3) I tried several Arabic PDFs with the latest version and got the same results (with pdftohtml -c and pdftohtml -s): all the text is backwards (see enclosure). Do you have an Arabic PDF that renders correctly?
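
As a possible stopgap for point 3, here is a rough post-processing sketch, not a fix inside pdftohtml. It assumes the text comes out backwards because it is emitted in visual order, which has not been confirmed here. The function names looks_rtl and to_logical_order are made up for illustration. Lines that mix Arabic with digits or Latin text would need the full Unicode bidirectional algorithm, which this does not implement.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters: "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def looks_rtl(text):
    """Return True if the text contains at least one strongly right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def to_logical_order(line):
    """Naively reverse a line that contains RTL characters; leave other lines untouched."""
    return line[::-1] if looks_rtl(line) else line
```

This would be applied to each extracted text line after conversion; a line with no Arabic or Hebrew characters passes through unchanged.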

Justine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20110924/5dd9fd7f/attachment-0001.htm>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20110924/5dd9fd7f/attachment-0001.html>

