Hi all,

My name is Justine Guillaumont. I am completing my engineering studies with a six-month internship, working on the open-source project WebLab (<a href="http://weblab-project.org/" target="_blank">weblab-project.org</a>). I am currently using poppler-0.16.7 (I tried to install poppler-0.17.4, but libpoppler.so.17 is missing).

One of the goals of my internship is to transform PDF files into XHTML files that render with the same structured layout. To do this, I use pdftohtml -nodrm -p -s (to obtain HTML) and then a script and XSL (to obtain XHTML). I have encountered several problems with pdftohtml that I would like to share in order to get your opinion.

1) Would it be possible to have the width and height on the DIV tags in the BODY? I noticed that we get them with pdftohtml -xml (on the TEXT tags) but not with pdftohtml -nodrm -p -s. I tried to modify your code (HtmlOutputDev.cc), but I only "succeeded" in collecting the width and height of the first word of the DIV.

2) The HTML generated by pdftohtml does not validate with the W3C validator (<a href="http://validator.w3.org/" target="_blank">http://validator.w3.org/</a>). That is a pity, because not much needs to change to obtain valid HTML 4 or XHTML. If you like, I can send you the XSL I wrote to transform the HTML generated by pdftohtml -p -s into valid HTML 4.

3) With Arabic PDFs, pdftohtml seems to read the PDF correctly (from right to left) but writes the HTML backwards (from left to right). All words are reversed. Will this be corrected soon? Please find attached an example of this problem.

Regards,
Justine Guillaumont
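P.S. To make point 1 concrete, here is a minimal sketch of what I mean by the width and height of a DIV: the box that encloses all of its text fragments, rather than the box of the first word only. It is written in Python against -xml style output and assumes that each text tag carries integer top, left, width and height attributes. The sample data and the bounding_box helper are hypothetical, and deciding which text tags belong to one DIV is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `pdftohtml -xml` output; all values are made up.
SAMPLE = """<page number="1" width="892" height="1262">
<text top="100" left="50" width="40" height="12" font="0">Hello</text>
<text top="100" left="95" width="60" height="12" font="0">world</text>
<text top="115" left="50" width="80" height="14" font="0">again</text>
</page>"""


def bounding_box(text_elements):
    """Return (left, top, width, height) enclosing every element, or None if empty."""
    boxes = [
        (int(t.get("left")), int(t.get("top")), int(t.get("width")), int(t.get("height")))
        for t in text_elements
    ]
    if not boxes:
        return None
    left = min(l for l, _, _, _ in boxes)
    top = min(t for _, t, _, _ in boxes)
    # The right and bottom edges come from the fragment that extends furthest.
    right = max(l + w for l, _, w, _ in boxes)
    bottom = max(t + h for _, t, _, h in boxes)
    return left, top, right - left, bottom - top


page = ET.fromstring(SAMPLE)
box = bounding_box(page.findall("text"))
print(box)
```

Taking only the first fragment would give a 40 x 12 box here, whereas the union covers all three fragments.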