Hi, I saw Ross' note about not being able to extract Chinese characters from certain PDFs and just wanted to mention that I've seen the same. Unfortunately I am unable to share the PDFs, and from Ross' note I'm not quite sure how to check whether it's the same problem. I have also seen this problem with other languages, sometimes even English. Most frequently I've seen poor text extraction from PDFs in Thai, though some Thai PDFs do work. I thought the problem might be a missing CMap file, but from your description it sounds like that might not be the case. Is that correct?<br> <br>I have also seen some Arabic text that I have not been able to interpret correctly. Arabic is written right to left, but when I open the XML from pdftohtml, the characters are reversed. That is, instead of 1234567 it looks like 7654321. Also, even after reversing the characters, I haven't quite been able to match them up with the text as it appears in the PDF. Has anyone else seen this? Or does anyone have a clue as to what I might be doing wrong?<br>
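For the Arabic case, here is a minimal sketch of one thing that might be worth checking. It rests on two assumptions that are not confirmed for these PDFs: that the extracted line is stored in visual (left-to-right display) order, and that the PDF encodes Arabic presentation forms (the positional glyph variants in U+FB50-U+FDFF and U+FE70-U+FEFF) rather than the base letters. If so, reversing alone will not match text typed normally, and folding the forms back with NFKC normalization may help. The function names are hypothetical and only for illustration.

```python
import unicodedata


def has_presentation_forms(text: str) -> bool:
    """Return True if any character lies in the Arabic Presentation Forms blocks."""
    return any(
        "\uFB50" <= ch <= "\uFDFF" or "\uFE70" <= ch <= "\uFEFF"
        for ch in text
    )


def visual_to_logical(line: str) -> str:
    """Assumption: `line` is a purely right-to-left run stored in visual order.

    Reverse it to get logical (reading) order, then apply NFKC so that
    presentation forms are replaced by their base Arabic letters.
    """
    reversed_line = line[::-1]
    return unicodedata.normalize("NFKC", reversed_line)


if __name__ == "__main__":
    # MEEM FINAL FORM followed by BEH INITIAL FORM, i.e. visual order.
    extracted = "\uFEE2\uFE91"
    print(has_presentation_forms(extracted))
    print([hex(ord(ch)) for ch in visual_to_logical(extracted)])
```

A plain reversal like this only suits lines that are entirely right to left. Lines that mix Arabic with digits or Latin text need a proper implementation of the Unicode Bidirectional Algorithm, because the embedded left-to-right runs must not be reversed.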