[poppler] How to copy the text other than english language pdf

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Thu Apr 7 07:46:32 PDT 2011


Hi,

Text extraction from PDFs that contain CJK text in
horizontal writing mode mostly works. However,
right-to-left scripts (such as Arabic and Hebrew) are not
supported yet, and support for scripts that depend on
complex text layout, such as Indic scripts, is
incomplete. In fact, text extraction for such scripts
cannot be achieved by the PDF renderer alone: the PDF
data should also include extra data for translating each
composed glyph into its Unicode substring...
So please provide a sample PDF for further discussion.
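
To illustrate the point about extra data, here is a minimal sketch
in Python. It is not poppler code. The table and the glyph IDs are
made up for illustration; in a real PDF this role could be played
by something like a ToUnicode CMap attached to the font. The sketch
only shows that one composed glyph may stand for several Unicode
characters, and that a glyph with no mapping cannot be recovered.

```python
# Hypothetical glyph-to-Unicode table. The glyph IDs are invented
# for this example and do not come from any real font.
TO_UNICODE = {
    10: "\u0915",              # single glyph: DEVANAGARI LETTER KA
    11: "\u0915\u094D\u0937",  # one conjunct glyph: KA + VIRAMA + SSA
    12: "\u093F",              # single glyph: DEVANAGARI VOWEL SIGN I
}

# Placeholder emitted when a glyph has no mapping.
REPLACEMENT = "\uFFFD"


def glyphs_to_text(glyph_ids, table=TO_UNICODE):
    """Translate a sequence of glyph IDs into a Unicode string.

    Each glyph may expand to a substring of several code points.
    Glyphs missing from the table become U+FFFD.
    """
    return "".join(table.get(g, REPLACEMENT) for g in glyph_ids)


# Glyph 11 expands to three code points; glyph 99 has no mapping.
extracted = glyphs_to_text([11, 10, 99])
```

Without such a table, an extractor sees only glyph IDs like 11 and
has no reliable way to know which characters they stand for.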

Regards,
mpsuzuki

On Thu, 7 Apr 2011 19:51:06 +0530
srinivas adicherla <srinivas.adicherla at gmail.com> wrote:

>Hi,
>
>      I have a PDF in a language other than English. I want to copy the text
>from it. How can I do it? Presently I am using
>poppler_page_get_selected_text(), but it's not working. How does poppler
>do it?
>Please provide me with a solution.
>
>Thanks
>--
>A Srinivas
>

