[poppler] approaches used for language detection on images ...

Albretch Mueller lbrtchx at gmail.com
Thu Feb 6 11:29:52 UTC 2020


On 2/4/20, John Muccigrosso <muccigrosso at icloud.com> wrote:
> Tesseract can do multiple languages in one file. Try “-l eng+ita” for
> example.

 Well, yes, but what can you do when you don't know in advance which
language the other text might be in?

 Say the French expression "pied à terre" is used, but someone sloppily
writes it as "pied a terre". "pied" is also an English word, and "terre"
could be OCR'ed as "terse".
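
 To make the ambiguity concrete, here is a minimal sketch of a naive
per-word dictionary lookup. The word lists are tiny and made up for
illustration (an assumption, not real dictionaries), and the function
names are hypothetical. It only shows why looking at single words cannot
settle which language a phrase is in.

```python
# Hypothetical, deliberately tiny word lists (assumption for illustration);
# a real setup would load full dictionaries per language.
WORDLISTS = {
    "eng": {"pied", "a", "terse", "the", "is"},
    "fra": {"pied", "à", "a", "terre", "le"},
}


def candidate_languages(word):
    """Return the sorted list of languages whose word list contains `word`."""
    w = word.lower()
    return sorted(lang for lang, words in WORDLISTS.items() if w in words)


def label_phrase(phrase):
    """Pair each whitespace-separated token with its candidate languages."""
    return [(tok, candidate_languages(tok)) for tok in phrase.split()]


if __name__ == "__main__":
    for phrase in ("pied à terre", "pied a terre", "pied a terse"):
        print(phrase, "->", label_phrase(phrase))
```

 With these lists, "pied" and "a" match both languages, so once "terre"
has been misread as "terse" no word in the phrase points to French any
more.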

 I work on texts (mostly about philosophy) which include lots of
Latin and French terms.

 lbrtchx

