[poppler] Find using RegEx on top of xpdf-3.03

wwwacky at free.fr wwwacky at free.fr
Thu Oct 6 09:17:11 PDT 2011


Quoting Albert Astals Cid <aacid at kde.org>:

> On Wednesday, 5 October 2011, wwwacky at free.fr wrote:
> > Dear all,
>
> Hi
>
> >
> > for some months I have needed a regex find to dig into huge PDF
> > docs. Please find attached a patch that implements this feature on top of
> > xpdf-3.03. It supports ASCII only, backward and case-sensitive searches
> > (the word-only check-box no longer has any effect). The xpdf MMI hasn't been
> > modified, so you can only perform regex searches with this patch!
> > I saw that xpdf-3.03 is being merged into Poppler. I hope this can help
> > with the review :)
> > Let me know if you are interested in this patch so that I can help merge
> > it into Poppler.
>
> We still have not merged xpdf-3.03 and it will probably still take a while,
> but anyway, I am not sure ASCII only is a good idea. Why that limitation?
>
> Albert

Hi,

In fact, this basic implementation relies on the POSIX regex functions regcomp,
regexec, regerror and regfree. These functions take char strings, not Unicode
strings, as input. Thus, only ASCII control chars and ASCII printable chars can
be matched. Supporting Unicode-aware regex search is much harder to implement
and out of my scope for the time being. I would like to support much more, but I
expect that adding Unicode would be a huge effort. Moreover, ASCII covers 99% of
my needs when searching English datasheets :)
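
To show how these four POSIX calls fit together, here is a minimal sketch. It
is not taken from the patch: the helper name find_regex, its parameters, and
the choice of extended regex syntax (REG_EXTENDED) are only assumptions for
illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not the code from the patch.
 * Searches `text` for the first match of the POSIX extended regex `pattern`.
 * Returns 1 on a match (byte offsets stored in *start and *end),
 * 0 when nothing matches, -1 when the pattern is invalid or the search fails. */
static int find_regex(const char *pattern, const char *text,
                      int case_sensitive, int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    char msg[256];
    int flags = REG_EXTENDED | (case_sensitive ? 0 : REG_ICASE);

    /* Compile the pattern; on failure, regerror gives a readable message. */
    int rc = regcomp(&re, pattern, flags);
    if (rc != 0) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }

    /* Run the compiled regex on a plain char string (no Unicode handling). */
    rc = regexec(&re, text, 1, &m, 0);
    if (rc == 0) {
        *start = (int)m.rm_so;
        *end = (int)m.rm_eo;
    } else if (rc != REG_NOMATCH) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regexec: %s\n", msg);
    }

    /* Always release the compiled pattern once it has been used. */
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

For example, find_regex("[0-9]+\\.[0-9]+V", "Vout = 3.3V typ", 1, &s, &e)
returns 1 and sets s and e to the byte offsets of "3.3V". The offsets are
counted in bytes, which is why this approach is only reliable for ASCII text.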

I know that this patch has some weaknesses, but I think it would be great to get
regex search into applications such as Evince, whose GUI is, in my opinion,
smarter than xpdf's.

Best regards
Jerry

PS: Sorry for my poor English and my clumsy proposal :)