[poppler] Find using RegEx on top of xpdf-3.03

wwwacky at free.fr wwwacky at free.fr
Fri Oct 28 08:29:07 PDT 2011


Quoting wwwacky at free.fr:

> Quoting Albert Astals Cid <aacid at kde.org>:
>
> > On Thursday, 6 October 2011, wwwacky at free.fr wrote:
> > > Quoting Albert Astals Cid <aacid at kde.org>:
> > > > On Wednesday, 5 October 2011, wwwacky at free.fr wrote:
> > > > > Dear all,
> > > >
> > > > Hi
> > > >
> > > > > For some months I have needed a regex find to dig into huge PDF
> > > > > docs. Please find attached a patch that implements this feature
> > > > > on top of xpdf-3.03. It supports ASCII only, with backward and
> > > > > case-sensitive searches (the word-only check-box no longer has
> > > > > any effect). The xpdf MMI has not been modified, so with this
> > > > > patch you can only perform regex searches! I saw that xpdf-3.03
> > > > > is being merged into Poppler. I hope this helps with the review :)
> > > > > Let me know if you are interested in this patch so that I can help
> > > > > to merge it into Poppler.
> > > >
> > > > We still have not merged xpdf-3.03 and it will probably still take a
> > > > while, but anyway I am not sure ASCII-only is a good idea. Why that
> > > > limitation?
> > > >
> > > > Albert
> > >
> > > Hi,
> > >
> > > In fact, this basic implementation relies on the POSIX regex functions
> > > regcomp, regexec, regerror and regfree. These functions take char
> > > strings, not Unicode strings, as input. Thus, ASCII control chars and
> > > ASCII printable chars can be matched. Supporting Unicode-compatible
> > > regex search is much heavier to implement and out of my scope for the
> > > time being. I would like to support much more, but I foresee a huge
> > > effort to gain Unicode. Moreover, ASCII covers 99% of my needs when
> > > searching English data sheets :)
> >
> > Sure, it might match your needs, but if you contribute it to poppler,
> > people will start demanding that it works with non-ASCII characters and
> > you will probably not be here anymore and the burden will be on our side.
> >
> > Albert
> >
> > > I know that this patch has some weaknesses, but I think it would be
> > > great to get regex search in applications such as Evince, whose GUI
> > > is, in my opinion, smarter than xpdf's.
> > >
> > > Best regards
> > > Jerry
> > >
> > > PS: Sorry for my poor English and my clumsy proposal :)
> > >
> > > > > Best regards
> > > > > Jerry
> > > >
> > > > _______________________________________________
> > > > poppler mailing list
> > > > poppler at lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/poppler
> > >
> >
>
> Hi Albert,
>
> To be more precise, the patch also supports extended ASCII 0x80-0xFF as
> well as control chars 0x01-0x1F and printable chars 0x20-0x7E. This means
> that on my Ubuntu 10.04 I can input and find ASCII and all ISO Latin-1
> chars (ISO-8859-1) such as e acute 'é', a grave 'à' and so on. Other
> extended ASCII sets are supported according to your computer configuration
> and keyboard settings.
>
> I think that it covers not only my needs but also those of most EMEA users.
> Regex search is a well-known, long-standing feature of many editors and
> scripting languages. This patch brings this powerful feature to xpdf, and
> it would be totally new in Poppler. Supporting only 1-byte charset
> encodings is more a restriction for APAC users than a bug.
>
> For instance, suppose you are searching for a sentence beginning with
> "The " followed by any text and then by " is": you just have to type the
> regex "The .* is" in the find dialog box. Only regex offers this
> possibility, and the combinations are almost infinite.
>
> Maybe I could share my modified xpdf binary so that you can test it?
>
> With best regards,
> Jerry
>


Hi All,

I just saw Marc's mail in the Poppler archive:
"[poppler] whole word search?"
It seems that he is almost the only one to mention regex in a PDF search
engine :(

There is a possibility to support regex over Unicode in Poppler (which is
quite difficult with xpdf).
But I would like to know whether any Poppler contributors are interested in it.
In that case, spending my time implementing a clean patch supporting Unicode
would be more reasonable. Otherwise I will keep it for myself ...
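
For reference, the POSIX calls mentioned earlier in this thread (regcomp,
regexec, regerror, regfree) can be sketched as below. This is only a minimal
illustration and not the code from the patch: find_regex and its parameters
are hypothetical names chosen for this example. The offsets it reports are
byte offsets into a char string, which is why this approach only fits 1-byte
charsets.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: looks for the first match
 * of the POSIX basic regex `pattern` in `text`.
 * Returns 1 and fills *start / *end (byte offsets, end exclusive) on a match,
 * 0 when nothing matches, -1 when the pattern does not compile. */
int find_regex(const char *pattern, const char *text,
               int ignore_case, size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int flags = ignore_case ? REG_ICASE : 0;

    /* Compile the pattern; report the reason if it is malformed. */
    int rc = regcomp(&re, pattern, flags);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "bad regex '%s': %s\n", pattern, msg);
        return -1;
    }

    /* Run the compiled regex and keep the span of the whole match. */
    rc = regexec(&re, text, 1, m, 0);
    if (rc == 0) {
        *start = (size_t)m[0].rm_so;
        *end = (size_t)m[0].rm_eo;
    }

    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Calling find_regex("The .* is", "See: The cat is here", 0, &s, &e) would
report the span covering "The cat is". Passing a non-zero ignore_case maps to
REG_ICASE, which is one way a case-sensitivity check-box could be wired up.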

Best regards
Jerry

