[poppler] [Poppler] Bug in your text matching routine

Ed Catmur ed at catmur.co.uk
Wed Aug 29 14:27:53 PDT 2007


On Mon, 2007-08-27 at 20:38 +0200, Albert Astals Cid wrote:
> On Monday 27 August 2007, Ed Catmur wrote:
> > On Sun, 2007-08-26 at 21:55 +0200, Albert Astals Cid wrote:
> > > The problem is, that searching "a" in the attached document (that only
> > > contains "ä") returns true but the returned container rectangle is 0
> > > pixels wide.
> > Oops.  Stupid error, patch attached.
> That was fast! Thanks a lot. You rock :-)

Unfortunately that's not the end of the story.

Create a PDF containing the following (OOo Writer works well for this):

	Offler's offer of offices offended.
(Note the use of ff, ffi, ffl ligatures; this also happens with
fractions (½), etc.)

Now search for of, off, offi, offl, f, ff, ffi, ffl etc.
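
To illustrate why these searches hit partway into a single glyph, here is
a minimal Python sketch (not poppler code) that uses the standard
unicodedata module to show the compatibility decompositions involved.
The helper name compat_decompose and the hand-built sample string are
assumptions made for the example only.

```python
import unicodedata

def compat_decompose(text):
    """Return the NFKD (compatibility) decomposition of `text`."""
    return unicodedata.normalize("NFKD", text)

# The sample sentence as a PDF producer might emit it, with the
# ffl (U+FB04), ff (U+FB00) and ffi (U+FB03) ligatures as single characters.
sample = "O\uFB04er's o\uFB00er of o\uFB03ces o\uFB00ended."
decomposed = compat_decompose(sample)

if __name__ == "__main__":
    for ch in ["\uFB00", "\uFB03", "\uFB04", "\u00BD"]:
        parts = ["U+%04X" % ord(c) for c in compat_decompose(ch)]
        print(unicodedata.name(ch), "->", " ".join(parts))
    # "offi" only appears after decomposition, and it ends inside the
    # single ffi ligature character of the original text.
    print("offi" in sample, "offi" in decomposed)
```

A search for "off" therefore matches the 'o' plus two of the three
characters that the one FFI LIGATURE decomposes to, which is the partial
match in question.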

Question: where do we want to draw the match box when a
search /partially/ matches a compatibility decomposition?  The current
code evidently draws it at the start of the compatibility character;
other options I've thought of (in order of increasing complexity) are:
1. at the end of the compatibility character
2. exactly halfway through the compatibility character
3. proportionally as far through the compatibility character as the
match extends into its decomposition (e.g. 2/3 of the way through when
matching 'ff' of FFI LIGATURE)

Option 3 seems the most elegant, but could be a little complex to
implement and may not always be the right solution (RTL, zero-width
characters, etc.)
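
For comparison, the placements discussed above can be sketched roughly
as follows. This is a Python illustration under stated assumptions, not
the poppler implementation: the function match_edge_x and its parameters
are hypothetical, text is assumed to run left to right, the match is
assumed to cover a prefix of the decomposition, and each decomposed
character is assumed to take an equal share of the glyph's width.

```python
import unicodedata

def match_edge_x(char, matched_len, x_min, x_max, option):
    """Pick the x position where a match box ends inside `char`.

    char        -- a single compatibility character, e.g. U+FB03
    matched_len -- how many characters of its NFKD decomposition matched
    x_min/x_max -- horizontal extent of the character's box (LTR assumed)
    option      -- "start" (the behaviour described as current),
                   "end" (1), "half" (2) or "proportional" (3)
    """
    total = len(unicodedata.normalize("NFKD", char))
    if not 0 < matched_len <= total:
        raise ValueError("matched_len must be between 1 and %d" % total)
    if matched_len == total:
        # Assumption: a complete match always covers the whole character.
        return x_max
    if option == "start":
        return x_min
    if option == "end":
        return x_max
    if option == "half":
        return (x_min + x_max) / 2
    if option == "proportional":
        # Assumes equal width per decomposed character, which does not
        # hold for zero-width characters and ignores RTL entirely.
        return x_min + (x_max - x_min) * matched_len / total
    raise ValueError("unknown option: %r" % option)

if __name__ == "__main__":
    # Matching 'ff' of FFI LIGATURE in a box spanning x = 0..30.
    for opt in ("start", "end", "half", "proportional"):
        print(opt, match_edge_x("\uFB03", 2, 0.0, 30.0, opt))
```

With "start", a match that ends partway into a ligature gets no width
from that character, whereas "proportional" gives the 2/3 placement from
the example in option 3.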

Thoughts?

Ed


