[poppler] [Poppler] Bug in your text matching routine

Albert Astals Cid aacid at kde.org
Sun Sep 2 09:31:56 PDT 2007


On Friday 31 August 2007, Ed Catmur wrote:
> On Thu, 2007-08-30 at 19:45 +0200, Albert Astals Cid wrote:
> > On Thursday 30 August 2007, James Cloos wrote:
> > > >>>>> "Ed" == Ed Catmur <ed at catmur.co.uk> writes:
> > >
> > > Ed> Question: where do we want to draw the match box when a
> > > Ed> search /partially/ matches a compatibility decomposition?
> > >
> > > Ed> 1. at the end of the compatibility character
> > > Ed> 2. exactly halfway through the compatibility character
> > > Ed> 3. as far through as the match constitutes of the compatibility
> > > Ed> decomposition (e.g. 2/3 through when matching 'ff' of FFI LIGATURE)
> > >
> > > Ed> 3. seems the most elegant, but could be a little complex to
> > > Ed> implement and may not always be the right solution (RTL,
> > > Ed> zero-width characters, etc.)
> > >
> > > I'd vote for getting 1 in for now and only then spending any time on
> > > implementing 3.  It may even be the better option overall.
> > >
> > > As you say, 3 will be quite complex when dealing with the scripts which
> > > require shaping engines or syllable-per-glyph scripts like Hangeul, if
> > > you allow searching for syllable components.
> > >
> > > With some of the scripts you would even need disjoint match boxes.
> > >
> > > Even in cases where the syllable block isn't a single glyph it might be
> > > better to highlight the whole thing rather than just the matched
> > > pieces.
> >
> > I'm with James here, go for 1 and then for 3 if you feel powerful :D
>
> No, you're right; without information on the layout of subglyphs in
> compatibility characters trying to implement 3 is pointless (and perhaps
> even then).
>
> Here's the patch for 1.

Thanks, patch is in :-)

Albert

>
> Ed
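
The three options discussed in the thread can be illustrated with a small
sketch. This is not the poppler patch; it is a hypothetical Python helper
(match_box_right_edge and its parameters are invented for illustration)
that assumes left-to-right text, a single glyph bounding box given by two
x coordinates, and Unicode NFKD as the source of the compatibility
decomposition. It returns where the right edge of the match box would
fall when a search match ends partway through a compatibility character.

```python
import unicodedata


def match_box_right_edge(glyph_x0, glyph_x1, ch, matched_len, option=1):
    """Right edge of the match box when a match ends inside `ch`.

    glyph_x0, glyph_x1: left and right x of the glyph box (LTR assumed).
    ch: the compatibility character, e.g. U+FB03 (ffi ligature).
    matched_len: how many decomposed characters the search matched.
    option: 1, 2 or 3, numbered as in the thread.
    """
    # NFKD expands a compatibility character into its decomposition,
    # e.g. the ffi ligature becomes the three letters "ffi".
    decomposition = unicodedata.normalize("NFKD", ch)
    width = glyph_x1 - glyph_x0

    if option == 1:
        # Option 1: end the box at the end of the compatibility character.
        return glyph_x1
    if option == 2:
        # Option 2: end the box exactly halfway through the character.
        return glyph_x0 + width / 2
    if option == 3:
        # Option 3: end the box proportionally to the matched part.
        # This assumes every decomposed character has equal width,
        # which is exactly the layout information the thread notes
        # is not available for real glyphs.
        return glyph_x0 + width * matched_len / len(decomposition)
    raise ValueError("option must be 1, 2 or 3")
```

For a ligature glyph spanning x = 0.0 to 9.0 and a search that matched
"ff", option 1 gives the full glyph, while option 3 gives two thirds of
it. The equal-width assumption in option 3 is the weak point, and it
breaks down further for RTL text, zero-width characters and scripts
that need shaping, as noted in the thread.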



