[Poppler] Bug in your text matching routine

Ed Catmur ed at catmur.co.uk
Fri Aug 31 11:33:43 PDT 2007


On Thu, 2007-08-30 at 19:45 +0200, Albert Astals Cid wrote:
> On Thursday 30 August 2007, James Cloos wrote:
> > >>>>> "Ed" == Ed Catmur <ed at catmur.co.uk> writes:
> > Ed> Question: where do we want to draw the match box when a
> > Ed> search /partially/ matches a compatibility decomposition?
> >
> > Ed> 1. at the end of the compatibility character
> > Ed> 2. exactly halfway through the compatibility character
> > Ed> 3. as far through as the match covers of the compatibility
> > Ed> decomposition (e.g. 2/3 through when matching 'ff' of FFI LIGATURE)
> >
> > Ed> 3. seems the most elegant, but could be a little complex to implement
> > Ed> and may not always be the right solution (RTL, zero-width characters,
> > Ed> etc.)
> >
> > I'd vote for getting 1 in for now and only then spending any time on
> > implementing 3.  It may even be the better option overall.
> >
> > As you say, 3 will be quite complex when dealing with the scripts which
> > require shaping engines or syllable-per-glyph scripts like Hangeul, if
> > you allow searching for syllable components.
> >
> > With some of the scripts you would even need disjoint match boxes.
> >
> > Even in cases where the syllable block isn't a single glyph it might be
> > better to highlight the whole thing rather than just the matched pieces.
> 
> I'm with James here: go for 1, and then for 3 if you feel up to it :D

No, you're right; without information on the layout of subglyphs in
compatibility characters trying to implement 3 is pointless (and perhaps
even then).

Here's the patch for 1.

Ed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: highlight-full-glyph.patch
Type: text/x-patch
Size: 1596 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20070831/6ba21c6b/attachment.bin 

