[Poppler-bugs] [Bug 9001] Ligated characters are drawn multiple times when selected
bugzilla-daemon at freedesktop.org
Sat Jul 20 04:46:18 PDT 2013
https://bugs.freedesktop.org/show_bug.cgi?id=9001
--- Comment #17 from Carlos Garcia Campos <carlosgc at gnome.org> ---
(In reply to comment #16)
> (In reply to comment #15)
> > Patch works. I wonder if we should return the ligatures as a single
> > character instead so that we don't need a special case.
>
> We currently support selecting individual characters within a ligature; it'd
> be a shame to lose that.
You are right.
> > ::: poppler/TextOutputDev.cc
> > @@ +2392,4 @@
> > > w1 /= uLen;
> > > h1 /= uLen;
> > > for (i = 0; i < uLen; ++i) {
> > > + if (i > 0) c = CHARCODE_LIGATED;
> >
> > Could you explain why this means it's a ligature?
>
> uLen is greater than 1 when a single CharCode (i.e. a glyph) signifies
> multiple Unicode codepoints. In English text, that typically occurs when
> the glyph is a ligature. In other scripts that might not be the case
> (http://unicode.org/Public/UNIDATA/NamedSequences.txt) but if so we don't
> handle those correctly anyway; by chopping up the space occupied by the
> glyph we're assuming it's a ligature.
Thanks for the explanation.
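To check my understanding of the splitting described above, here is a minimal standalone sketch. It is not the TextOutputDev code: the ExtractedChar struct and the splitGlyph() helper are hypothetical names made up for illustration, and only the horizontal extent is modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one extracted character (not a Poppler type).
struct ExtractedChar {
    unsigned int unicode;   // one code point of the (possibly split) glyph
    unsigned int charCode;  // glyph code, shared by every piece of the glyph
    double x;               // left edge of the slice given to this piece
    double width;           // width of that slice
};

// Split one glyph that maps to uLen code points into uLen equal slices,
// mirroring the "w1 /= uLen" step: every piece keeps the same charCode
// but gets its own code point and its own share of the glyph's box.
std::vector<ExtractedChar> splitGlyph(unsigned int charCode,
                                      const unsigned int *u, int uLen,
                                      double x, double w) {
    assert(uLen > 0);
    std::vector<ExtractedChar> pieces;
    const double w1 = w / uLen;
    for (int i = 0; i < uLen; ++i) {
        pieces.push_back({u[i], charCode, x + i * w1, w1});
    }
    return pieces;
}
```

With an "fi" ligature this yields two pieces, 'f' and 'i', that share a charcode and each cover half of the glyph's width.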
> Maybe a better name like CHARCODE_GLYPH_CONTINUATION would be clearer?
Ok, I think I understand the problem better now. The Unicode text of a word is
normalized, but the charcode of each character corresponds to the ligature, so
we render ligatures while the extracted text and characters are split. This
means that split characters share the same charcode but have different text and
bounding boxes, which lets us select or search for individual characters of a
ligature. Is this right? Assuming it is, what all characters of a ligature
have in common while rendering is the charcode and the transformation matrix,
so I wonder if we could generalize this and always avoid rendering the same
character twice with something like this:
  if (i > begin &&
      sel->word->charcode[i - 1] == sel->word->charcode[i] &&
      sel->word->textMat[i - 1].m[4] == sel->word->textMat[i].m[4] &&
      sel->word->textMat[i - 1].m[5] == sel->word->textMat[i].m[5])
    continue;
Your patch works fine, but changing the char looks a bit like a hack to me.
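To make the suggested check concrete, here is a rough standalone sketch of it outside of Poppler. The Piece struct and the glyphsToDraw() function are hypothetical names for illustration only; tx and ty stand in for textMat.m[4] and textMat.m[5], and I haven't tried this against the real selection code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one character of a word (not a Poppler type).
struct Piece {
    unsigned int charCode;  // glyph code; identical for all pieces of a ligature
    double tx, ty;          // translation part of the text matrix (m[4], m[5])
};

// Return the indices in [begin, end) that should actually be drawn.
// A piece is skipped when the previous piece in the selection has the same
// charcode and the same translation, i.e. it is the same glyph drawn again.
std::vector<std::size_t> glyphsToDraw(const std::vector<Piece> &word,
                                      std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= word.size());
    std::vector<std::size_t> out;
    for (std::size_t i = begin; i < end; ++i) {
        if (i > begin &&
            word[i - 1].charCode == word[i].charCode &&
            word[i - 1].tx == word[i].tx &&
            word[i - 1].ty == word[i].ty) {
            continue;
        }
        out.push_back(i);
    }
    return out;
}
```

Because of the i > begin guard, a selection that starts in the middle of a ligature still draws the glyph once, and two equal letters in a row (like "ll") are not skipped since their translations differ.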