<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_REOPENED " title="REOPENED --- - Ligated characters are drawn multiple times when selected" href="https://bugs.freedesktop.org/show_bug.cgi?id=9001#c17">Comment # 17</a> on <a class="bz_bug_link bz_status_REOPENED " title="REOPENED --- - Ligated characters are drawn multiple times when selected" href="https://bugs.freedesktop.org/show_bug.cgi?id=9001">bug 9001</a> from <a class="email" href="mailto:carlosgc@gnome.org" title="Carlos Garcia Campos <carlosgc@gnome.org>"> Carlos Garcia Campos</a> <pre>(In reply to <a href="show_bug.cgi?id=9001#c16">comment #16</a>)
> (In reply to <a href="show_bug.cgi?id=9001#c15">comment #15</a>)
> > Patch works. I wonder if we should return the ligatures as a single
> > character instead so that we don't need a special case.
>
> We currently support selecting individual characters within a ligature; it'd
> be a shame to lose that.

You are right.

> > ::: poppler/TextOutputDev.cc
> > @@ +2392,4 @@
> > > w1 /= uLen;
> > > h1 /= uLen;
> > > for (i = 0; i < uLen; ++i) {
> > > + if (i > 0) c = CHARCODE_LIGATED;
> >
> > Could you explain why this means it's a ligature?
>
> uLen is greater than 1 when a single CharCode (i.e. a glyph) signifies
> multiple Unicode codepoints. In English text, that typically occurs when
> the glyph is a ligature. In other scripts that might not be the case
> (<a href="http://unicode.org/Public/UNIDATA/NamedSequences.txt">http://unicode.org/Public/UNIDATA/NamedSequences.txt</a>) but if so we don't
> handle those correctly anyway; by chopping up the space occupied by the
> glyph we're assuming it's a ligature.

Thanks for the explanation.

> Maybe a better name like CHARCODE_GLYPH_CONTINUATION would be clearer?

Ok, I think I understand better the problem now.
The Unicode text of words is normalized, but the charcode of each character
still corresponds to the ligature, so we render ligatures while the extracted
text and characters are split. This means that split characters share the same
charcode but have different text and bounding boxes. This allows us to select
individual characters of a ligature, or search for individual characters as
well. Is this right?

Assuming it's right, what all characters of a ligature have in common while
rendering is the charcode and the transformation matrix, so I wonder if we
could generalize it and always avoid rendering the same character twice with
something like this:

if (i > begin &&
    sel->word->charcode[i - 1] == sel->word->charcode[i] &&
    sel->word->textMat[i - 1].m[4] == sel->word->textMat[i].m[4] &&
    sel->word->textMat[i - 1].m[5] == sel->word->textMat[i].m[5])
  continue;

Your patch works fine, but changing the char looks a bit like a hack to me.</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>