<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>What makes you think that the ToUnicode table for that font is bad? It may not be what you expect, but that doesn't make it bad. For all you know, that is the information in the original font…</div><div><br></div><div>Leonard</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> ¤ýíi <<a href="mailto:coolwanglu@gmail.com">coolwanglu@gmail.com</a>><br><span style="font-weight:bold">To: </span> "<a href="mailto:mpsuzuki@hiroshima-u.ac.jp">mpsuzuki@hiroshima-u.ac.jp</a>" <<a href="mailto:mpsuzuki@hiroshima-u.ac.jp">mpsuzuki@hiroshima-u.ac.jp</a>><br><span style="font-weight:bold">Cc: </span> "<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>" <<a href="mailto:poppler@lists.freedesktop.org">poppler@lists.freedesktop.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [poppler] About parseCharName in GfxFont.cc<br></div><div><br></div><div><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>
I tried to send the files as attachments, but they were rejected by the mailing list.<br><br>
The pdf can be found at <a href="http://dl.dropbox.com/u/75853179/med-9.pdf">http://dl.dropbox.com/u/75853179/med-9.pdf</a><br><br>
Please check the 'LEKSJON' in the top left corner; without the ToUnicode map you should get the correct characters.<br><br>
Btw, if you try to extract the fonts using FontForge, it won't apply ToUnicode to non-TTF fonts.<br><br><br>
- Lu<br><br><div class="gmail_quote">On Fri, Aug 24, 2012 at 9:33 AM, ¤ýíi <span dir="ltr"><<a href="mailto:coolwanglu@gmail.com" target="_blank">coolwanglu@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I've attached a problematic PDF. Notice the 'LEKSJON' in the top left corner: if you copy the text out, you'll get 'LeKSjoN'.<br>
So in the ToUnicode map for that font, both 'E' and 'e' are mapped to 'e'.<br><br>
I've extracted the font and attached it as 'f2.cff'. The font itself is OK.<br>
I've also attached a file showing the output of font->getToUnicode(); the format of each line is<br><br>
GlyphID Unicode [Unicode...] # CharCode<br><br>
You can see the problem at the lines for 0x45 and 0x65.<br><br>
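A rough sketch of how such entries could be spotted in a dump with the line format above. This is only an illustration in Python: it assumes every number is written in decimal or as 0x-prefixed hex, and the function names and the sample lines are made up here, not taken from the attached file.<br><br>
<pre>
```python
from collections import defaultdict


def parse_dump_line(line):
    """Parse one 'GlyphID Unicode [Unicode...] # CharCode' line.

    Returns (glyph_id, unicode_tuple, char_code), or None for a blank line.
    Assumption: each number is decimal or 0x-prefixed hex.
    """
    line = line.strip()
    if not line:
        return None
    left, _, right = line.partition("#")
    fields = [int(tok, 0) for tok in left.split()]
    glyph_id, unicode_values = fields[0], tuple(fields[1:])
    return glyph_id, unicode_values, int(right.strip(), 0)


def find_collisions(lines):
    """Return {unicode_tuple: [char codes]} for targets shared by several codes."""
    by_target = defaultdict(list)
    for line in lines:
        parsed = parse_dump_line(line)
        if parsed is None:
            continue
        _glyph_id, unicode_values, char_code = parsed
        by_target[unicode_values].append(char_code)
    return {u: codes for u, codes in by_target.items() if len(codes) > 1}


if __name__ == "__main__":
    # Made-up sample lines in the format above, not the real dump.
    sample = [
        "38 0x65 # 0x45",
        "70 0x65 # 0x65",
        "44 0x4b # 0x4b",
    ]
    for target, codes in find_collisions(sample).items():
        text = "".join(chr(u) for u in target)
        print(repr(text), "from char codes", [hex(c) for c in codes])
```
</pre>
Any Unicode value reported with more than one char code is a candidate for a questionable ToUnicode entry, although a shared target can also be legitimate (for example, two glyph variants of the same character).<br><br>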
Thanks<br><br>
- Lu Wang
<div class="HOEnZb"><div class="h5"><br><br><br><div class="gmail_quote">On Fri, Aug 24, 2012 at 9:21 AM, suzuki toshiya <span dir="ltr">
<<a href="mailto:mpsuzuki@hiroshima-u.ac.jp" target="_blank">mpsuzuki@hiroshima-u.ac.jp</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>¤ýíi wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Usually this is done by the ToUnicode map, but I've seen many bad mappings for<br>
Type 1 fonts, where the Type 1 font itself provides good mappings.<br></blockquote><br></div>
Could you give some concrete examples?<br><br>
Regards,<br>
mpsuzuki<br></blockquote></div><br></div></div></blockquote></div><br></div></div></span></body></html>