[Fontconfig] fc-glyphname bug
James Cloos
cloos at jhcloos.com
Thu Mar 8 02:15:36 PST 2007
>>>>> "Keith" == Keith Packard <keithp at keithp.com> writes:
>> With only the dingbat names added in fcglyphlist,
>> FC_GLYPHNAME_MAXLEN is 4, and passing 5 to FT_Get_Glyph_Name()
>> causes a loop.
Keith> So FT_Get_Glyph_Name loops? Or we continue to call it with the
Keith> same data even though it returns an error?
I was so exhausted this afternoon that, after a good day's sleep,
everything has run together and is a bit foggy. But after reviewing
the long post (crossed to freetype-devel), fc calls FT_Get_Glyph_Name
in a loop with glyph_index (the 2nd arg) having values of:
0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, ...
With the buffer long enough to load all of the glyph names it instead
increments from 0 to face->num_glyphs as expected.
This happens in FcFreeTypeCharSetAndSpacing().
When FT_Get_Glyph_Name fails it returns (FT_Error) 6.
I think I might see the problem.
ft_mem_strcpyn() looks like this:
,----(freetype2/src/base/ftutil.c)
| FT_BASE_DEF( FT_Int )
| ft_mem_strcpyn( char* dst,
| const char* src,
| FT_ULong size )
| {
| while ( size > 1 && *src != 0 )
| *dst++ = *src++;
|
| *dst = 0; /* always zero-terminate */
|
| return *src != 0;
| }
`----
Unless I'm missing something, size is never decremented inside the
loop, so the copy is not limited to size octets, yes? That probably
writes over the value in FcFreeTypeCharSetAndSpacing()'s glyph
variable, if name_buf[] happens to be allocated just before it.
I bet that is what causes the loop.
Making the buffer longer than any glyph name guarantees that
ft_mem_strcpyn() cannot overstep, and therefore avoids the loop.
I'll post that bit to freetype-devel.
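For illustration, here is a minimal sketch of a bounded copy, assuming the
intent is that at most size octets (terminating NUL included) are ever
written to dst. The name bounded_strcpyn and the plain C types are
hypothetical stand-ins for the FreeType ones; this is not the upstream
fix, only a demonstration of decrementing size inside the loop:

```c
#include <stddef.h>

/* Hypothetical stand-in for ft_mem_strcpyn(), using plain C types.
 * Writes at most size octets to dst, terminating NUL included.
 * Returns nonzero if src did not fit, i.e. the name was truncated. */
int
bounded_strcpyn( char*        dst,
                 const char*  src,
                 size_t       size )
{
  if ( size == 0 )
    return *src != 0;   /* no room even for the NUL; dst is untouched */

  while ( size > 1 && *src != 0 )
  {
    *dst++ = *src++;
    size--;             /* the step missing from the quoted loop */
  }

  *dst = 0;             /* always zero-terminate */

  return *src != 0;
}
```

With a 4-octet limit, a name such as "abcdef" is cut to "abc" and the
function reports truncation instead of writing past the buffer.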
>> The largest legal value [FC_GLYPHNAME_MAXLEN] could need in a
>> PostScript font is 127, so name_buf[128] should be a sufficient
>> initialization.
Keith> And this seems like a fine work-around; as you can see, these
Keith> buffers are just allocated on the stack. Pushing the actual bug
Keith> upstream to FreeType should fix the root cause eventually.
In that case, I presume something like:
diff --git a/fc-glyphname/fc-glyphname.c b/fc-glyphname/fc-glyphname.c
index a0e18e7..c2db931 100644
--- a/fc-glyphname/fc-glyphname.c
+++ b/fc-glyphname/fc-glyphname.c
@@ -282,7 +282,7 @@ main (int argc, char **argv)
printf ("#define FC_GLYPHNAME_HASH %u\n", hash);
printf ("#define FC_GLYPHNAME_REHASH %u\n", rehash);
- printf ("#define FC_GLYPHNAME_MAXLEN %d\n\n", max_name_len);
+ printf ("#define FC_GLYPHNAME_MAXLEN 127\n\n");
/*
* Dump out entries
as well as the version you posted to keep i+=r from being constant in
fc-glyphname.c:insert() would be enough to avoid the bug until FreeType
is fixed to honour the size arg to ft_mem_strcpyn()?
Keith> Does Standard Symbol L also provide a regular encoding for the
Keith> glyphs that it uses? The list you provide looks a lot like the
Keith> standard Adobe encoding for text fonts. With your buffer size
Keith> fix in place, does this font start working?
It isn't the text encoding. Symbol encoding gets its own table in the
PLRM (even back to the original Red Book). URW's version just adds
Euro encoded at 0x80.
I currently have fc installed with the two patches I posted, so it has
the full glyphlist table. With that version, xfd(1x) shows the glyphs
at their Unicode code points for Standard Symbol L, just like it does
for ITC Zapf Dingbats. The same holds for Symbol:foundry=adobe.
That does seem like the right thing to do, yes?
>> Is the answer to add just those 189 glyph names rather than all of
>> the names in glyphlist.txt?
Keith> Certainly using a small subset of the glyph names would be
Keith> preferred to including all of them in the current data
Keith> structure form.
The list I posted last is exactly the glyph names needed, using the
code points in the glyphlist.txt file in fc-glyphname. But I think
some of those are outdated. The glyphs put in 0xF8XX:
radicalex;F8E5 arrowvertex;F8E6 arrowhorizex;F8E7
registersans;F8E8 copyrightsans;F8E9 trademarksans;F8EA
parenlefttp;F8EB parenleftex;F8EC parenleftbt;F8ED
bracketlefttp;F8EE bracketleftex;F8EF bracketleftbt;F8F0
bracelefttp;F8F1 braceleftmid;F8F2 braceleftbt;F8F3
braceex;F8F4 integralex;F8F5 parenrighttp;F8F6
parenrightex;F8F7 parenrightbt;F8F8 bracketrighttp;F8F9
bracketrightex;F8FA bracketrightbt;F8FB bracerighttp;F8FC
bracerightmid;F8FD bracerightbt;F8FE
now have code points in ISO 10646.
At least these changes are needed:
arrowvertex;23D0 arrowhorizex;23AF
parenlefttp;239B parenleftex;239C parenleftbt;239D
bracketlefttp;23A1 bracketleftex;23A2 bracketleftbt;23A3
bracelefttp;23A7 braceleftmid;23A8 braceleftbt;23A9
braceex;23AA integralex;23AE parenrighttp;239E
parenrightex;239F parenrightbt;23A0 bracketrighttp;23A4
bracketrightex;23A5 bracketrightbt;23A6 bracerighttp;23AB
bracerightmid;23AC bracerightbt;23AD
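The replacements above can also be written as a small lookup table.
This is only a sketch: the struct and function names (glyph_remap,
lookup_remap) are hypothetical and are not part of fc-glyphname; the
glyph names and code points are copied from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: glyph name -> Unicode code point that replaces
 * the old 0xF8xx assignment.  Values are taken from the list above. */
struct glyph_remap {
    const char   *name;
    unsigned int  ucs;
};

static const struct glyph_remap remap[] = {
    { "arrowvertex",    0x23D0 }, { "arrowhorizex",   0x23AF },
    { "parenlefttp",    0x239B }, { "parenleftex",    0x239C },
    { "parenleftbt",    0x239D }, { "bracketlefttp",  0x23A1 },
    { "bracketleftex",  0x23A2 }, { "bracketleftbt",  0x23A3 },
    { "bracelefttp",    0x23A7 }, { "braceleftmid",   0x23A8 },
    { "braceleftbt",    0x23A9 }, { "braceex",        0x23AA },
    { "integralex",     0x23AE }, { "parenrighttp",   0x239E },
    { "parenrightex",   0x239F }, { "parenrightbt",   0x23A0 },
    { "bracketrighttp", 0x23A4 }, { "bracketrightex", 0x23A5 },
    { "bracketrightbt", 0x23A6 }, { "bracerighttp",   0x23AB },
    { "bracerightmid",  0x23AC }, { "bracerightbt",   0x23AD },
};

/* Return the updated code point for name, or 0 if it has no entry. */
unsigned int
lookup_remap( const char *name )
{
    size_t i;

    for ( i = 0; i < sizeof remap / sizeof remap[0]; i++ )
        if ( strcmp( remap[i].name, name ) == 0 )
            return remap[i].ucs;

    return 0;
}
```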
My understanding is that Adobe put apple, radicalex and
the .serif versions (rather than the sans) of the copyright,
trademark and registered glyphs in the PUA in SymbolStd.otf.
Also arrowvertex, even though that is what U+23D0 is defined
to be.
With these entries added to fc-glyphname's loaded table, the Type 1
versions of the fonts should show up just like the OTF version does.
Keith> I would not be averse to including all of them
Keith> if we built a data structure that did not use relocations
Keith> though. fontconfig has several large tables which have been
Keith> carefully designed to eliminate relocations; another one would
Keith> not be a terrible plan.
That is one programming exercise I've not tried.
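For what it's worth, the usual shape of such a table is one character
array holding all the names plus an array of integer offsets into it,
so the initialized data contains no pointers that the loader has to
relocate. A minimal sketch follows; it assumes nothing about
fontconfig's actual layout, and every identifier in it (name_pool,
pool_entry, pool_lookup) is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* All names packed into one array, each NUL-terminated. */
static const char name_pool[] =
    "braceex\0"       /* offset 0  */
    "integralex\0"    /* offset 8  */
    "parenlefttp";    /* offset 19; the literal's own NUL ends it */

/* Entries hold offsets into name_pool instead of char pointers. */
struct pool_entry {
    unsigned short  name_off;
    unsigned short  ucs;
};

static const struct pool_entry pool_entries[] = {
    {  0, 0x23AA },
    {  8, 0x23AE },
    { 19, 0x239B },
};

/* Return the code point for name, or 0 if it is not in the table. */
unsigned int
pool_lookup( const char *name )
{
    size_t i;

    for ( i = 0; i < sizeof pool_entries / sizeof pool_entries[0]; i++ )
        if ( strcmp( name_pool + pool_entries[i].name_off, name ) == 0 )
            return pool_entries[i].ucs;

    return 0;
}
```

In practice the offsets would be emitted by a generator such as
fc-glyphname rather than counted by hand.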
>> Adobe Symbol
Keith> Does fontconfig not currently correctly construct the set of
Keith> Unicode code points supported by this font?
I don't know. I only have the one box to test on, and I'd like to
avoid backtracking to find out. It should be exactly the same as
for Standard Symbol L, though, so what do you get from:
xfd -fa 'Standard Symbol L'
More than 38 glyphs on the first page? Are the Greek letters in
the 0300 page? If the answers are yes and no respectively, then it
does not get the code points correct without additions to fcglyphname.h.
-JimC