[Harfbuzz-indic] unicode -> glyph id resolver (Re: Malayalam rendering with latest Harfbuzz)
mpsuzuki at hiroshima-u.ac.jp
Wed Aug 3 03:02:02 PDT 2011
On Wed, 03 Aug 2011 17:22:48 +0900
suzuki toshiya <mpsuzuki at hiroshima-u.ac.jp> wrote:
>Hi,
>
>Bernard Massot wrote:
>> On Wed, Aug 03, 2011 at 12:47:12PM +0530, Pravin Satpute wrote:
>>> I think it would be good to add these test cases in test-complex-shape.c
>>> like
>>> { 0x0915, 0x094d, 0 }, -> Unicode
>>> { 0x0080, 0x0051, 0 } -> Expected glyph ids from fonts
>>>
>>> Behdad, any quick trick to get glyph IDs from fonts?
>> Here is my non-quick trick: dump the font in XML format with the "ttx"
>> program and look for your glyphs in "id" attributes of <GlyphID> tags in
>> the generated .ttx file. Then you have to convert them to hexadecimal.
>>
>> I'm interested in a more productive way to achieve that.
>
>Excuse me, what is required is a tool converting a Unicode text to
>a series of glyph IDs for a given font, something like:
>
>$ ./get-gids-of-font-by-unicode-str.exe test_font.ttf < sample.utf8
>gid128
>gid81
>...
>
># There might be some discussion about whether input like "U+xxxx"
># or raw Unicode text is better.
>
>If the glyph IDs with no consideration of OpenType layout are sufficient,
>it is not so difficult to make such a tool with FreeType2. I will try.
Like this... there might be some bugs in the UTF-8 parser.
/*
*
* cc -o make-gids-from-font-and-utf8.exe make-gids-from-font-and-utf8.c \
* `freetype-config --cflags` `freetype-config --libs`
*
* echo "Hello World" \
* | make-gids-from-font-and-utf8.exe LiberationMono-Regular.ttf
*
* U+0048 -> gid43
* U+0065 -> gid72
* U+006C -> gid79
* U+006C -> gid79
* U+006F -> gid82
* U+0020 -> gid3
* U+0057 -> gid58
* U+006F -> gid82
* U+0072 -> gid85
* U+006C -> gid79
* U+0064 -> gid71
* U+000A -> gid0
*
* written by mpsuzuki at hiroshima-u.ac.jp
*
*/
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <ft2build.h>
#include FT_FREETYPE_H

int main( int     argc,
          char**  argv )
{
  FT_Error    error;
  FT_Library  library;
  FT_Face     face;
  int         c;

  error = FT_Init_FreeType( &library );
  if ( error )
    exit( -2 );

  if ( argc < 2 )
  {
    fprintf( stderr, "1 argument (to specify a font) is required\n" );
    exit( -3 );
  }

  if ( FT_Err_Ok != ( error = FT_New_Face( library, argv[1], 0, &face ) ) )
  {
    fprintf( stderr, "FT2 could not open a face from %s, error code = %d\n", argv[1], error );
    exit( -4 );
  }

  /* face->charmap is NULL when FreeType could not select a default charmap */
  if ( !face->charmap || face->charmap->encoding != FT_ENCODING_UNICODE )
  {
    fprintf( stderr, "FT2 could not find Unicode cmap in %s\n", argv[1] );
    exit( -5 );
  }

  /* decode stdin as UTF-8, one character per iteration */
  while ( EOF != ( c = fgetc( stdin ) ) )
  {
    int   left = 0;   /* continuation bytes still expected */
    long  ucs  = 0;

    /* lead byte: each later test overrides the result of the earlier ones */
    if ( c < 0x80 )
      ucs = c;
    else if ( c < 0xC0 )
      exit( -6 );     /* continuation byte without a lead byte */
    if ( 0xBF < c )
      ucs = ( c & 0x1F ) << 6, left = 1;
    if ( 0xDF < c )
      ucs = ( c & 0x0F ) << 12, left = 2;
    if ( 0xEF < c )
      ucs = ( c & 0x07 ) << 18, left = 3;
    if ( 0xF7 < c )
      ucs = ( c & 0x03 ) << 24, left = 4;
    if ( 0xFB < c )
      ucs = ( c & 0x01 ) << 30, left = 5;
    if ( 0xFD < c )
      exit( -6 );

    /* continuation bytes carry 6 payload bits each */
    for ( ; left > 0 && EOF != ( c = fgetc( stdin ) ); left -- )
    {
      if ( c < 0x80 || 0xBF < c )
        exit( -7 );
      ucs += ( ( c & 0x3F ) << ( 6 * ( left - 1 ) ) ) ;
    }
    if ( left > 0 )
      exit( -7 );     /* input ended inside a multi-byte sequence */

    /* ucs is long and the glyph index is FT_UInt, so use %lX and %u */
    printf( "U+%04lX -> gid%u\n", (unsigned long)ucs, FT_Get_Char_Index( face, ucs ) );
  }
  printf( "\n" );

  if ( FT_Err_Ok != ( error = FT_Done_Face( face ) ) )
  {
    fprintf( stderr, "FT2 failed to close a face, error code = %d\n", error );
    exit( -99 );
  }
  if ( FT_Err_Ok != ( error = FT_Done_FreeType( library ) ) )
  {
    fprintf( stderr, "FT2 failed to close a library, error code = %d\n", error );
    exit( -100 );
  }
  exit( 0 );
}
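
Since the UTF-8 parser is the part most likely to hide bugs, it may be easier to check in isolation. The following is only a sketch of a stricter decoder, not part of the program above: the function name utf8_decode and its buffer-based interface are assumptions made for illustration. Unlike the loop above, it rejects the obsolete 5- and 6-byte forms, overlong encodings, surrogates, and truncated sequences.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in the original program): decode one UTF-8
 * sequence from buf[0..len).  On success, store the code point in *ucs
 * and return the number of bytes consumed (1 to 4).  Return 0 if the
 * input is empty, truncated, or malformed.
 */
static size_t utf8_decode( const unsigned char*  buf,
                           size_t                len,
                           unsigned long*        ucs )
{
  /* smallest code point that legitimately needs n bytes */
  static const unsigned long  min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  unsigned long  cp;
  size_t         n, i;

  if ( len == 0 )
    return 0;

  if ( buf[0] < 0x80 )      { cp = buf[0];        n = 1; }
  else if ( buf[0] < 0xC0 ) return 0;             /* stray continuation byte */
  else if ( buf[0] < 0xE0 ) { cp = buf[0] & 0x1F; n = 2; }
  else if ( buf[0] < 0xF0 ) { cp = buf[0] & 0x0F; n = 3; }
  else if ( buf[0] < 0xF8 ) { cp = buf[0] & 0x07; n = 4; }
  else                      return 0;             /* 5/6-byte forms rejected */

  if ( len < n )
    return 0;                                     /* truncated sequence */

  for ( i = 1; i < n; i++ )
  {
    if ( ( buf[i] & 0xC0 ) != 0x80 )
      return 0;                                   /* not a continuation byte */
    cp = ( cp << 6 ) | ( buf[i] & 0x3F );
  }

  if ( cp < min_value[n] )
    return 0;                                     /* overlong encoding */
  if ( cp > 0x10FFFF || ( cp >= 0xD800 && cp <= 0xDFFF ) )
    return 0;                                     /* out of range or surrogate */

  *ucs = cp;
  return n;
}
```

The decoded value could then be passed to FT_Get_Char_Index() as before. Printing the result with "0x%04X" instead of "gid%u" would give the hexadecimal form used in the test-case arrays quoted above (gid128 becomes 0x0080).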