[Harfbuzz-indic] unicode -> glyph id resolver (Re: Malayalam rendering with latest Harfbuzz)

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Wed Aug 3 03:02:02 PDT 2011


On Wed, 03 Aug 2011 17:22:48 +0900
suzuki toshiya <mpsuzuki at hiroshima-u.ac.jp> wrote:

>Hi,
>
>Bernard Massot wrote:
>> On Wed, Aug 03, 2011 at 12:47:12PM +0530, Pravin Satpute wrote:
>>> I think it would be good to add these test cases to test-complex-shape.c,
>>> like
>>>    { 0x0915, 0x094d, 0 },  -> Unicode
>>>       { 0x0080, 0x0051, 0 } -> Expected glyph ids from fonts
>>>
>>> Behdad, is there any quick trick to get glyph IDs from fonts?
>> Here is my non-quick trick: dump the font in XML format with the "ttx"
>> program and look for your glyphs in "id" attributes of <GlyphID> tags in
>> the generated .ttx file. Then you have to convert them to hexadecimal.
>> 
>> I'm interested in a more productive way to achieve that.
>
>Excuse me, what is required is a tool that converts Unicode text to
>a series of glyph IDs for a given font, something like:
>
>$ ./get-gids-of-font-by-unicode-str.exe test_font.ttf < sample.utf8
>gid128
>gid81
>...
>
># Whether the input should be written as "U+xxxx" or given as raw
># Unicode text is open to discussion.
>
>If glyph IDs that take no account of OpenType layout are sufficient,
>it is not so difficult to make such a tool with FreeType2. I will try.

Like this. It only looks up the font's Unicode cmap (no OpenType layout
is applied), and there may still be bugs in the UTF-8 parser.

/*
 *
 * cc -o make-gids-from-font-and-utf8.exe make-gids-from-font-and-utf8.c \
 *   `freetype-config --cflags` `freetype-config --libs`
 *
 * echo "Hello World" \
 * | make-gids-from-font-and-utf8.exe LiberationMono-Regular.ttf
 *
 * U+0048 -> gid43
 * U+0065 -> gid72
 * U+006C -> gid79
 * U+006C -> gid79
 * U+006F -> gid82
 * U+0020 -> gid3
 * U+0057 -> gid58
 * U+006F -> gid82
 * U+0072 -> gid85
 * U+006C -> gid79
 * U+0064 -> gid71
 * U+000A -> gid0
 *
 * written by mpsuzuki at hiroshima-u.ac.jp
 *
 */


#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <ft2build.h>
#include FT_FREETYPE_H



int main( int     argc,
          char**  argv )
{
  FT_Error    error;
  FT_Library  library;
  FT_Face     face;
  int         c;


  error = FT_Init_FreeType( &library );
  if ( error )
    exit( -2 );

  if ( argc < 2 )
  {
    fprintf( stderr, "1 argument (to specify a font) is required\n" );
    exit( -3 );
  }


  if ( FT_Err_Ok != ( error = FT_New_Face( library, argv[1], 0, &face ) ) )
  {
    fprintf( stderr, "FT2 could not open a face from %s, error code = %d\n", argv[1], error );
    exit( -4 );
  }

  /* face->charmap is NULL when FreeType could not select any charmap */
  if ( !face->charmap || face->charmap->encoding != FT_ENCODING_UNICODE )
  {
    fprintf( stderr, "FT2 could not find Unicode cmap in %s\n", argv[1] );
    exit( -5 );
  }


  while ( EOF != ( c = fgetc( stdin ) ) )
  {
    int   left = 0;
    long  ucs  = 0;


    if ( c < 0x80 ) 
      ucs = c;
    else if ( c < 0xC0 )
      exit( -6 );

    if ( 0xBF < c )
      ucs = ( c & 0x1F ) <<  6, left = 1;

    if ( 0xDF < c )
      ucs = ( c & 0x0F ) << 12, left = 2;

    if ( 0xEF < c )
      ucs = ( c & 0x07 ) << 18, left = 3;
 
    if ( 0xF7 < c )
      ucs = ( c & 0x03 ) << 24, left = 4;

    if ( 0xFB < c )
      ucs = ( c & 0x01 ) << 30, left = 5;

    if ( 0xFD < c )
      exit( -6 );


    for ( ; left > 0; left -- )
    {
      /* EOF inside a sequence is as malformed as a bad trail byte */
      if ( EOF == ( c = fgetc( stdin ) ) || c < 0x80 || 0xBF < c )
        exit( -7 );

      ucs += ( (long)( c & 0x3F ) << ( 6 * ( left - 1 ) ) ) ;
    }

    /* ucs is a long and FT_Get_Char_Index() returns an unsigned FT_UInt */
    printf( "U+%04lX -> gid%u\n", (unsigned long)ucs,
            FT_Get_Char_Index( face, (FT_ULong)ucs ) );
  }


  printf( "\n" );

  if ( FT_Err_Ok != ( error = FT_Done_Face( face ) ) )
  {
    fprintf( stderr, "FT2 failed to close a face, error code = %d\n", error );
    exit( -99 );
  }

  if ( FT_Err_Ok != ( error = FT_Done_FreeType( library ) ) )
  {
    fprintf( stderr, "FT2 failed to close a library, error code = %d\n", error );
    exit( -100 );
  }

  exit( 0 );
}

