[HarfBuzz] Don't render control characters?

Behdad Esfahbod behdad at behdad.org
Wed Mar 19 16:04:37 PDT 2014


On 14-03-06 02:20 PM, Richard Wordingham wrote:
> On Thu, 6 Mar 2014 22:38:07 +0200
> Konstantin Ritt <ritt.ks at gmail.com> wrote:
> 
>> Did you meet any single font with glyph for U+0008 (BS)? Honestly, I
>> don't imagine what U+0008 glyph representation looks like :)
> 
> Back one space!  The underlining in Unix man pages is usually
> implemented as overstrike via U+0008.  However, I'd forgotten that
> OpenType advance widths can't be negative, so it can't be rendered
> under font control.
>  
>> GC=Cc aren't really characters but control codes; some of them are
>> historically very common in use, though.
> 
>> As for TAB, certain fonts don't have nbsp and/or tab, and most fonts
>> don't have line separator, paragraph separator, and many other space
>> characters. Since we've touched this topic...in my opinion, HarfBuzz
>> should take care of (quite common) issue with missing glyphs for
>> characters of property White_Space [1].
>> For any of them, a fallback to U+0020 (SPACE) should be enough [2],
>> though a more sophisticated mechanism would also take care of glyph
>> advances [3], making U+000A..U+000D, U+0085, U+2028..U+2029 occupy no
>> space, 
> 
> We weren't asked about fallback, but about simply not rendering them.
> There are some nice-looking glyphs around for some of them, but perhaps
> these belong to the characters U+24xx SYMBOL FOR....

Humm..  Ok, I don't think substituting U+24xx should happen in HarfBuzz.

Also, Unicode says GC=Cc should just render as a box if not supported.  The
reason we want them removed here is really an artifact of the HTML spec.  As
such, I think it makes sense to have them preserved by default, with an option
to remove them.  That is in contrast to Default_Ignorables, which should be
removed by default, with an option to preserve them.

WDYT?

I also don't like the API implication of this.  I can go two ways:

  - Have PRESERVE_DEFAULT_IGNORABLES and REMOVE_CONTROL_CHARACTERS,

  - Have PRESERVE_DEFAULT_IGNORABLES and PRESERVE_CONTROL_CHARACTERS, but make
DEFAULT include one but not the other.

Or, if we are going with the DEFAULT!=0 case, then perhaps it makes more sense
to have REMOVE_DEFAULT_IGNORABLES and REMOVE_CONTROL_CHARACTERS, one enabled by
default and one disabled.  I think I feel most comfortable this way.
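
To make that last option concrete, here is a rough sketch, not a proposed
patch.  All the sketch_* names are hypothetical and not HarfBuzz API; the
filtering is done on code points for simplicity, whereas the real thing would
act on glyphs during shaping; and the Default_Ignorable test is deliberately
partial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for illustration; not actual HarfBuzz API. */
typedef enum {
  SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES = 0x00000001u,
  SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS = 0x00000002u,
  /* DEFAULT is non-zero: ignorables are removed, controls are kept. */
  SKETCH_FLAG_DEFAULT = SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES
} sketch_flags_t;

/* General_Category=Cc covers U+0000..U+001F and U+007F..U+009F. */
static bool sketch_is_control (uint32_t cp)
{
  return cp <= 0x1Fu || (cp >= 0x7Fu && cp <= 0x9Fu);
}

/* Deliberately partial Default_Ignorable test (assumption: a real
 * implementation would consult the full Unicode property data). */
static bool sketch_is_default_ignorable (uint32_t cp)
{
  return cp == 0x00ADu || (cp >= 0x200Bu && cp <= 0x200Fu) || cp == 0xFEFFu;
}

/* Copies the code points that survive `flags` from in[] to out[];
 * returns how many were kept.  out[] must have room for n entries. */
static size_t sketch_filter (const uint32_t *in, size_t n,
                             unsigned int flags, uint32_t *out)
{
  size_t kept = 0;
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
  {
    if ((flags & SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES) &&
        sketch_is_default_ignorable (in[i]))
      continue;
    if ((flags & SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS) &&
        sketch_is_control (in[i]))
      continue;
    out[kept++] = in[i];
  }
  return kept;
}
```

With the input A, U+0008, U+200D, B, passing SKETCH_FLAG_DEFAULT drops only
U+200D; OR-ing in SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS drops U+0008 as well;
passing 0 keeps all four.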

But then perhaps I should change this enum also:

typedef enum { /*< flags >*/
  HB_BUFFER_SERIALIZE_FLAG_DEFAULT              = 0x00000000u,
  HB_BUFFER_SERIALIZE_FLAG_NO_CLUSTERS          = 0x00000001u,
  HB_BUFFER_SERIALIZE_FLAG_NO_POSITIONS         = 0x00000002u,
  HB_BUFFER_SERIALIZE_FLAG_NO_GLYPH_NAMES       = 0x00000004u
} hb_buffer_serialize_flags_t;

I think I want to remove the _NO, and make DEFAULT non-zero here.
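
Roughly, and only as a sketch: the positive-sense names and values below are
assumptions, not a decided API.  The LEGACY_* macros just copy the values from
the enum above so the snippet compiles without the HarfBuzz headers.

```c
#include <assert.h>

/* Existing values, copied from the enum quoted above. */
#define LEGACY_NO_CLUSTERS    0x00000001u
#define LEGACY_NO_POSITIONS   0x00000002u
#define LEGACY_NO_GLYPH_NAMES 0x00000004u

/* Hypothetical positive-sense replacement; names and values are
 * assumptions made for this sketch. */
typedef enum {
  SKETCH_SERIALIZE_FLAG_CLUSTERS    = 0x00000001u,
  SKETCH_SERIALIZE_FLAG_POSITIONS   = 0x00000002u,
  SKETCH_SERIALIZE_FLAG_GLYPH_NAMES = 0x00000004u,
  /* DEFAULT is non-zero: everything is serialized unless turned off. */
  SKETCH_SERIALIZE_FLAG_DEFAULT     = 0x00000007u
} sketch_serialize_flags_t;

/* Maps a legacy _NO_ mask to the positive-sense mask.  Because the bit
 * positions line up in this sketch, it is an invert-and-mask. */
static unsigned int sketch_from_legacy (unsigned int legacy)
{
  assert ((legacy & ~0x7u) == 0u);
  return ~legacy & (unsigned int) SKETCH_SERIALIZE_FLAG_DEFAULT;
}
```

The catch with this shape is that a caller who passes a literal 0 today gets
everything serialized, while under the positive-sense flags 0 would mean
nothing optional is serialized; callers would have to pass DEFAULT instead.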

Thoughts?

behdad

-- 
behdad
http://behdad.org/

