[HarfBuzz] Don't render control characters?
Behdad Esfahbod
behdad at behdad.org
Wed Mar 19 16:04:37 PDT 2014
On 14-03-06 02:20 PM, Richard Wordingham wrote:
> On Thu, 6 Mar 2014 22:38:07 +0200
> Konstantin Ritt <ritt.ks at gmail.com> wrote:
>
>> Did you meet any single font with glyph for U+0008 (BS)? Honestly, I
>> don't imagine what U+0008 glyph representation looks like :)
>
> Back one space! The underlining in Unix man pages is usually
> implemented as overstrike via U+0008. However, I'd forgotten that
> OpenType advance widths can't be negative, so it can't be rendered
> under font control.
>
>> GC=Cc aren't really characters but control codes; some of them are
>> historically very common in use, though.
>
>> As for TAB, certain fonts don't have nbsp and/or tab, and most fonts
>> don't have line separator, paragraph separator, and many other space
>> characters. Since we've touched this topic...in my opinion, HarfBuzz
>> should take care of (quite common) issue with missing glyphs for
>> characters of property White_Space [1].
>> For any of them, a fallback to U+0020 (SPACE) should be enough [2],
>> though a more sophisticated mechanism would also take care of glyph
>> advances [3], making U+000A..U+000D, U+0085, U+2028..U+2029 occupy no
>> space,
>
> We weren't asked about fallback, but about simply not rendering them.
> There are some nice-looking glyphs around for some of them, but perhaps
> these belong to the characters U+24xx SYMBOL FOR....
Hmm, OK. I don't think substituting U+24xx should happen in HarfBuzz.
Also, Unicode says GC=Cc should just render as a box if not supported. The
reason we want them removed here is really an artifact of the HTML spec. As
such, I think it makes sense to preserve them by default, but make it possible
to remove them. That is in contrast to Default_Ignorables, which should be
removed by default, but possible to preserve.
WDYT?
I also don't like the API implication of this. I can go two ways:
- Have PRESERVE_DEFAULT_IGNORABLES and REMOVE_CONTROL_CHARACTERS,
- Have PRESERVE_DEFAULT_IGNORABLES and PRESERVE_CONTROL_CHARACTERS, but make
DEFAULT include one but not the other.
Or, if we are going with the DEFAULT != 0 case, then perhaps it makes more
sense to have REMOVE_DEFAULT_IGNORABLES and REMOVE_CONTROL_CHARACTERS, one
enabled by default and one disabled. I think I feel most comfortable this way.
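To make that last option concrete, here is a minimal sketch, assuming hypothetical flag names with a SKETCH_ prefix (none of these are existing HarfBuzz API). It only shows how a non-zero DEFAULT would enable one removal and not the other:

```c
#include <stdbool.h>

/* Hypothetical flag names for illustration only; not HarfBuzz API. */
typedef enum { /*< flags >*/
  SKETCH_FLAG_NONE                      = 0x00000000u,
  SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES = 0x00000001u,
  SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS = 0x00000002u,
  /* DEFAULT is non-zero: Default_Ignorables are removed,
   * control characters are preserved. */
  SKETCH_FLAG_DEFAULT = SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES
} sketch_flags_t;

/* True if the mask asks for Default_Ignorables to be removed. */
static bool
sketch_removes_default_ignorables (unsigned flags)
{
  return (flags & SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES) != 0;
}

/* True if the mask asks for GC=Cc control characters to be removed. */
static bool
sketch_removes_control_characters (unsigned flags)
{
  return (flags & SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS) != 0;
}
```

Under this sketch an HTML-oriented caller would pass SKETCH_FLAG_DEFAULT | SKETCH_FLAG_REMOVE_CONTROL_CHARACTERS, and a caller that wants to keep Default_Ignorables would pass SKETCH_FLAG_DEFAULT & ~SKETCH_FLAG_REMOVE_DEFAULT_IGNORABLES. The cost is that a plain 0 no longer means "default behaviour".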
But then perhaps I should change this enum also:
typedef enum { /*< flags >*/
  HB_BUFFER_SERIALIZE_FLAG_DEFAULT        = 0x00000000u,
  HB_BUFFER_SERIALIZE_FLAG_NO_CLUSTERS    = 0x00000001u,
  HB_BUFFER_SERIALIZE_FLAG_NO_POSITIONS   = 0x00000002u,
  HB_BUFFER_SERIALIZE_FLAG_NO_GLYPH_NAMES = 0x00000004u
} hb_buffer_serialize_flags_t;
I think I want to remove the _NO, and make DEFAULT non-zero here.
Thoughts?
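For reference, a rough sketch of what dropping the _NO and making DEFAULT non-zero could look like. The SKETCH_ names are hypothetical, and reusing the same bit values as the current _NO flags is only an assumption to keep the translation simple:

```c
/* Hypothetical positive-sense flags; not HarfBuzz API.
 * Bit values deliberately mirror the existing _NO flags. */
typedef enum { /*< flags >*/
  SKETCH_SERIALIZE_FLAG_CLUSTERS    = 0x00000001u,
  SKETCH_SERIALIZE_FLAG_POSITIONS   = 0x00000002u,
  SKETCH_SERIALIZE_FLAG_GLYPH_NAMES = 0x00000004u,
  /* DEFAULT is non-zero: everything is serialized. */
  SKETCH_SERIALIZE_FLAG_DEFAULT     = 0x00000007u
} sketch_serialize_flags_t;

/* Translate an old-style mask (each bit means "omit X") into the
 * positive form (each bit means "include X"). */
static unsigned
sketch_from_no_flags (unsigned no_flags)
{
  return SKETCH_SERIALIZE_FLAG_DEFAULT & ~no_flags;
}
```

With this shape, omitting clusters would be written as SKETCH_SERIALIZE_FLAG_DEFAULT & ~SKETCH_SERIALIZE_FLAG_CLUSTERS instead of passing a NO_CLUSTERS bit.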
behdad
--
behdad
http://behdad.org/