[HarfBuzz] Don't render control characters?

Konstantin Ritt ritt.ks at gmail.com
Sat Mar 28 10:38:16 PDT 2015


This seems to have been deferred forever :)
With the latest HarfBuzz, I still have to fix up the glyphs and metrics for
White_Space characters with GC=Cc|Zl|Zp to avoid "missing glyph" boxes when
rendering.

From http://www.unicode.org/faq/unsup_char.html#2 :
> Q: Which characters should be displayed as a visible but blank space?
> A: This is the easy one: all the characters that have the White_Space
> property, also generically known as “whitespace characters”. This set
> includes SPACE, of course, but also such characters as the tab control
> character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list,
> see the White_Space values in PropList.txt
> <http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt>.

And from PropList.txt :
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
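
For illustration, the excerpt above can be turned into a small lookup. This
is only a sketch in Python: the names WHITE_SPACE_RANGES, is_white_space and
needs_fix_up are made up for this example and are not HarfBuzz API. The
ranges are transcribed from the excerpt, and the general category comes from
Python's unicodedata module.

```python
import unicodedata

# White_Space ranges (inclusive), transcribed from the PropList.txt excerpt.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # Cc: <control-0009>..<control-000D>
    (0x0020, 0x0020),  # Zs: SPACE
    (0x0085, 0x0085),  # Cc: <control-0085>
    (0x00A0, 0x00A0),  # Zs: NO-BREAK SPACE
    (0x1680, 0x1680),  # Zs: OGHAM SPACE MARK
    (0x2000, 0x200A),  # Zs: EN QUAD..HAIR SPACE
    (0x2028, 0x2028),  # Zl: LINE SEPARATOR
    (0x2029, 0x2029),  # Zp: PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # Zs: NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # Zs: MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # Zs: IDEOGRAPHIC SPACE
]


def is_white_space(cp):
    """True if the code point falls in one of the ranges listed above."""
    return any(lo <= cp <= hi for lo, hi in WHITE_SPACE_RANGES)


def needs_fix_up(cp):
    """True for the White_Space code points with GC=Cc|Zl|Zp."""
    return is_white_space(cp) and unicodedata.category(chr(cp)) in ("Cc", "Zl", "Zp")
```

With these ranges, needs_fix_up selects U+0009..U+000D, U+0085, U+2028 and
U+2029.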


My proposal is the following:
- The glyph for any White_Space character other than U+0020 itself should be
replaced with the glyph for U+0020.
  This is a good first approximation, and it guarantees we never get a box
for a White_Space character.
- If the font has no glyph for the White_Space character (so we have just
replaced it with the glyph for U+0020), simply duplicate the metrics of
U+0020 as well; otherwise trust that the font provides correct metrics.
  This does not account for e.g. half-width spaces, but it is also a good
approximation for the most common case.
This only applies when HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES has not
been set; otherwise do nothing.
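
A minimal sketch of the proposal above, in Python rather than HarfBuzz's own
C++, assuming a toy font model. ToyFont, fix_up_white_space and the preserve
parameter (standing in for HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES being
set) are hypothetical names for illustration, not HarfBuzz API. The early
return when the font has no U+0020 glyph is an assumption of the sketch; the
proposal does not cover that case.

```python
SPACE = 0x0020
NOTDEF = 0  # glyph id 0 is the "missing glyph" box


class ToyFont:
    """Hypothetical font: code point -> glyph id, glyph id -> advance."""

    def __init__(self, cmap, advances):
        self.cmap = cmap
        self.advances = advances

    def glyph(self, cp):
        return self.cmap.get(cp, NOTDEF)

    def advance(self, gid):
        return self.advances[gid]


def fix_up_white_space(font, cp, preserve=False):
    """Return (glyph id, advance) for a code point known to be White_Space."""
    gid = font.glyph(cp)
    if preserve or cp == SPACE:
        # Flag set, or U+0020 itself: do nothing.
        return gid, font.advance(gid)
    space_gid = font.glyph(SPACE)
    if space_gid == NOTDEF:
        # Assumption: without a space glyph there is nothing to substitute.
        return gid, font.advance(gid)
    if gid == NOTDEF:
        # No glyph in the font: take both glyph and metrics from U+0020.
        return space_gid, font.advance(space_gid)
    # Font has its own glyph: show the space glyph but keep the font's metrics.
    return space_gid, font.advance(gid)


# Toy data: the font maps U+0020 and U+2003 (EM SPACE) but not U+2028.
font = ToyFont(cmap={0x0020: 3, 0x2003: 7}, advances={0: 600, 3: 250, 7: 1000})
line_separator = fix_up_white_space(font, 0x2028)
em_space = fix_up_white_space(font, 0x2003)
```

In this toy font, U+2028 ends up with the space glyph and the space advance,
while U+2003 gets the space glyph but keeps its own, wider advance.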

Regards,
Konstantin


2014-09-25 20:30 GMT+04:00 Behdad Esfahbod <behdad at behdad.org>:

> Thanks James and Jonathan for taking care of this on the CSS side.
> Working-group resolved to change this to display Cc characters
> (other than HT, LF, CR):
>
>   http://log.csswg.org/irc.w3.org/css/2014-09-08/#e469835
>
> On 14-03-20 03:19 AM, James Clark wrote:
> > On Thu, Mar 20, 2014 at 6:04 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
> >
> >
> >     Also, Unicode says GC=Cc should just render as boxed if not supported.
> >
> >
> > However, it also says that characters with the White_Space property true
> > should be rendered as space.  In addition to 0x9, 0xA and 0xD (which both
> > CSS and HTML treat as white space), these are 0xB (VT), 0xC (FF), and 0x85
> > (NEL).
> >
> >     The reason we want them removed here is really an artifact of the
> >     HTML spec.
> >
> >
> > The requirement of ignoring all GC=Cc characters seems to be an artifact
> > of the CSS3 Text WD
> > (http://www.w3.org/TR/css-text-3/#white-space-processing), which is not
> > yet set in stone.  Note that it's different from CSS2.1
> > (http://www.w3.org/TR/CSS2/text.html#ctrlchars) which says that they
> > render as usual.
> >
> > The CSS3 text behaviour seems like a bad idea to me, because
> >
> > a) it conflicts with Unicode, and
> > b) legacy Windows encodings use C1 code points (in the range 0x80 - 0x9F)
> > for real characters; if a page using e.g. Windows-1252 encoding is
> > mislabelled as ISO-8859-1 (which can definitely happen) then all the code
> > points in this range would silently be ignored rather than showing up as
> > boxes.
> >
> >     WDYT?
> >
> >
> > I think the default should be to do what Unicode says.  Also ask the CSS3
> > text folks why they are proposing this handling of Cc.
> >
> > James
>
> --
> behdad
> http://behdad.org/