[HarfBuzz] Don't render control characters?

Behdad Esfahbod behdad at behdad.org
Tue Apr 14 00:24:25 PDT 2015


Let's continue this particular discussion here:

https://github.com/behdad/harfbuzz/commit/81ef4f407d9c7bd98cf62cef951dc538b13442eb#commitcomment-9469767

I want to come to a conclusion on this one as well.

b

On 15-03-28 10:38 AM, Konstantin Ritt wrote:
> This seems to be deferred for ever :)
> With the latest HarfBuzz, I still have to fix-up glyph/metrics for the
> White_Spaces of GC=Cc|Zl|Zp to avoid the "missing glyph" boxes on rendering.
> 
> From http://www.unicode.org/faq/unsup_char.html#2 :
>> Q: Which characters should be displayed as a visible but blank space?
>> A: This is the easy one: all the characters that have the White_Space
>> property, also generically known as “whitespace characters”. This set includes
>> SPACE, of course, but also such characters as the tab control character,
>> NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the
>> White_Space values in PropList.txt
>> <http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt>.
> 
> And from PropList.txt :
> 0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
> 0020          ; White_Space # Zs       SPACE
> 0085          ; White_Space # Cc       <control-0085>
> 00A0          ; White_Space # Zs       NO-BREAK SPACE
> 1680          ; White_Space # Zs       OGHAM SPACE MARK
> 2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
> 2028          ; White_Space # Zl       LINE SEPARATOR
> 2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
> 202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
> 205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
> 3000          ; White_Space # Zs       IDEOGRAPHIC SPACE
> 
> 
> My proposal is the following:
> - The glyph for each White_Space character (other than U+0020 itself) should
> be replaced with the glyph for U+0020.
>   This is a good first approximation which guarantees we would never get a box
> for White_Space characters.
> - If the font has no glyph for the White_Space character (and we just replaced
> it with the glyph for U+0020), simply duplicate the metrics for U+0020 as well;
> otherwise trust that the font provides correct metrics.
>   This doesn't handle e.g. half-width spaces, but it is also a good
> approximation for the most common case.
> This is only applicable when HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES has
> not been set; otherwise do nothing.
> 
> Regards,
> Konstantin
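
The proposal above can be modelled in a few lines. This is only a rough
sketch in Python, not HarfBuzz code: the `cmap` and `advances` dictionaries
and the `preserve` argument are hypothetical stand-ins for the font's
character map, its advance widths, and
HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES. It assumes glyph id 0 means
"missing glyph" and reads the proposal as: always show the U+0020 glyph for
White_Space characters, but copy U+0020's advance only when the font has no
glyph of its own for the character.

```python
# White_Space code points, taken from the PropList.txt excerpt quoted above.
WHITE_SPACE = (
    set(range(0x0009, 0x000D + 1))
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200A + 1))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

SPACE = 0x0020
MISSING_GLYPH = 0  # assumption: glyph id 0 is the "missing glyph" box


def shape_white_space(codepoint, cmap, advances, preserve=False):
    """Return (glyph_id, advance) for a single code point.

    cmap:     hypothetical dict mapping code point -> glyph id
    advances: hypothetical dict mapping glyph id -> advance width
    preserve: stands in for HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES
    """
    glyph = cmap.get(codepoint, MISSING_GLYPH)

    # Do nothing special when the flag is set, for U+0020 itself,
    # or for characters that are not White_Space.
    if preserve or codepoint == SPACE or codepoint not in WHITE_SPACE:
        return glyph, advances.get(glyph, 0)

    space_glyph = cmap.get(SPACE, MISSING_GLYPH)
    if glyph == MISSING_GLYPH:
        # Font has no glyph: reuse both the glyph and the metrics of U+0020.
        return space_glyph, advances.get(space_glyph, 0)

    # Font has its own glyph: show the space glyph, keep the font's advance.
    return space_glyph, advances.get(glyph, 0)
```

With a toy font that maps U+0020 and U+2003 (EM SPACE) but not U+2028,
LINE SEPARATOR falls back to the space glyph and space advance, while
EM SPACE keeps its own wider advance.
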
> 
> 
> 2014-09-25 20:30 GMT+04:00 Behdad Esfahbod <behdad at behdad.org>:
> 
>     Thanks James and Jonathan for taking care of this on the CSS side.
>     Working-group resolved to change this to display Cc characters
>     (other than HT, LF, CR):
> 
>       http://log.csswg.org/irc.w3.org/css/2014-09-08/#e469835
> 
>     On 14-03-20 03:19 AM, James Clark wrote:
>     > On Thu, Mar 20, 2014 at 6:04 AM, Behdad Esfahbod <behdad at behdad.org> wrote:
>     >
>     >
>     >     Also, Unicode says GC=Cc should just render as boxed if not supported.
>     >
>     >
>     > However, it also says that characters with the White_Space property
>     > should be rendered as space.  In addition to 0x9, 0xA and 0xD (which
>     > both CSS and HTML treat as white space), these are 0xB (VT), 0xC (FF),
>     > and 0x85 (NEL).
>     >
>     >     The
>     >     reason we want them removed here is really an artifact of the HTML spec.
>     >
>     >
>     > The requirement of ignoring all GC=Cc characters seems to be an artifact of
>     > the CSS3 Text WD (http://www.w3.org/TR/css-text-3/#white-space-processing),
>     > which is not yet set in stone.  Note that it's different from CSS2.1
>     > (http://www.w3.org/TR/CSS2/text.html#ctrlchars) which says that they
>     > render as usual.
>     >
>     > The CSS3 text behaviour seems like a bad idea to me, because
>     >
>     > a) it conflicts with Unicode, and
>     > b) legacy Windows encodings use C1 code points (in the range 0x80 - 0x9F)
>     > for real characters; if a page using e.g. Windows-1252 encoding is
>     > mislabelled as ISO-8859-1 (which can definitely happen) then all the code
>     > points in this range would be silently ignored rather than showing up as
>     > boxes.
>     >
>     >     WDYT?
>     >
>     >
>     > I think the default should be to do what Unicode says.  Also ask the
>     > CSS3 text folks why they are proposing this handling of Cc.
>     >
>     > James
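
Point b) above is easy to reproduce. The following is a small illustrative
sketch in Python using only the standard library; the sample byte string is
made up for the example. It decodes the same bytes once as Windows-1252 and
once as ISO-8859-1 (Python's "latin-1" codec) and checks the resulting
general categories.

```python
import unicodedata

# Made-up sample: the word "quoted" wrapped in curly quotes,
# encoded the way Windows-1252 encodes them (0x93 and 0x94).
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")    # correctly labelled page
as_latin1 = raw.decode("latin-1")   # same bytes, mislabelled as ISO-8859-1


def categories(text):
    """Return the Unicode general category of each character in text."""
    return [unicodedata.category(ch) for ch in text]


print(categories(as_cp1252))
print(categories(as_latin1))
```

Under the mislabelled decoding the two quote bytes become U+0093 and U+0094,
which are GC=Cc, so a renderer that drops all Cc characters would hide them
instead of showing a box that hints at the encoding problem.
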
> 
>     --
>     behdad
>     http://behdad.org/
> 
> 

-- 
behdad
http://behdad.org/

