[HarfBuzz] Don't render control characters?

James Clark jjc at jclark.com
Wed Mar 19 18:19:53 PDT 2014


On Thu, Mar 20, 2014 at 6:04 AM, Behdad Esfahbod <behdad at behdad.org> wrote:

>
> Also, Unicode says GC=Cc should just render as boxed if not supported.


However, it also says that characters with the White_Space property true
should be rendered as space.  In addition to 0x9, 0xA and 0xD (which
both CSS and HTML treat as white space), these are 0xB (VT), 0xC (FF), and
0x85 (NEL).
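
To make the distinction concrete, here is a rough Python sketch of that
fallback rule.  The function name and the 'space' / 'box' / 'normal' labels
are made up for illustration, and the White_Space set is hard-coded to the
six Cc code points listed above, because Python's unicodedata module does
not expose the White_Space property directly.  It is not how HarfBuzz or
any browser implements this.

```python
import unicodedata

# Cc code points that also have White_Space=true (the six listed above).
CC_WHITE_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x85}


def fallback_rendering(ch):
    """Hypothetical helper: classify how an unsupported character would render.

    Returns 'space' for Cc characters with White_Space=true, 'box' for
    every other Cc character, and 'normal' for anything that is not Cc.
    """
    if unicodedata.category(ch) != "Cc":
        return "normal"
    return "space" if ord(ch) in CC_WHITE_SPACE else "box"


if __name__ == "__main__":
    for cp in (0x01, 0x0B, 0x0C, 0x85, 0x93, 0x41):
        print(f"U+{cp:04X}", fallback_rendering(chr(cp)))
```

Under these assumptions VT, FF and NEL come out as spaces, while U+0001 and
the other C1 controls such as U+0093 come out as boxes.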

> The reason we want them removed here is really an artifact of the HTML spec.


The requirement of ignoring all GC=Cc characters seems to be an artifact of
the CSS3 Text WD (http://www.w3.org/TR/css-text-3/#white-space-processing),
which is not yet set in stone.  Note that it's different from CSS2.1 (
http://www.w3.org/TR/CSS2/text.html#ctrlchars) which says that they render
as usual.

The CSS3 text behaviour seems like a bad idea to me, because

a) it conflicts with Unicode, and
b) legacy Windows encodings use C1 code points (in the range 0x80 - 0x9F)
for real characters; if a page using e.g. Windows-1252 encoding is
mislabelled as ISO-8859-1 (which can definitely happen) then all the code
points in this range would silently be ignored rather than showing up as
boxes.
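
Point b) is easy to reproduce.  The following Python sketch decodes the
same bytes once as Windows-1252 and once as ISO-8859-1, using Python's
built-in "cp1252" and "iso-8859-1" codecs, and lists the C1 control
characters that result.  The sample bytes and the helper name are made up
for illustration.

```python
import unicodedata

# Curly double quotes around a word, as encoded in Windows-1252.
raw = b"\x93quoted\x94"

intended = raw.decode("cp1252")         # correct label
mislabelled = raw.decode("iso-8859-1")  # wrong label: bytes map 1:1 to U+0080..U+00FF


def c1_controls(text):
    """Hypothetical helper: list the GC=Cc code points in 0x80-0x9F found in text."""
    return [
        f"U+{ord(c):04X}"
        for c in text
        if 0x80 <= ord(c) <= 0x9F and unicodedata.category(c) == "Cc"
    ]


if __name__ == "__main__":
    print(c1_controls(intended))
    print(c1_controls(mislabelled))
```

With the correct label the quotes decode to U+201C and U+201D and no C1
controls appear.  With the wrong label they become U+0093 and U+0094, which
are GC=Cc and so would simply vanish under the CSS3 Text rule.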

> WDYT?

I think the default should be to do what Unicode says.  It would also be
worth asking the CSS3 Text folks why they are proposing this handling of Cc.

James