[OpenFontLibrary] Fwd: Announcing new font compression project

Dave Crossland dave at lab6.com
Wed Mar 28 07:57:43 PDT 2012


:)

---------- Forwarded message ----------
From: Raph Levien <raph at google.com>
Date: 27 March 2012 15:08
Subject: Announcing new font compression project
To: www-font at w3.org


Greetings, web font enthusiasts.

The growth in adoption of web fonts over the past two years has been
stunning. One of the reasons holding people back from using web fonts
is concern over file size and the delay in text rendering until the
font is fully loaded. We believe that better compression of font files
will make web fonts even more appealing for designers, and make the
user experience better. We also believe that lossless compression is
quite practical, and is important because it will be completely
transparent to designers and users alike, with no degradation or
concerns over reliability and testing.

We have been researching a new lossless compression format, and are now
releasing it as open source and asking for a public discussion. The
code name for the project is "WOFF Ultra Condensed", and the hope is
for it to be considered by the W3C as a future evolution of the WOFF
standard. To give a flavor of the kind of improvements to expect,
running compression over all the fonts in the Google Web Fonts project
yields a mean gain of 26.9% compared to WOFF. Large CJK fonts benefit
particularly - as one dramatic example, the Nanum Myeongjo font
is 48.5% smaller than the corresponding WOFF. More experiments will
follow.

The code and documentation of the draft wire format are here:

http://code.google.com/p/font-compression-reference/

http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensed.pdf
http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensedfileformat.pdf

The intent of this proposal is to preserve everything that has made
WOFF great and successful, just providing better compression. The
initial WOFF header, including the metadata features, is completely
unchanged from WOFF, with the exception of the signature.

There's more documentation inside the project, but here is a brief
overview of what's going on inside that makes these levels of
compression possible:

First, the entropy coding is LZMA, which offers significant gains
compared with zlib (gzip).
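As a rough illustration, both codecs are available in Python's standard
library; the input below is synthetic stand-in data, not a real sfnt
table, so the exact sizes are not representative of font compression:

```python
import lzma
import zlib

# Synthetic, highly redundant input standing in for an sfnt table.
data = b"\x01\x00\x02\x00\x03\x00\x04\x00" * 2048

deflated = zlib.compress(data, 9)         # zlib/DEFLATE, as used in WOFF
lzma_out = lzma.compress(data, preset=9)  # LZMA, as proposed here

# Both coders are lossless: decompression recovers the input exactly.
assert zlib.decompress(deflated) == data
assert lzma.decompress(lzma_out) == data

print(len(data), len(deflated), len(lzma_out))
```

Note that LZMA typically trades higher CPU and memory cost during
compression for the denser output.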

Second, there is preprocessing that removes much of the redundancy in
the TrueType format (which was designed for quick random access rather
than maximal packing into a stream).

Third, the directory header is packed using Huffman coding and a
dictionary of common table values, saving over 200 bytes (particularly
important for small subsets).
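The dictionary idea can be sketched as follows. The tag list and escape
byte here are invented for illustration and do not match the draft
spec's actual dictionary:

```python
# Hypothetical dictionary of common sfnt table tags: a known tag is
# encoded as a single index byte instead of four literal bytes.
KNOWN_TAGS = [b"cmap", b"glyf", b"head", b"hhea", b"hmtx",
              b"loca", b"maxp", b"name", b"post", b"OS/2"]

def encode_tag(tag):
    """One byte for a known tag; escape byte 0xFF plus the literal
    four bytes for anything else."""
    try:
        return bytes([KNOWN_TAGS.index(tag)])
    except ValueError:
        return b"\xff" + tag

def decode_tag(data):
    """Return (tag, bytes_consumed) for the encoding above."""
    if data[0] == 0xFF:
        return bytes(data[1:5]), 5
    return KNOWN_TAGS[data[0]], 1
```

A directory full of standard tags then costs one byte per tag, while
unusual tags still round-trip exactly through the escape path.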

There is also a provision for combining multiple tables into a single
entropy coding stream, which can save both the CPU time and file size
overhead of having many small streams.
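The effect of stream consolidation is easy to demonstrate with any
stream-based compressor; the table contents below are invented
stand-ins:

```python
import lzma

# Invented stand-ins for several small sfnt tables.
tables = [
    b"\x00\x04" + bytes(range(64)) * 4,  # "cmap"-like data
    b"\x00\x01\x00\x00" + bytes(36),     # "hhea"-like data
    b"\x00\x03\x00\x00" + bytes(28),     # "post"-like data
    b"Example Family Regular " * 8,      # "name"-like data
]

# One entropy-coding stream per table: each stream pays its own
# container overhead and cannot exploit cross-table redundancy.
separate = sum(len(lzma.compress(t)) for t in tables)

# A single combined stream amortizes that overhead across all tables.
combined = len(lzma.compress(b"".join(tables)))

print(separate, combined)  # combined is smaller
```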

We consider the format to be lossless, in the sense that the
_contents_ of the font file are preserved 100%. That said, the
decompressed font is not bit-identical to the source font, as there
are many irrelevant details such as padding and redundant ways of
encoding the same data (for example, it's perfectly valid, but
inefficient, to repeat flag bytes in a simple glyph instead of using
the repeat code). A significant amount of the compression is due to
stripping these out. One way of thinking about the losslessness
guarantee is that running a valid font through compression and
decompression should yield exactly the same TTX representation as the
original font. Further, we plan to build an extensive test suite to
validate this assertion.
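For instance, the flag redundancy mentioned above comes from the glyf
table's simple-glyph encoding, where bit 3 (0x08) of a flag byte marks
the next byte as a repeat count. A compressor can always emit the
compact form and a decompressor the expanded one; the glyph itself is
unchanged either way. A minimal sketch:

```python
REPEAT = 0x08  # glyf simple-glyph flag bit: next byte is a repeat count

def pack_flags(flags):
    """Collapse runs of identical flag bytes using the REPEAT bit.

    A verbose-but-valid font may store every flag byte explicitly;
    this emits the compact, equivalent encoding instead.
    """
    out = bytearray()
    i = 0
    while i < len(flags):
        run = 1
        while i + run < len(flags) and flags[i + run] == flags[i] and run < 256:
            run += 1
        if run > 1:
            out.append(flags[i] | REPEAT)
            out.append(run - 1)  # number of *additional* repetitions
        else:
            out.append(flags[i] & ~REPEAT)
        i += run
    return bytes(out)

def unpack_flags(packed, count):
    """Expand a packed flag array back to one flag byte per point."""
    out = bytearray()
    i = 0
    while len(out) < count:
        flag = packed[i]; i += 1
        repeats = 0
        if flag & REPEAT:
            repeats = packed[i]; i += 1
        out.extend([flag & ~REPEAT] * (repeats + 1))
    return bytes(out)
```

Round-tripping a verbose flag array through pack and unpack yields the
same logical flags, which is exactly the sense of "lossless" above.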

In this proposal, we've tried to strike a balance between complexity
and aggressiveness of compression. The biggest gains by far come from
better compression of the glyf table (and eliminating the loca table
altogether), so basically this proposal squeezes this table to the
maximum. We estimate that somewhere between 0.5% and 1% each can be
gained by (1) eliminating lsb's from the hmtx table, and (2)
compressing the cmap using a technique similar to CFF. The source code
includes compression algorithms for both of these, but we can't be
100% sure about the gains because we haven't written the corresponding
decompression code. A big concern is overall spec complexity: we want
to make it practical for people to implement, test for conformance,
etc. We'd really love to hear people's thoughts on this, in
particular, whether it's worth going after every last bit of possible
compression.

This is an open source project, and we encourage participation from
the whole community. I'd also like to thank a number of people who
have contributed so far: the compression code is based on sfntly (by
the Google Internationalization team), the decompression code is built
on top of OTS (the OpenType Sanitizer), and a number of pleasant
discussions with Vlad Levantovsky and John Daggett have helped improve
it. Many of the ideas, and some particulars of the glyf table
compression, are based on Monotype Imaging's MicroType Express format,
which is now available under open-source- and proprietary-friendly
licensing terms; see http://monotypeimaging.com/aboutus/mtx-license.aspx.
Also thanks to Kenichi Ishibashi for doing an integration into Chromium
so we can test it in real browsers (this will also be released soon).

We're looking forward to the discussion!

Raph Levien
Engineer, Google Web Fonts
