[HarfBuzz] UCDN database size

Lóránt Pintér lorant.pinter at prezi.com
Fri Jan 25 07:18:14 PST 2013


Hi Grigori,  

Maybe we are talking about different things. If you look at this file, there's a lot of redundant data in there that could be eliminated with the simplest of compression techniques:

https://github.com/behdad/harfbuzz/blob/master/src/hb-ucdn/unicodedata_db.h#L1553

Can something be done about this?  
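The message doesn't say which technique is meant. Run-length encoding is one of the simplest, so here is a minimal Python sketch of it on made-up data. The helper names `rle_encode` and `rle_decode` are hypothetical and not part of UCDN:

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out

# Toy data standing in for a long stretch of repeated table entries.
index = [0] * 20 + [5, 5, 5] + [0] * 10
packed = rle_encode(index)
print(packed)  # [(0, 20), (5, 3), (0, 10)]
assert rle_decode(packed) == index
```

One trade-off: a run-length-encoded table can't be indexed directly, so it has to be expanded at startup or searched on every lookup, whereas plain arrays are indexed in constant time.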

--  
Lóránt Pintér
Developer at Prezi (http://prezi.com)



On Thursday, January 24, 2013 at 1:21 AM, Grigori Goronzy wrote:

> On 01/22/2013 03:35 PM, Lóránt Pintér wrote:
> > I'm trying to reduce the size of the Emscripten-generated JavaScript in
> > HarfBuzz JS, and by far the biggest part is the UCDN database
> > arrays. IIRC, there was a plan to bring the database size down a notch.
> > Is there a way I can help with that?
> >
>
> UCDN already goes to great lengths to reduce the size of the database. The
> index arrays use three stages (as opposed to two stages in most other
> implementations), normalization data is efficiently coded and everything
> uses the smallest data type possible.
>  
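A rough Python sketch of a deduplicating three-stage index, to make Grigori's description concrete. It assumes a flat table with one value per code point. The function names and the shift widths 8 and 4 are illustrative choices, not UCDN's actual generator or layout:

```python
def build_three_stage(values, shift1=8, shift2=4):
    """Split a flat per-code-point table into three deduplicated stages."""
    leaf_size = 1 << shift2
    mid_size = 1 << (shift1 - shift2)
    assert len(values) % (1 << shift1) == 0

    def dedupe(seq, size):
        # Cut seq into fixed-size blocks, store each distinct block once,
        # and return (block number per position, concatenated distinct blocks).
        ids, blocks, numbers = {}, [], []
        for i in range(0, len(seq), size):
            block = tuple(seq[i:i + size])
            if block not in ids:
                ids[block] = len(blocks)
                blocks.append(block)
            numbers.append(ids[block])
        return numbers, [x for b in blocks for x in b]

    leaf_numbers, data = dedupe(values, leaf_size)   # stage 3: the values
    index1, index2 = dedupe(leaf_numbers, mid_size)  # stages 1 and 2
    return index1, index2, data


def lookup(cp, index1, index2, data, shift1=8, shift2=4):
    mid_mask = (1 << (shift1 - shift2)) - 1
    leaf_mask = (1 << shift2) - 1
    mid_block = index1[cp >> shift1]
    leaf_block = index2[(mid_block << (shift1 - shift2)) + ((cp >> shift2) & mid_mask)]
    return data[(leaf_block << shift2) + (cp & leaf_mask)]


# Toy table: 0 everywhere, 1 for A-Z, 2 for a-z.
values = [0] * 1024
values[0x41:0x5B] = [1] * 26
values[0x61:0x7B] = [2] * 26

index1, index2, data = build_three_stage(values)
assert all(lookup(cp, index1, index2, data) == values[cp] for cp in range(1024))
print(len(index1) + len(index2) + len(data), "entries instead of", len(values))
# 116 entries instead of 1024
```

Identical blocks are stored once at both levels, which is where the savings come from. In a real generator each array would then use the smallest integer type that fits its largest value, as Grigori describes.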
> For the main properties, you could try splitting up the ucd_records
> table (vertically) into multiple tables so that there's more correlation
> between the individual fields. This should make the lookup index arrays
> smaller. Currently these arrays alone measure about 30 KB.
>  
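Whether a vertical split helps depends on the data, so it's worth measuring before changing the generator. A minimal sketch, assuming two hypothetical per-code-point fields standing in for columns of ucd_records. The toy data is deliberately chosen so the two fields vary independently of each other:

```python
def unique_blocks(seq, block_size):
    """Count the distinct fixed-size blocks in seq, i.e. how many leaf
    blocks a deduplicating multi-stage table would have to store."""
    return len({tuple(seq[i:i + block_size])
                for i in range(0, len(seq), block_size)})

# Hypothetical columns of a record table, one value per code point.
category = [0, 0, 0, 1, 1, 1] * 3
script = [0] * 6 + [2] * 6 + [3] * 6

# Current approach: one combined record per code point.
combined = list(zip(category, script))

print("combined:", unique_blocks(combined, 2))  # combined: 9
print("split:", unique_blocks(category, 2) + unique_blocks(script, 2))  # split: 6
```

This counts only distinct leaf blocks and ignores the extra index arrays each split table needs, so it is a first-order estimate; on real data it should be run over the actual columns.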
> Decomposition/composition can't be improved much further, I guess. In
> these cases, the actual data takes up most of the space, and that's
> already encoded in UTF-16, so it's not going to get much better.
>  
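For reference, this is what UTF-16 costs per code point: one 16-bit unit in the BMP, a surrogate pair above it. A minimal sketch; `to_utf16_units` is a hypothetical helper, and UCDN's own decomposition layout (including how lengths are stored) isn't reproduced here:

```python
def to_utf16_units(code_points):
    """Encode code points as UTF-16 code units: BMP code points take one
    unit, supplementary code points take a surrogate pair."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
    return units

# U+00E9 LATIN SMALL LETTER E WITH ACUTE decomposes to U+0065 U+0301.
print([hex(u) for u in to_utf16_units([0x0065, 0x0301])])  # ['0x65', '0x301']
# A supplementary code point such as U+1D15E needs two units.
print([hex(u) for u in to_utf16_units([0x1D15E])])  # ['0xd834', '0xdd5e']
```

Since most decomposition targets are BMP characters, most entries cost 16 bits rather than the 32 a raw code point would need.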
> And then there's BiDi mirroring, but the table for that is already
> a mere 1.5 KB in size.
>  
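A table this small can afford a plain sorted lookup. A minimal sketch, assuming the mirroring pairs are stored as two parallel arrays sorted by source code point (an assumption about layout, not UCDN's actual format); the entries are the first few pairs from BidiMirroring.txt:

```python
from bisect import bisect_left

# Assumed layout: parallel arrays sorted by the source code point.
MIRROR_FROM = [0x0028, 0x0029, 0x003C, 0x003E, 0x005B, 0x005D, 0x007B, 0x007D]
MIRROR_TO = [0x0029, 0x0028, 0x003E, 0x003C, 0x005D, 0x005B, 0x007D, 0x007B]

def mirror(cp):
    """Return the mirrored code point, or cp itself if it has none."""
    i = bisect_left(MIRROR_FROM, cp)
    if i < len(MIRROR_FROM) and MIRROR_FROM[i] == cp:
        return MIRROR_TO[i]
    return cp

print(hex(mirror(ord("("))))  # 0x29
print(hex(mirror(ord("a"))))  # 0x61
```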
> However, the real problem seems to be that Emscripten does not have any
> method to store big chunks of binary data efficiently. I guess the
> database blows up to hundreds of kilobytes due to this. AFAIR it is
> possible to load binary data in JS. Maybe you should investigate this?
>  
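To get a feel for the blow-up Grigori suspects, compare a table's raw size with the same bytes spelled out as a decimal JavaScript array literal and with base64. This is a sketch under assumptions: the table contents are made up, and one decimal number per byte only approximates how compiled data can end up in generated JS; it is not Emscripten's exact output:

```python
import base64
from array import array

# Hypothetical 16-bit index array with made-up contents.
table = array("H", [(i * 37) % 1000 for i in range(4096)])

raw = table.tobytes()              # 2 bytes per entry, native byte order
as_base64 = base64.b64encode(raw)  # text-safe, 4 characters per 3 bytes

# Assumed shape of the data as a JS literal: one decimal number per byte.
as_js_literal = ("var t=[" + ",".join(str(b) for b in raw) + "];").encode("ascii")

print(len(as_js_literal), len(raw), len(as_base64))
```

Each byte costs at least two characters as a literal (a digit and a comma) versus 4/3 of a character in base64. On the JS side the decoded bytes can be viewed directly as a Uint16Array, assuming both sides agree on byte order.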
> Best regards
> Grigori



